Together AI Inc.
Platform for running and fine-tuning open-source models.
All Together AI Inc. Models
meta-llama/Llama-3-70b-chat-hf
Meta's 70-billion-parameter instruction-tuned chat model from the Llama 3 generation, provided in the Hugging Face chat format.
deepseek-ai/DeepSeek-V3
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture and was pre-trained on 14.8 trillion tokens, achieving performance competitive with leading closed-source models while remaining fully open.
Qwen/Qwen2.5-7B-Instruct-Turbo
Qwen2.5-7B-Instruct is a 7-billion-parameter dense, instruction-tuned model from the Qwen2.5 series, with substantially improved coding, mathematics, instruction following, and structured-output (e.g. JSON) capabilities over Qwen2, multilingual support covering more than 29 languages, and long-context support up to 128K tokens. The Turbo suffix denotes Together's speed-optimized serving of the model.
meta-llama/LlamaGuard-2-8b
Llama Guard 2 is an 8-billion-parameter safety classifier built on Llama 3 8B. It moderates both user prompts and model responses, classifying content against the MLCommons hazard taxonomy and returning a safe/unsafe verdict along with any violated categories.
meta-llama/Llama-3.2-3B-Instruct-Turbo
A lightweight 3-billion-parameter, multilingual instruction-tuned model from the Llama 3.2 family, built for low-latency workloads.
meta-llama/Meta-Llama-3-8B-Instruct-Lite
An 8-billion-parameter instruction-tuned Llama 3 model; the Lite suffix denotes Together's cost-optimized, quantized serving for cheaper, faster inference.
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
Meta's 70-billion-parameter instruction-tuned model from the Llama 3.1 family, with a 128K-token context window and strong multilingual, reasoning, and tool-use capabilities.
meta-llama/Meta-Llama-Guard-3-8B
Llama Guard 3 is an 8-billion-parameter safety classifier built on Llama 3.1 8B. It screens prompts and responses against the MLCommons hazard taxonomy (14 categories, including code-interpreter abuse) and returns a safe/unsafe verdict listing any violated categories.
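Guard models are used as classifiers rather than chat assistants: you send the content to be screened and parse the verdict from the completion. Below is a minimal sketch, assuming an OpenAI-compatible chat endpoint that applies Llama Guard's moderation prompt template server-side; the base URL and environment variable names are placeholders, not confirmed values.

```python
import os
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint; base_url and env vars are assumptions.
client = OpenAI(
    base_url=os.environ["LLM_GATEWAY_BASE_URL"],
    api_key=os.environ["LLM_GATEWAY_API_KEY"],
)

def moderate(user_message: str) -> bool:
    """Return True if Llama Guard judges the message safe."""
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-Guard-3-8B",
        messages=[{"role": "user", "content": user_message}],
    )
    verdict = resp.choices[0].message.content.strip()
    # Llama Guard 3 replies "safe", or "unsafe" followed by the
    # violated category codes on the next line (e.g. "unsafe\nS1").
    first_line = verdict.splitlines()[0] if verdict else ""
    return first_line == "safe"

if __name__ == "__main__":
    print(moderate("How do I tie a bowline knot?"))
```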
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
Meta's 8-billion-parameter instruction-tuned model from the Llama 3.1 family, pairing a 128K-token context window with fast, low-cost inference.
meta-llama/Llama-3.3-70B-Instruct-Turbo
A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.
meta-llama/Meta-Llama-3-70B-Instruct-Turbo
Meta's 70-billion-parameter instruction-tuned model from the original Llama 3 generation, served on Together's speed-optimized Turbo endpoint.
deepseek-ai/DeepSeek-R1
DeepSeek-R1 is DeepSeek's flagship reasoning model, trained with large-scale reinforcement learning on top of DeepSeek-V3-Base (671 billion total parameters, 37 billion activated per token). It generates an explicit chain of thought before answering and achieves performance comparable to OpenAI's o1 across math, code, and reasoning benchmarks.
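Because R1 emits its reasoning inside <think>...</think> tags before the final answer, client code typically separates the two. A minimal sketch, assuming the tagged chain of thought is returned inline in the completion text; the helper name is hypothetical.

```python
import re

def split_reasoning(completion_text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 completion into (reasoning, answer).

    Hypothetical helper: assumes the chain of thought arrives inline
    as a <think>...</think> block preceding the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", completion_text, re.DOTALL)
    if not match:
        return "", completion_text.strip()
    reasoning = match.group(1).strip()
    answer = completion_text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is elementary addition.</think>The answer is 4."
)
print(answer)  # The answer is 4.
```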
Qwen/Qwen2.5-72B-Instruct-Turbo
Qwen2.5-72B-Instruct is the flagship 72-billion-parameter dense, instruction-tuned model of the Qwen2.5 series, with major gains over Qwen2 in knowledge, coding, and mathematics, strong instruction following and structured-output generation, support for more than 29 languages, and long-context support up to 128K tokens.
Ready to use Together AI Inc. models?
Access all Together AI Inc. models through Requesty's unified API with intelligent routing, caching, and cost optimization.
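As an illustration, here is a minimal sketch of calling one of the models above through an OpenAI-compatible unified API. The base URL and environment variable names are assumptions; consult Requesty's documentation for the actual endpoint and credentials.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible router endpoint; the URL and env var
# names are placeholders, not confirmed values.
client = OpenAI(
    base_url=os.environ.get("REQUESTY_BASE_URL", "https://router.requesty.ai/v1"),
    api_key=os.environ["REQUESTY_API_KEY"],
)

# Route a chat request to a Together-hosted model by its model ID.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize Llama 3.3 70B in one sentence."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```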