⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →

Introducing Together AI's new look →

🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →

⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →

📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →

🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →

Models / Meta

LLM

LLaMA-2

LLM trained on 2T tokens with double Llama 1's context length, available in 7B, 13B, and 70B parameter sizes.

This model is not available on Together’s Serverless API.

Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.

Related models