⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →

Introducing Together AI's new look →

🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →

⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →

📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →

🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →

Models / Meta

Chat

NIM Llama 3.1 Nemotron 70B Instruct

NVIDIA NIM for GPU accelerated Llama 3.1 Nemotron 70B Instruct inference through OpenAI compatible APIs.

This model is not available on Together’s Serverless API.

Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.

Related models