Llama 3.2 3B Instruct Turbo

Lightweight, text-only instruction-tuned LLM from Meta's Llama 3.2 family, optimized for multilingual dialogue, including agentic retrieval and summarization tasks.

This model is not available on Together’s Serverless API.

Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.

Related models