Meta
Deploy Llama 4 Maverick and Scout on Together AI. Frontier multimodal performance, 10M token context, and 80%+ cost savings versus GPT-4o.
Why Meta on Together AI?
Designed for production workloads that need consistent performance and operational control.
Open source freedom, enterprise grade
Full model ownership — download the weights, deploy on Together AI’s cloud, or run on-premises. Your data never trains our models and never leaves your control.
Frontier multimodal performance
Llama 4 Maverick beats GPT-4o and Gemini 2.0 Flash on key benchmarks at just $0.27/1M tokens — an 80%+ cost reduction versus closed-source alternatives.
Built for scale, ready for enterprise
SOC 2 Type II certified, HIPAA compliant, with dedicated endpoints, monthly reserved capacity, and up to 40% savings at volume.
Meet the Meta family
Explore top-performing models across text, image, video, code, and voice.
Deployment options
Choose a deployment option based on latency needs, traffic patterns, and how much infrastructure control you want.
Real-time
A fully managed inference API that automatically scales with request volume.
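As a quick illustration, the serverless real-time API accepts standard chat-completions requests over HTTPS. The sketch below builds such a request payload in Python; the model slug is an assumption, so check the Together AI model catalog for the exact Llama 4 identifier.

```python
import json

# Together AI's OpenAI-compatible serverless chat endpoint.
API_URL = "https://api.together.xyz/v1/chat/completions"

# Assumed model slug -- verify against the Together AI model catalog.
MODEL = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Return a chat-completions request payload for the serverless endpoint."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarize the Llama 4 release in one sentence.")
body = json.dumps(payload)
# POST `body` to API_URL with headers:
#   Authorization: Bearer $TOGETHER_API_KEY
#   Content-Type: application/json
```

Because the endpoint is OpenAI-compatible, existing client code can usually be pointed at it by swapping the base URL and API key.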
Batch
Process workloads of up to 30 billion tokens asynchronously at up to 50% lower cost.
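Batch jobs are typically submitted as a JSONL file with one request per line. The sketch below assembles such a file in Python; the exact schema (the `custom_id` and `body` fields) follows the common batch-API convention and is an assumption here, so verify it against the Together AI batch documentation before use.

```python
import json

# Assumed model slug -- verify against the Together AI model catalog.
MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

def to_batch_jsonl(prompts: list[str]) -> str:
    """Serialize prompts as JSONL: one self-contained request per line.

    Field names (custom_id, body) are assumed from the common
    batch-API convention, not confirmed Together AI schema.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",
            "body": {
                "model": MODEL,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)

jsonl = to_batch_jsonl([
    "Classify the sentiment: great product!",
    "Classify the sentiment: arrived broken.",
])
# Write `jsonl` to a file, upload it, and create a batch job; results
# are returned asynchronously at the discounted batch rate.
```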
Dedicated Model Inference
An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.
Dedicated Container Inference
Run inference with your own engine and model on fully managed, scalable infrastructure.