
Together AI partners with Meta to offer Llama 4: SOTA Multimodal MoE Models

April 5, 2025

By Together AI

Today, we’re thrilled to introduce day 1 support for Llama 4 on the Together AI platform as a Meta launch partner. Built for developers seeking unparalleled multimodal capabilities, efficiency, and scale, Llama 4 combines a cutting-edge mixture-of-experts (MoE) architecture with native multimodality, empowering you to create state-of-the-art, long-context AI applications.

We’re launching both groundbreaking Llama 4 models today:

  • Llama 4 Maverick (17B active params, 400B total): A 128-expert MoE powerhouse for multilingual image and text understanding (12 languages), creative writing, and enterprise-scale applications, outperforming Llama 3.3 70B. Supports a 1M-token context window.*
  • Llama 4 Scout (17B active params, 109B total): A 16-expert MoE model that excels at multi-document analysis, codebase reasoning, and personalized tasks. Smaller than Maverick, but state-of-the-art for its size, with text and image input support. Supports a 10M-token context window.*

*Note: At launch, we support a 500k-token context length on Maverick and a 300k-token context length on Scout. We will be increasing these limits soon.


🦙 Meet the Llama 4 Herd

Llama 4 Maverick: The Multilingual Workhorse

  • 1M-token context*: Process large amounts of data in one go, including entire code repositories, years of user activity, or vast research archives.
  • 400B total parameters with 128 experts: Optimized for high-speed, high-quality responses in chat, creative writing, and precise image understanding.
  • 12-language support: Break language barriers in global applications.
  • Use cases:
    • 🌐 Multilingual customer support with visual context.
    • 🎨 Generating marketing content from previous multimodal PDFs.
    • 🔍 Advanced document intelligence (text + diagrams + tables).

Llama 4 Scout: Efficiency Meets Scale

  • 10M-token context*: Process massive inputs in one go, including entire textbooks or multi-document collections.
  • 109B total parameters with 16 experts, delivering state-of-the-art performance for its class.
  • Use cases:
    • 📚 Multi-document summarization for legal/financial analysis.
    • 🧑‍💻 Personalized task automation using years of user data.
    • 🖼️ Efficient image parsing for multimodal applications.
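Scout’s long context turns multi-document workflows into a single call: rather than chunking and summarizing piecemeal, you can pack every source into one labeled prompt body. A minimal sketch of that packing step (the helper name and `###` delimiters are our own, not part of any API):

```python
def pack_documents(docs: dict[str, str]) -> str:
    """Concatenate named documents into one long-context prompt body.

    Each document gets a labeled header so the model can cite sources
    by name. Delimiter choice is illustrative, not prescribed.
    """
    sections = [f"### {name}\n{text}" for name, text in docs.items()]
    return "\n\n".join(sections)

# Example: two filings packed into a single prompt for summarization.
prompt_body = pack_documents({
    "10-K_2023.txt": "Annual report text...",
    "10-Q_2024.txt": "Quarterly report text...",
})
```

The packed string then goes into a single user message, with an instruction like “Summarize the risk factors across these filings” prepended.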



⚡ Example Multimodal Prompt

Get started with Llama 4 on our playground! You can also try Llama 4 for free on Together Chat – our new consumer chat app.
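In the API, a multimodal prompt pairs text and image parts inside a single user message, following the OpenAI-compatible message format. The sketch below builds that payload locally (the helper function and image URL are illustrative; the commented-out call shows where the payload plugs in):

```python
def build_image_prompt(text: str, image_url: str) -> list:
    """Return a chat `messages` list pairing a text part with an image URL."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

# Hypothetical image URL, for illustration only.
messages = build_image_prompt(
    "Describe the chart in this image.",
    "https://example.com/chart.png",
)

# With the payload in hand, the call mirrors the text-only example:
# client = Together()
# response = client.chat.completions.create(
#     model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
#     messages=messages,
#     max_tokens=300,
# )
```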

🚤 Get started in minutes

Try Llama 4 Scout and Llama 4 Maverick on Together AI today through our serverless API. Join over half a million developers building on the Together platform on top of our optimized inference engine.


from together import Together

# The client reads your TOGETHER_API_KEY from the environment.
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Summarize this codebase..."}],
    max_tokens=500,
)

print(response.choices[0].message.content)
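For long generations you will usually want streaming, so tokens render as they arrive instead of after the full completion. The consumption loop below follows the OpenAI-compatible `stream=True` delta shape; to keep it self-contained and runnable, a stubbed stream stands in for the live call (the stub chunks and their text are illustrative only):

```python
from types import SimpleNamespace

def consume_stream(stream) -> str:
    """Accumulate streamed delta chunks into the full completion text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # final chunks may carry no content
            parts.append(delta)
            print(delta, end="")  # render tokens as they arrive
    return "".join(parts)

# A real call would be:
#   stream = client.chat.completions.create(..., stream=True)
# Stub chunks shaped like the SDK's streaming response, for illustration:
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
    for t in ["Llama ", "4 ", "is ", "live."]
]
text = consume_stream(fake_stream)
```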

🌍 What Will You Build?

With long context, native multimodality, and MoE efficiency, Llama 4 unlocks a new class of AI applications. And with Together AI's commitment to open-source innovation, developers can build with Llama 4 while fully controlling their models and data.

Try Llama 4 today on the Together Playground, Together Chat, or get started with building directly on our API. You can also deploy it on our dedicated endpoints to serve the most demanding production applications and enterprises. The herd has evolved—join the revolution.

Start Building with Llama 4 →

Contact Sales →


