Mistral Small 3 API now available on Together AI: A new category leader in small models
Today we're launching Mistral Small 3 on Together AI as one of Mistral's official launch partners. This 24B-parameter open-source model delivers performance on par with GPT-4o mini and with much larger models like Llama 3.3 70B, making it a category leader in small models.
At Together AI, our mission is to bring the best open-source models to developers with better speed, lower costs and trusted reliability. Our platform is used by over 300,000 developers, and our OpenAI-compatible APIs make it seamless to test and integrate models like Mistral Small 3 into your existing applications.
TL;DR
- New Mistral Small 3 now available on Together AI—delivers GPT-4o mini level performance at just 24B parameters. Try it now.
- Great for general-purpose conversational assistance, as well as robotics and other low-latency applications
- Integrate Mistral Small 3 into your existing applications using Together AI's OpenAI-compatible APIs (see the example after this list). Check out the quickstart in our docs.
- Available on Together Serverless with high-performance, pay-per-token endpoints, or via scalable dedicated GPU endpoints.
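Because the API is OpenAI-compatible, switching an existing app over is often just a base-URL change. Here's a minimal sketch using the official `openai` Python client, assuming the model ID `mistralai/Mistral-Small-24B-Instruct-2501` (confirm the exact string in our model catalog):

```python
# Minimal sketch: pointing the official openai Python client at Together AI's
# OpenAI-compatible endpoint. The model ID is an assumption; confirm the exact
# string in the Together model catalog.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",  # or read from the TOGETHER_API_KEY env var
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "In one sentence, what is Mistral Small 3?"}],
)
print(response.choices[0].message.content)
```

The same two-line change, an API key and a base URL, works with most OpenAI-client integrations in other languages as well.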
Smaller, better, faster, cheaper—and open
As AI models have grown more powerful, they've also become larger, increasing costs and latency. But not every use case requires a large model. The focus is now shifting to building smaller models that match the performance of larger ones while running faster and more efficiently.
Mistral Small 3 exemplifies this shift, delivering GPT-4o mini level performance in a significantly smaller package than Llama 3.3 70B. This lets developers run high-quality AI with faster response times and lower costs. Open-source models like Mistral Small 3 are driving this innovation, leading organizations to move away from closed-source options like GPT-4o mini in favor of greater control, affordability, and customization.
Available under the Apache 2.0 license, Mistral Small 3 is ready for commercial use.
Use cases for smaller models
Smaller models like Mistral Small 3 excel in:
- Conversational AI – Chatbots, virtual assistants, and customer support automation.
- Summarization & Translation – Fast, accurate processing for text-heavy applications.
- Real-time AI – Robotics and other applications where low latency is critical.
Many companies optimize performance with a hybrid approach. A pattern we often see when building a customer support bot is routing roughly 80% of general customer queries to a smaller model like Mistral Small 3, while sending the remaining 20% of more complex queries to larger models like DeepSeek-V3 or reasoning models like DeepSeek-R1.
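Below is a hypothetical sketch of that routing pattern using the Together Python SDK. The `classify_complexity` heuristic is a toy stand-in for whatever classifier or rules you'd use in production, and the model IDs are assumptions; confirm the exact strings in our model catalog:

```python
# Hypothetical sketch of the hybrid routing pattern, using the Together Python SDK.
# classify_complexity() is a toy stand-in for a real classifier or rule set, and
# the model IDs are assumptions; confirm the exact strings in the model catalog.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

SMALL_MODEL = "mistralai/Mistral-Small-24B-Instruct-2501"
LARGE_MODEL = "deepseek-ai/DeepSeek-V3"

def classify_complexity(query: str) -> str:
    """Toy heuristic: long or multi-part queries count as complex."""
    return "complex" if len(query) > 500 or query.count("?") > 2 else "simple"

def answer(query: str) -> str:
    model = SMALL_MODEL if classify_complexity(query) == "simple" else LARGE_MODEL
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

print(answer("What are your support hours?"))
```

In practice, the router is often a small classifier or even the small model itself deciding when to escalate; the point is that most traffic never needs to touch the larger, more expensive model.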
Run Mistral Small 3 on Together AI
Together AI offers two deployment options for Mistral Small 3:
Together Serverless: fast, pay-per-token pricing
Start using Mistral Small 3 via the Together API with pay-per-token pricing and get blazing-fast performance on our optimized inference stack. Our OpenAI-compatible APIs allow for easy integration with your existing applications, and you can opt out of data sharing to maintain data privacy.
Try Mistral Small 3 on Together Serverless →
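For latency-sensitive applications, you can also stream tokens as they're generated. Here's a minimal sketch with the Together Python SDK, again assuming the model ID above:

```python
# Minimal streaming sketch with the Together Python SDK, useful for the
# low-latency use cases above. The model ID is an assumption; confirm the
# exact string in the model catalog.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

stream = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "Write a haiku about small models."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```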
Together Dedicated Endpoints: high-performance, scalable
For production workloads or more consistent traffic, dedicated endpoints provide scalable hosting without rate limits. These endpoints include optimizations like adaptive speculators for improved inference speeds. Our platform is trusted by 300K+ developers and companies like Quora, Zoom, and DuckDuckGo.
Contact us to discuss dedicated endpoints or enterprise deployments for Mistral Small 3 →
Get started with Mistral Small 3
- Sign up for the Together Platform and try Mistral Small 3 in our playground.
- Get your API key and start sending requests.
- Check out our API quickstart to get started in minutes with our Python and TypeScript SDKs.
We can’t wait to see what you build!