Faster inference enables up to 5x price reduction on Together API
For the latest pricing please visit our pricing page.
At Together AI, we are optimizers. We are constantly working to create the most efficient AI stack on the market. Our research team is behind innovations that are core to today’s fastest optimizations, from batching techniques like FlexGen to algorithms like FlashAttention-2.
In the weeks since launching Together API — our cloud platform for building and running the world’s leading open-source AI models — we’ve continued to make strides to optimize our inference stack. And over the coming months, we’ll be releasing additional optimizations to speed up inference even more.
With faster performance, we can process a greater number of transactions per GPU, enabling better cost efficiency. Today, we’re excited to announce updated pricing to give you more for less.
Inference pricing
We’ve simplified pricing for inference across the 50+ open-source models available on our platform, including RedPajama, Llama 2, Falcon, and more.
For these out-of-the-box models, you pay only for requests (per 1K tokens used). You still launch your own inference VMs for the models you use, ensuring the privacy of your data.
For models that you fine-tune and then host on our platform, you pay the same per-token request rate, plus an hourly hosting fee while your inference VM is running.
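The two billing modes above can be sketched as a small cost calculation. The rates used here ($0.001 per 1K tokens, $1.40/hour hosting) are illustrative placeholders, not Together's actual prices; see the pricing page for current rates.

```python
# Sketch of the pricing model described above.
# All rates are illustrative placeholders, not actual Together prices.

def serverless_cost(tokens: int, price_per_1k: float) -> float:
    """Out-of-the-box models: pay per 1K tokens of requests only."""
    return tokens / 1000 * price_per_1k

def finetuned_cost(tokens: int, price_per_1k: float,
                   hosted_hours: float, hourly_rate: float) -> float:
    """Fine-tuned hosted models: same per-token rate, plus hourly hosting."""
    return serverless_cost(tokens, price_per_1k) + hosted_hours * hourly_rate

# Example: 500K tokens at a placeholder $0.001/1K rate
print(serverless_cost(500_000, 0.001))           # request cost only
print(finetuned_cost(500_000, 0.001, 2, 1.40))   # plus 2 hours of hosting
```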
Chat, language, and code models
Your fine-tuned models
Image models
Pricing for image models remains the same.
Get started today!
Head to api.together.ai to start running more efficient inference with our Playgrounds and APIs! New users get $25 in free credits to get started. We’re excited to see what you build.
Q: Should I use the RedPajama-V2 Dataset out of the box?
RedPajama-V2 is conceptualized as a pool of data that serves as a foundation for creating high-quality datasets. It is therefore not intended to be used out of the box; depending on the application, data should be filtered using the quality signals that accompany it. We take the view that the optimal filtering of data depends on the intended use, and our goal is to provide all the signals and tooling that enable it.
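Filtering with quality signals can be sketched as below. The signal names ("perplexity", "word_count") and thresholds here are illustrative assumptions, not the dataset's actual schema; the real dataset ships its own set of signals alongside each document.

```python
# Sketch: keep only documents whose quality signals fall within
# application-specific ranges. Signal names and thresholds are illustrative.

def passes(signals: dict, rules: dict) -> bool:
    """Keep a document only if every required signal is inside its range.
    A missing signal fails the check."""
    return all(lo <= signals.get(name, float("-inf")) <= hi
               for name, (lo, hi) in rules.items())

docs = [
    {"text": "clean article ...", "signals": {"perplexity": 120.0, "word_count": 350}},
    {"text": "boilerplate ...",   "signals": {"perplexity": 900.0, "word_count": 12}},
]

# Application-specific thresholds: tighten or loosen per use case.
rules = {"perplexity": (0.0, 300.0), "word_count": (50, 10_000)}

kept = [d for d in docs if passes(d["signals"], rules)]
print(len(kept))  # only the first document survives the filter
```

Because the optimal filtering depends on the intended use, the thresholds live in a separate `rules` dict rather than being baked into the filter itself.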