Together AI is the best end-to-end platform for developing your AI applications – no matter your starting point. Let’s build together.

What we offer

Together AI offers cutting-edge products to power AI for your application.

We have the fastest performance, effortless horizontal scalability, easy-to-use developer tools, and an expert team that’s excited to work closely with you.

We’ll make quick work of solving problems together and deploy at the scale of your enterprise.

  • Unmatched performance

    Our research and innovations bring next-level efficiencies in training and inference that can scale with your needs. Together Inference Engine is the fastest inference stack available.

  • Built to scale with you

    We’ve built a horizontally scalable platform that is optimized to deliver the highest performance while scaling to meet your traffic.

  • Designed for rapid integration

    And when you’re ready to bring your model into your apps, integration is snappy. With our easy-to-use API, your fine-tuned model can be seamlessly integrated into business processes in a matter of days.

  • World-class support

    We understand what it takes to train AI models to meet business goals. Our team can help you prepare your datasets, optimize them for accuracy, train your own private AI model, and deploy it in a scalable way – all to drive measurable results for your business.

  • Collaborate

    Share fine-tuned models across your team, collaborate on testing, analyze usage from team members, and set up API keys for each phase of your application development.





400 tokens/sec

11x lower cost relative to GPT-4o

Together Inference: Fastest, most accurate, and cost efficient

  • Together Inference provides incredible speed — 4x faster than vLLM.

  • Together Inference is 11x lower cost than GPT-4o.

Cost and speed comparison with GPT 3.5 Turbo

We have clusters available for you

Reserve your cluster now

Customer Stories

Together is the partner of choice for the world's most innovative AI developers.

Latitude has built a new era of video games in which the experience is dynamic and the player is not limited to what the developer has pre-imagined. By leveraging the scalable and fast Together Inference service, Latitude addressed challenges in hosting large models, optimizing GPU deployments, and managing AI development costs.

  • 37x

    reduction in classifier costs

  • 2x

    size of free model

  • cost per token

  • Problem

    Latitude faced challenges hosting LLMs at scale due to high infrastructure costs, complex GPU deployment management, and the need for scalable AI solutions. As 95% of gameplay is driven by AI models, these hurdles limited their ability to deliver gamer experiences that matched their high ambitions.

  • Solution

    Together Inference allowed Latitude to keep latency low while increasing total daily tokens by 8x. With immediate access to the latest models, such as Meta Llama 3 and Mixtral, at low cost, Latitude was able to improve model quality and achieve longer context lengths. They also spent 80% less time managing GPU deployments, leading to significant cost savings while being assured of their cluster’s health.

  • Result

    Latitude tripled their average input tokens per request, resulting in improved player value. In addition, their average requests per user per day have doubled, further confirming that players recognize these improvements. While using Together AI, Latitude has increased their user value and engagement as well as reinforced their core mission: giving back to the player.

"We have tripled our average input tokens per request which directly translates into increased player value, since more context achieves more coherence in AI responses. Our average requests per user per day have also doubled"

Nick Walton
CEO of Latitude

Cartesia is on a mission to build real-time intelligence for every user starting with their pioneering work on state space models (SSMs). They needed a low-latency inference solution and also wanted the flexibility to optimize for latency, throughput, and cost.

  • 135 ms

    model latency

  • 2x

    cost reduction

  • <2

    weeks onboarding new custom model

  • Problem

    Cartesia wanted a partner with a deep understanding of the inference stack for new model architectures. While low latency was important for creating a seamless user experience, they also wanted the flexibility to optimize between latency, throughput, and cost for their inference deployment.

  • Solution

    Being able to use Together AI with their own custom model enabled Cartesia to serve their state-of-the-art custom state space model, Sonic, in less than 2 weeks. With Together AI, Cartesia was able to optimize for real-time inference with industry-leading latency of <200 ms and 2x faster performance while maintaining the highest accuracy, all at half the cost of other providers.

  • Result

    Cartesia was able to achieve the industry’s fastest text-to-voice performance with <200 ms end-to-end latency and 135 ms model latency to provide real-time inference to their users. With their cutting-edge technology and the fast Together AI service, they achieved lower cost and real-time text-to-voice generations, passing on these benefits to their end users.

"The expertise Together AI has in optimizing model serving at scale helped us bring our model to production in record time. The ability to use Together AI with custom models is a huge unlock for companies developing their own models."

Karan Goel
CEO of Cartesia

Vals AI is building a third-party review system to evaluate AI model performance in different industries such as accounting, law, and finance. By choosing Together AI to run their eval suite, they have been able to achieve high throughput and efficiency for millions of API calls, enabling them to test new models and add them to their leaderboard on the same day they're released.

  • <1 minute

    to integrate new models

  • 0

    rate limits hit for evals

  • 20M

    API calls

  • Problem

    Vals AI needed an AI platform to run their eval suite on a variety of industries and across multiple benchmarks. They didn’t want to provision compute to host each new model themselves for testing. Instead, they wanted a provider that had high throughput, was reliable, and hosted new models as soon as they were available, so that their benchmarks stay up to date.

  • Solution

    Since Vals AI made Together AI their default provider for all their open-source model evaluations, they have been able to efficiently and affordably run many evals across multiple industries. Additionally, because Together AI is so agile in incorporating new OSS models, they have been able to test models like Llama-3 on the same day they are released.

  • Result

    Vals AI has been able to run ~320k API calls and 200M tokens in a single day on Together AI while keeping their costs low and steady. Thanks to the low latency, they have also been able to run evaluations very efficiently, which has become one of their company’s biggest value adds.

“Our ability to rapidly test new models has been significantly augmented by the Together Platform. I can integrate and evaluate new models in just a few lines of code.”

Rayan Krishnan
Founder & CEO of Vals AI

Pika Labs, a video generation company founded by two Stanford PhD students, built its text-to-video model on Together GPU Clusters. As they got traction, Pika built new iterations of the model from scratch with Together GPU Clusters, and they scaled their inference volume as they grew to millions of videos generated per month.

  • $1.1 million

    Saved over 5 months

  • 4 hours

    Time to training start

  • 392,300

    Discord users

  • Problem

    Pika needed efficient compute capacity that scaled from prototype to production. Fast and efficient training performance was a must. They needed to move quickly: they didn’t have time to worry about setting up their own training infrastructure, and they needed a partner who could scale with their difficult-to-forecast traffic.

  • Solution

    Pika used Together Inference API to rapidly prototype using the easy-to-use open-source model library. Once the team decided to build their own models from the ground up, they opted for the unparalleled compute power of Together GPU Clusters. And once they launched the product and saw user traction grow exponentially, Pika scaled inference seamlessly.

  • Results

    Pika grew to millions of videos generated per month with the top users spending ~10 hours per day on the platform — all within 6 months of being founded.

“Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators.”

Demi Guo
CEO, Pika Labs

Upstage is a leading LLM company specializing in customized, domain-specific models, and the builder of top-ranked models like Solar. With Together Inference, they were able to make their Solar model available to a wide audience including Together API customers, users, and their own customers.

  • 2.8 million

    peak token volume per hour

  • 45 tokens per second

    Together AI TPS for SOLAR v0 (70B)

  • Problem

    Upstage needed to host Solar, their most popular LLM, so that it could be used by the widest possible audience. When the model charted on the Hugging Face Open LLM Leaderboard, they also needed a place that could scale to handle high traffic while maintaining fast performance and cost efficiency.

  • Solution

    Upstage chose Together Inference serverless endpoints to host their model because of the user-friendly interface of the API, its competitive pricing, and Together AI’s expert support that made bringup super easy.

  • Result

    The Solar model was deployed and published on Together Inference, which easily scaled to serve over 2.8 million peak tokens per hour with exceptional performance: over 45 tokens per second. The Upstage team has expanded their partnership and is integrating Together AI into their own service.

"We chose Together AI for their competitive pricing, user-friendly interface, and quick service. Truly, it offers an exceptional service experience. I was particularly impressed when their CEO, Vipul, personally jumped in to help with technical questions."

Sung Kim
CEO of Upstage AI

Wordware, founded by Cambridge University ML experts Robert Chandler and Filip Kozera, enables seamless collaboration between domain experts and engineers, emphasizing a 'prompt first' approach to building LLM applications. This unique method helps create diverse AI-powered experiences, ranging from simple workflows to intricate agents.

  • 4 models

    Integrated into Wordware's platform

  • 16x

    Cost reduction for AI-powered NPCs

  • 3-4 Hours

    Time to integrate multiple models

  • Problem

    Wordware's mission is to enhance the machine learning workflow by removing the dependency on extensive 'ground truth' datasets. Their platform empowers domain experts to quickly refine prompts, improving collaboration and speeding up iterations. Wordware wanted to focus on building the best collaborative web-based IDE for language model programming with seamless model selection and not on the hassle of managing expensive infrastructure.

  • Solution

    Wordware adopted Together's infrastructure for its versatility and user-friendly interface. The ability to rapidly prototype and scale using Together's Inference API and the powerful compute capabilities of the service was integral to their progress. The platform's low latency, minimal cold start times, and cost-effectiveness allowed Wordware to experiment with various models, enabling their customers to transition from GPT-4 to Mistral, leading to significant cost reductions, enhanced reliability and reduced latency.

  • Result

    Wordware's innovative approach has led to groundbreaking applications. One notable customer example is the development of AI-powered NPC interactions, in which the cost of operation was reduced by 16x after transitioning to Wordware. This efficiency is attributed to Wordware's token-based pricing and the ability to integrate multiple models seamlessly, like Mistral and OpenChat, offering a unique balance of speed, flexibility, and cost-effectiveness, which Wordware attributes to Together’s API.

"I love the flexibility Together AI provides, from serverless inference endpoints to easy fine-tuning and hosted deployments. We like working with a company who knows what they’re doing. With Together AI, downtime is low and throughput is amazing. That matters so much for us and our end-customers."

Robert Chandler
Co-Founder of Wordware

Nexusflow, a leader in generative AI solutions for cybersecurity, relies on Together GPU Clusters to build robust cybersecurity models as they democratize cyber intelligence with AI.

  • 40%

    Cost savings per month

  • <90 minutes

    Onboarding time

  • Zero


  • Problem

    To enhance the capabilities of existing base models with public data, Nexusflow required a cost-effective, reliable, and scalable compute partner. Traditional cloud providers were not able to simultaneously offer the cost-efficiency and the level of guaranteed availability that Nexusflow needed to scale their specialized workloads.

  • Solution

    The team at Nexusflow opted for Together GPU Clusters, seeing it as the perfect "trifecta" in terms of contract length, pricing, and compute availability. They utilized GPUs suitable for their specific workload requirements, and benefited from the unparalleled support that Together’s expert team offers.

  • Results

    Nexusflow completed the onboarding process in <90 minutes and was able to run workloads. Initial hiccups were resolved by Together's support team, ensuring a smooth experience. Nexusflow managed to cut their R&D cloud compute costs by 40%, while experiencing faster response times and lower latency in technical support than other cloud providers.

“In an industry where time and specialized capabilities can mean the difference between vulnerability and security, Together GPU Clusters has helped us scale compute resources quickly in a cost-effective way. Their high-performance infra and top-notch support lets us focus on building state-of-the-art generative AI solutions for cybersecurity.”

Jian Zhang
CTO of Nexusflow

Arcee is a growing startup in the LLM space building domain-adaptive language models for organizations, and they are using Together Custom Models to fine-tune a model with a domain-specific dataset.

  • 40B tokens

    Used to fine-tune

  • 7B


  • Problem

    Arcee was looking for a cost-effective way to build more reliable, factual, domain-adaptive language models for its customers.

  • Solution

    Arcee made a strategic decision to build a fine-tuned model with Together AI for several compelling reasons: the accessibility of Together API, the quality of the Together AI team, and their commitment to build a good model not just as a technical provider but as a collaborative partner.

  • Results

    Arcee built their model using Together Custom Models including domain-specific data. To optimize the quality of the model, it was trained with a data mixture optimized using DoReMi, an algorithm for finding the optimal mixture of language datasets using Distributionally Robust Optimization.

"Our relationship with Together AI has yielded remarkable achievements, including state-of-the-art models. These models are specialized, grounded, and laser-focused on specific verticals and use cases. Working with Together AI helped us dramatically accelerate development."

Mark McQuade
CEO of Arcee

Why open-source

Open-source models are the best choice for your company. They are faster, more customizable, and more private.


    These models were developed by research communities at leading institutions across the globe, including Google, Meta, OpenAI, and Stanford. With these models, you’ll get high accuracy, fast performance, and the ability to fine-tune the model to your specific needs.

    Explore 100+ models

    When you take a cutting-edge open-source model from Together AI and train it with your own private data, you’ll create a fine-tuned model that is completely yours – a private, proprietary tool that your company owns. Together AI enables you to do this in a fully private manner on Together Cloud, or in your existing Virtual Private Cloud. This means none of your private data is exposed to the world or used to improve someone else’s model.

  • Transparency

    This puts you and your developers in the driver’s seat. And it enables you to show your model review board, security team, and executives everything they need to green light deploying generative AI in your application. At Together AI, we care about transparency so that we can give you more control. Let us help you understand the powerful tools at your fingertips.

  • Control

    You can fully fine-tune any open-source model. You can adjust every layer in the model. You don’t have to update the model on someone else’s schedule. You control what and when you deploy. Your developers will thank you.

Industries & use cases

Speed up your business processes, organize millions of documents, forecast demand for products, develop a conversational chatbot for your sales team — and so much more.

Harness the power of AI applications that are customized to you.


Defect detection

Boost quality control in a production process -- automate visual inspection by identifying missing components using computer vision.


Text and data extraction

Extract and collate critical information from millions of documents at high speed.

Software & Technology
Finance & Insurance

Sentiment analysis

Understand the sentiment of words, sentences, paragraphs, or documents. Tune to your subject matter and language style for a high degree of precision.

Software & Technology
Finance & Insurance

News analysis

Pull names, events, and more from news so you can drive insights and make decisions.

Media & Design
Software & Technology
Finance & Insurance

Machine condition detection

Assess the condition of your machines through sensor data.


Image and video analysis

Automate editing workflows, catalog your assets and extract meaning from your images and videos.

Media & Design
Software & Technology
Finance & Insurance

Forecast business metrics

Create prediction models to forecast your business needs using your data.

Software & Technology
Finance & Insurance

Interaction analytics

Remove friction and improve customer journeys with deeper understanding of interactions across channels.

Media & Design
Software & Technology
Finance & Insurance


Generate creative starting points for books, movies, or other media. Leverage an AI co-pilot to help with editing scripts and creating a consistent tone.

Media & Design

Text to speech

Generate high quality, natural speech from any text.

Media & Design
Software & Technology
Finance & Insurance

Document intelligence

Identify, extract and organize custom data from complex documents to reduce manual operations and improve workflows. Extract clauses, dates, parties, and other custom entities from documents with ease.

Media & Design
Software & Technology
Finance & Insurance

Text and speech translation

Automatically translate text or speech between over 100 languages.

Media & Design
Software & Technology
Finance & Insurance

Insights and analysis

Extract understanding and insights from unstructured text, and output in a structured form for use in a variety of formats (bullets, tables, sentences, or JSON).

Product summarization

Automate product titles and descriptions at scale, customizing to different regions or audiences to maximize engagement and SEO.

Software & Technology
Finance & Insurance

Image, video, and audio generation

Generate high-quality images, video, and audio from text prompts.

Media & Design


Enhance the user experience by customizing content to each individual user.

Media & Design
Software & Technology
Finance & Insurance

Data Audit

Detect and identify root causes of unexpected changes in metrics such as revenue and retention.

Software & Technology
Finance & Insurance

Code generation and understanding

Understand code in dozens of languages, summarize check-ins, identify bugs or issues, and automate code review processes.

Software & Technology

Named entity recognition

Identify and extract known entities from bodies of text efficiently and accurately.

Customized document classification

Improve document classification by using features unique to your data.

Software & Technology
Finance & Insurance

Chatbots & virtual agents

Communicate with your end users 24/7 in natural language. Add an intelligent conversational layer to any application: customer support, sales, internal devops, legal assistant, coding co-pilot, social chat, and more. Easily extend to email, chat, and voice applications.

Media & Design
Software & Technology
Finance & Insurance


Efficiently summarize a few paragraphs -- or whole documents.

Media & Design
Software & Technology
Finance & Insurance

Automatic speech recognition

Add a voice interface to any product or feature so your users can interact with your application more efficiently or in new settings, e.g. while driving or hands-free.

Media & Design
Software & Technology
Finance & Insurance