
Together AI launches full stack for developers to build with open-source AI

July 14, 2023

By Together

Introducing Together API and Together Compute — simple, powerful, cost-effective cloud services to train, fine-tune, and run the world’s leading open-source AI models

We’re in the middle of an AI revolution that will impact nearly every aspect of society.

Whether this AI is open and accessible will shape the pace and direction of innovation for decades to come. Most of today’s leading generative AI models are closed behind commercial APIs—limiting companies and researchers from inspecting, understanding, and customizing models for their needs. At the same time, training large custom models is complicated, expensive, and time-consuming, requiring significant AI expertise and management of large-scale infrastructure.

Today, we’re excited to announce two products to help change this: Together API and Together Compute.

These cloud services offer a full stack solution for AI developers to train, fine-tune, and run the world’s leading open-source AI models. We currently host more than 50 models, including RedPajama, LLaMA, Falcon, and Stable Diffusion.

Together API: With an easy-to-use fine-tuning API, powered by one of the most efficient distributed training systems, Together API makes fine-tuning large AI models simple and fast. It also offers optimized, private API endpoints for low-latency inference. Deploy your first model in seconds.
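
To give a feel for the API, here is a minimal sketch of an inference call. The endpoint path, payload fields, and model name are illustrative assumptions rather than the documented interface — see the API Docs for the real one.

```python
import os
import requests

# Minimal sketch of calling a hosted inference endpoint.
# The endpoint URL, payload shape, and model name below are assumptions
# for illustration -- consult the API Docs for the actual interface.
API_URL = "https://api.together.xyz/inference"  # assumed endpoint
headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

payload = {
    "model": "togethercomputer/RedPajama-INCITE-7B-Chat",  # example model
    "prompt": "Explain why open-source AI matters in one sentence.",
    "max_tokens": 64,
    "temperature": 0.7,
}

response = requests.post(API_URL, json=payload, headers=headers)
response.raise_for_status()
print(response.json())
```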

Together Compute: For AI/ML research groups who want to pre-train models on their own datasets, we offer clusters of high-end GPUs paired with our distributed training stack. Together AI’s research team is behind breakthroughs like FlashAttention, FlexGen, and CocktailSGD that are core to modern training and inference optimization—making Together Compute the most cost-effective way to build new models with supercharged speed. Reserve a training cluster.
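
For a sense of what runs on such a cluster, below is a toy data-parallel training loop in plain PyTorch DDP. This is a generic illustration of multi-GPU training, not Together’s distributed training stack; the model and batch are stand-ins.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU, typically launched with `torchrun --nproc_per_node=8`.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    device = rank % torch.cuda.device_count()
    torch.cuda.set_device(device)

    # Toy stand-in for a large model; DDP replicates it on every GPU.
    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[device])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(8, 4096, device=f"cuda:{device}")  # synthetic batch
        loss = model(x).pow(2).mean()                       # dummy objective
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across workers here
        opt.step()
        if rank == 0 and step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```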

One of our goals is making AI accessible by radically reducing costs.

Generative AI models have billions of parameters, and every token flows through all of these parameters, which makes these models expensive to run in production. Hosting them for inference on hyperscalers can cost $4-6 per hour per A100 GPU, so a typical cloud instance with NVIDIA’s 8xA100 cards costs roughly $25,000 a month to operate.

We offer significant cost reductions for interactive inference workloads on large models. We optimize down the stack: thousands of GPUs in multiple secure facilities, software for virtualization and scheduling, and model-level optimizations that bring operating costs down significantly.

Our A100-based inference VMs start as low as $0.11 per hour, and our users often pay one fifth of what it costs on hyperscalers to train and fine-tune models.
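
To make these figures concrete, here is a back-of-the-envelope comparison. The 730 hours per month (round-the-clock utilization) and the $4.30 per GPU-hour hyperscaler rate, chosen from the quoted $4-6 range so that the total matches the ~$25,000/month figure above, are our own assumptions for illustration.

```python
# Back-of-the-envelope monthly costs from the figures in this post.
# Assumptions (ours): 730 hours/month of round-the-clock use, and a
# $4.30 per GPU-hour hyperscaler rate within the quoted $4-6 range.
HOURS_PER_MONTH = 730
GPUS = 8

hyperscaler = GPUS * 4.30 * HOURS_PER_MONTH  # ~$25,000/month for an 8xA100 instance
together = GPUS * 0.11 * HOURS_PER_MONTH     # at the $0.11/hour A100 inference price

print(f"Hyperscaler 8xA100: ${hyperscaler:,.0f}/month")
print(f"Together 8xA100:    ${together:,.0f}/month")
```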

We believe AI is having its Linux moment—and we’re proud to be part of it.

In April, along with collaborators, we released RedPajama, a set of leading open-source models and the largest-ever open pre-training dataset. The RedPajama dataset has been used to train over 200 models! We plan to keep engaging, building, and releasing models, datasets, and research in the open.

In May, we announced our seed funding led by Lux Capital to build a cloud platform for open-source AI. Today we are thrilled to share the first version of that platform with you and excited to see what you build!

Read the API Docs, and join us in #api on our Discord.

See you there!

Editorial note: Together Compute was renamed to Together GPU Clusters on November 5, 2023.

  • 20% lower cost
  • 4x faster training
  • 117x network compression

Q: Should I use the RedPajama-V2 Dataset out of the box?

RedPajama-V2 is conceived as a pool of data that serves as a foundation for creating high-quality datasets. It is therefore not intended to be used out of the box; depending on the application, the data should be filtered using the quality signals that accompany it. With this dataset, we take the view that the optimal filtering of data depends on the intended use. Our goal is to provide all the signals and tooling that enable this.
