
Together AI and NVIDIA collaborate to power Llama 3.1 models for enterprises on NVIDIA DGX Cloud

July 23, 2024

By Together AI

We are thrilled to announce our collaboration with NVIDIA, which brings the industry-leading Together Inference Engine to NVIDIA AI Foundry customers. Enterprises and developers can now run openly available models such as Llama 3.1 on the Together Inference Engine on NVIDIA DGX Cloud.

Together AI is a leader in inference optimization research, and the Together Inference Engine is built on innovations including FlashAttention-3 kernels, custom-built speculators based on RedPajama, and the most accurate quantization techniques available on the market. These advancements allow enterprise workloads to be highly optimized for NVIDIA Tensor Core GPUs, so enterprises can build and run generative AI applications on open-source models with unmatched performance, accuracy, and cost-efficiency at production scale.

"Enterprises want to leverage the power of openly available AI models like Llama 3.1, customized to their specific needs," said Alexis Bjorlin, vice president of DGX Cloud at NVIDIA. "By collaborating with Together AI, we're introducing the highly optimized Together Inference Engine to DGX Cloud, offering companies efficient and scalable AI inference capabilities."

With this collaboration, NVIDIA AI Foundry customers can run Together Inference on NVIDIA DGX Cloud for access to the latest NVIDIA AI architecture, optimized at every layer for faster deployment. Enterprises can also fine-tune the models on their proprietary data to achieve higher accuracy and performance while retaining ownership of their data and models. The more than 100,000 developers and enterprises using the Together API will also now have the option to deploy endpoints on NVIDIA DGX Cloud with the highest performance, scalability, and security.

Today marks an inflection point for open-source AI with the launch of Llama 3.1 405B, the largest openly available foundation model. It offers unmatched flexibility and control, along with state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation that rival the best closed-source models, while providing safety tools for responsible development. We expect these advancements to rapidly accelerate the adoption of open-source AI among developers and enterprises.

At Together AI, we believe the future of generative AI depends on open research, open science, and trust between researchers, developers, and enterprises.

Our vision is to bring innovations from research to production faster than anyone else. Working at the intersection of AI and systems research, our team has invented methods such as FlashAttention-3, Mixture of Agents, Medusa, Sequoia, Hyena, Mamba, and CocktailSGD, which accelerate time to market and deliver cutting-edge benefits to customers.

As the launch partner for the Llama 3.1 models, we're thrilled for customers to get the best performance, accuracy, and cost for their generative AI workloads on the Together Inference Engine 2, while retaining ownership of their models and keeping their data secure.

Today, enterprises like Zomato, DuckDuckGo, and The Washington Post build and run their generative AI applications on Together Inference.

Now, with this collaboration, enterprises running sophisticated workloads on DGX Cloud can deploy open-source models into production faster on NVIDIA-optimized infrastructure, paired with the Together AI accelerated inference stack for unmatched performance, scalability, and security.



Get started today!

Run Together Inference on NVIDIA DGX Cloud for instant access to the latest NVIDIA AI architecture, optimized at every layer for faster deployment.
