
Together AI Powers Pioneers at GTC: NVIDIA Blackwell GPUs, Instant GPU Clusters, and A Full-Stack for AI Innovation

March 18, 2025

By Together AI

This week, as an NVIDIA GTC Gold Sponsor, we're taking the wraps off our latest advancements to Together GPU Clusters. 

We’re rapidly deploying NVIDIA Blackwell GPUs at massive scale, bringing next-generation AI performance to our customers. Early adopters—including Zoom, Salesforce, and fal.ai—have test-driven NVIDIA HGX B200 on Together AI and are already seeing an impressive ~2x speed-up in training and inference performance relative to NVIDIA HGX H100, with additional acceleration coming soon.

But that’s not all. Today, we’re excited to introduce the Preview release of Together Instant GPU Clusters—on-demand clusters of up to 64 NVIDIA GPUs, interconnected via NVIDIA Quantum-2 InfiniBand and NVIDIA NVLink™. Fully self-service, these clusters can be provisioned via our console in just minutes, unlocking powerful AI infrastructure at unprecedented speed and scale. 

Surrounding the event in San Jose—and stretching all the way up the Caltrain corridor to San Francisco—is a celebration of the NVIDIA Blackwell platform’s arrival on Together AI, and of our customers’ success: AI pioneers like Cartesia, Pika, and DuckDuckGo are leveraging Together AI to forge the AI frontier, pushing the boundaries of what's possible with AI infrastructure.

Our customers’ success is a testament to how our vision for purpose-built, truly open AI infrastructure resonates with both AI-native startups and established enterprises, ensuring they have the flexibility and performance needed to push AI forward. We believe enterprises should own their AI—gaining full control over their models, data, and accelerated infrastructure without the constraints of closed AI platforms or the performance bottlenecks of other cloud platforms. This is why we provide both GPU Clusters accelerated by NVIDIA for training custom models from scratch, and our Inference & Fine-Tuning Platform, featuring more than 200 of the most popular open-source models. Together AI is the fastest platform for DeepSeek-R1, and today we’re launching the ability to deploy models powered by NVIDIA NIM to Together AI directly from build.nvidia.com.

📍 If you're attending NVIDIA GTC in person, we would love to discuss our latest products and how they can accelerate your AI initiatives: come find us at booth 1332. We would also love for you to attend our GTC theater talk, Accelerating AI with Purpose-Built Cloud Infrastructure, from our founding VP of Engineering Charles Srisuwananukorn. For those of you at home, we invite you to enjoy this online panel discussion about our inference platform, featuring our CEO Vipul Ved Prakash, our Chief Scientist Tri Dao, and our VP Research Leon Song.

Together GPU Clusters: AI Compute at Any Scale

Building on our deep expertise in AI infrastructure, Together GPU Clusters provide a scalable, high-performance NVIDIA-accelerated compute platform designed to meet the needs of AI developers at every stage. As an NVIDIA Cloud Partner, Together AI delivers all GPU clusters using the latest NVIDIA Cloud Partner Reference Architecture, ensuring optimized software, high-performance networking, and enterprise-grade reliability.

This unified platform provides three distinct options, each tailored to different levels of AI compute demand:

  1. Instant GPU Clusters – now in Preview – self-service clusters of up to 64 NVIDIA GPUs, available to users in minutes for fast iteration and immediate compute needs.
  2. Dedicated GPU Clusters – exclusive, high-performance clusters of up to 1,000 NVIDIA GPUs, optimized for large-scale training and inference.
  3. Custom GPU Clusters – hyperscale deployments from 1,000 → 10,000 → 100,000+ NVIDIA GPUs, designed specifically for your AI supercomputing projects at massive scale.

1️⃣ Instant GPU Clusters: Up to 64 NVIDIA GPUs, Entirely Self-Service

Today, we're announcing the Preview release of Together Instant GPU Clusters, with new self-service provisioning at together.ai/instant. These clusters provide up to 64 NVIDIA GPUs (80GB SXM) per deployment, designed for rapid access and high-performance workloads—ideal for AI teams that need to handle burst compute demands, validate models before committing to long-term infrastructure, and bypass lengthy procurement processes. Unlike traditional cloud instances, Together Instant GPU Clusters feature NVIDIA Quantum-2 InfiniBand and NVLink interconnects, ensuring ultra-low-latency, high-throughput performance. AI teams can access high-bandwidth, fully interconnected multi-GPU clusters in minutes—perfect for distributed training, fine-tuning, and inference workloads at scale.

These clusters are fully self-service via the Together AI console, allowing users to configure Kubernetes or Slurm environments instantly, with no long-term commitments. Every cluster is delivered to users in a fully configured and tuned state, ready for optimal performance. This flexibility enables AI teams to quickly validate model performance, handle burst compute needs, and iterate faster than ever before.
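To make the Slurm path concrete, here is a minimal sketch of sizing a batch-job header for a full 64-GPU Instant Cluster. The directive names are standard Slurm; the 8-GPUs-per-node layout matches a typical HGX system but is an illustrative assumption, not a documented Together configuration.

```python
# Hypothetical sketch: generate a Slurm batch header for an N-GPU cluster,
# assuming 8 GPUs per HGX node (so 64 GPUs -> 8 nodes).
GPUS_PER_NODE = 8

def slurm_header(total_gpus: int, job_name: str = "distributed-train") -> str:
    """Return the #SBATCH preamble for a multi-node GPU job."""
    nodes = -(-total_gpus // GPUS_PER_NODE)  # ceiling division
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{GPUS_PER_NODE}",
        "#SBATCH --ntasks-per-node=1",
    ])

print(slurm_header(64))
```

From there, an `srun` or `torchrun` launch line inside the script would fan the training job out across all nodes over the InfiniBand fabric.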

For a deeper dive into how Together Instant GPU Clusters work, including technical details and a product walkthrough, check out our full blog post.

⚡ Request Access to Together Instant GPU Clusters - Up to 64 NVIDIA GPUs
Now in Preview: together.ai/instant

2️⃣ Dedicated GPU Clusters: 64 – 1,000 NVIDIA GPUs

For organizations requiring exclusive, high-performance compute infrastructure, Together AI offers Together Dedicated GPU Clusters, ranging from 64 GPUs to thousands of NVIDIA Blackwell and Hopper GPUs. These clusters deliver custom-configured, high-density AI compute optimized for training, reinforcement learning, and large-scale inference.

Together AI is working closely with Hypertec to deploy these clusters of thousands of NVIDIA Blackwell GPUs, building out some of the world’s most advanced AI compute infrastructure. To showcase this effort, we’ve captured an exclusive behind-the-scenes video of these Blackwell GPUs being installed in our latest AI supercomputing deployments.

Dedicated Clusters come with the Together Kernel Collection, a suite of custom GPU kernels optimized for AI workloads. Recently, Together AI demonstrated a 90% increase in training throughput for a 70-billion-parameter model, achieving 15,200 tokens per second per node on NVIDIA HGX B200 systems. These improvements are made possible by custom FP8 kernels optimized for Blackwell’s 5th-generation Tensor Cores, developed using the open-source ThunderKittens framework.
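As a quick sanity check on those figures, the implied pre-TKC baseline can be recovered with one line of arithmetic, assuming the 90% uplift is measured against the same 70B model on the same HGX B200 node without the custom kernels:

```python
# Back-of-the-envelope check on the Dedicated Cluster throughput figures.
# Assumption: the 90% uplift compares the same 70B-parameter model on the
# same HGX B200 node with and without the Together Kernel Collection.
tkc_tps = 15_200          # tokens/sec/node with TKC custom FP8 kernels
uplift = 0.90             # reported 90% increase in training throughput

baseline_tps = tkc_tps / (1 + uplift)
print(f"implied baseline: {baseline_tps:.0f} tokens/sec/node")  # → 8000
```

In other words, the FP8 kernels nearly double per-node training throughput relative to that implied baseline.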

Key Capabilities:

  • NVIDIA Blackwell GPUs: NVIDIA GB200 NVL72 and NVIDIA HGX B200
  • High-bandwidth, ultra-low-latency interconnects—NVIDIA Quantum-2 InfiniBand, NVIDIA NVLink, and high-speed networking ensure seamless performance for distributed training.
  • Optimized infrastructure—deployed in AI-optimized data centers, leveraging advanced liquid cooling and power-efficient architectures.
  • Accelerated model training—Together Kernel Collection delivers higher efficiency and faster training speeds, unlocking new levels of performance for AI workloads.

✅ Perfect for teams scaling AI model training cycles and running continuous high-performance workloads.

👉 Request a Together Dedicated GPU Cluster: together.ai/dedicated


3️⃣ Custom GPU Clusters: 1,000 → 10,000 → 100,000+ NVIDIA GPUs

For frontier AI workloads, we offer Together Custom GPU Clusters, featuring thousands of NVIDIA Blackwell GPUs, purpose-built for your project’s specific needs. Designed and optimized by AI and infrastructure experts, these clusters provide the computational backbone for next-generation AI research—powering LLM training, simulation, and enterprise-scale AI applications.

Why Choose Together Custom GPU Clusters

  • Bespoke AI Infrastructure, Expertly Designed: Our team works with you to architect and deploy a custom GPU cluster optimized for your unique AI workload—without the complexity of sourcing and integrating infrastructure yourself.
  • Unmatched Availability & Delivery Speed: Unlike traditional cloud procurement or on-premise deployments, we deliver ultra-large AI clusters on aggressive timelines, ensuring your research and development aren’t delayed by supply chain constraints.
  • Scaling to AI Factories: Our hyperscale GPU clusters serve as the foundation for AI Factories—massive-scale, fully integrated compute environments built for next-generation AI breakthroughs.

Through our strategic collaboration with NVIDIA and Hypertec, we deliver best-in-class performance, cost efficiency, and operational simplicity, so you can focus on AI innovation—not infrastructure challenges.

✅ The future of AI development relies on massive-scale infrastructure, and Together AI delivers it today.

☎️ Contact us about a Together Custom GPU Cluster: together.ai/custom


Built on NVIDIA’s Accelerated Computing Platform, with Easy Deployment of NVIDIA AI Enterprise and NVIDIA NIM

NVIDIA AI Enterprise: Optimized AI Software for Performance and Reliability

Together AI offers NVIDIA AI Enterprise as part of its GPU cluster offerings, providing AI teams with a validated, production-ready software stack optimized for NVIDIA accelerated compute. This enterprise-grade AI platform includes:

  • Certified and optimized frameworks such as TensorFlow, PyTorch, and NVIDIA RAPIDS™ for accelerated AI development.
  • Enterprise-grade security, reliability, and support to ensure AI workloads run seamlessly at scale.
  • Full-stack software optimizations that improve model training efficiency, inference speed, and overall workload performance.

By integrating NVIDIA AI Enterprise, Together AI delivers a seamless, end-to-end AI compute experience—helping teams reduce infrastructure complexity while achieving maximum performance.

Deploy NVIDIA NIM from build.nvidia.com directly to Together AI

NVIDIA NIM, part of NVIDIA AI Enterprise, provides pre-optimized, production-ready inference microservices for the latest AI models, enabling enterprises to deploy AI applications at scale with minimal setup. Now, with direct integration into build.nvidia.com, deploying models powered by NVIDIA NIM on Together AI is easier than ever. Developers can explore, test, and move seamlessly to production with just a few clicks. Models like Nemotron-4 340B, NVIDIA's most powerful language model, Llama Nemotron, which enhances Meta Llama with superior reasoning capabilities, and NVIDIA NeMo™ Retriever, a collection of microservices for building scalable retrieval pipelines, are all available for deployment on Together Dedicated Endpoints. With Together AI, enterprises gain full control over their AI workloads, ensuring high performance and efficiency. Read more in this blog post.
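For illustration, a model deployed this way can typically be called through an OpenAI-compatible chat completions API. The sketch below builds such a request using only the Python standard library; the endpoint URL and model identifier are placeholder assumptions, so substitute the values shown in your console after deployment.

```python
import json
import urllib.request

# Placeholder endpoint -- replace with the URL your console shows for the
# deployed NIM-powered model.
API_URL = "https://api.example.com/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send: response = json.load(urllib.request.urlopen(build_request(...)))
```

The same request shape works for any endpoint that speaks the OpenAI chat completions protocol, which keeps client code portable across deployments.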

End-to-End NVIDIA Performance Enhancements

From NVIDIA Blackwell and Hopper GPUs, to networking with NVIDIA NVLink, NVSwitch, and NVIDIA Quantum-2 InfiniBand, Together AI ensures that every aspect of our infrastructure is tuned for peak AI performance.


Forging the Frontier of AI, Together

AI pioneers are pushing the boundaries of what's possible, and Together AI is providing the infrastructure to power their breakthroughs. With today's announcement of Together Instant GPU Clusters and our ongoing deployment of NVIDIA Blackwell GPUs at massive scale, we’re equipping innovators with the compute resources needed to tackle the most ambitious AI challenges. Whether you're advancing the next generation of reasoning models, scaling AI research, or deploying mission-critical AI systems, Together AI ensures you have the infrastructure to push AI forward. The frontier of AI is being forged now—and we’re proud to power those leading the way.

⚡ Request Access to Together Instant GPU Clusters - Up to 64 NVIDIA GPUs
Now in Preview: together.ai/instant

👉 Request a Together Dedicated GPU Cluster: together.ai/dedicated

☎️ Contact us about a Together Custom GPU Cluster: together.ai/custom

📍 Join us at NVIDIA GTC (Booth #1332) to see how AI pioneers are building the future on Together AI. Let’s shape the next era of AI together.

