The AI Acceleration Cloud
Train, fine-tune, and run generative AI models faster, at lower cost, and at production scale.
Together Inference
The best combination of performance, accuracy, and cost at production scale, so you don't have to compromise.
[Benchmark highlights: speed relative to vLLM · Llama-3 8B at full precision · cost relative to GPT-4o]
Why Together Inference
Cutting edge research, models that fit your needs, and flexible deployment options.
Powered by cutting edge research
FlashAttention-3: 75% MFU on H100s
Fastest kernels for NVIDIA GPUs: optimized MHA & GEMM implementations
Speculative decoding: 10x Chinchilla-optimal speculator leveraging Medusa and SpecExec innovations
Flexibility to choose a model that fits your needs
Reference: Full precision, available for 100% accuracy
Turbo: Best performance without losing accuracy
Lite: Optimized for fast performance at the lowest cost
Available via Serverless and Dedicated Instances
Serverless: Scale seamlessly with 100+ models
Dedicated Instances: Reserved monthly & on-demand
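As a concrete illustration of the serverless option, the sketch below assembles a request for an OpenAI-compatible chat-completions endpoint. The endpoint URL, model name, and request schema are assumptions for illustration, not taken from this page.

```python
# Hypothetical sketch: calling a serverless model through an
# OpenAI-compatible chat-completions endpoint. The URL and model
# name below are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # assumed endpoint


def build_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> dict:
    """POST the payload; expects an API key in TOGETHER_API_KEY."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("meta-llama/Llama-3-8b-chat-hf", "Hello!")
# send(payload) would perform the actual network call.
```

Because the interface is OpenAI-compatible in this sketch, swapping between serverless models is just a change of the `model` string.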
Together Fine-tuning
Fine-tune leading open-source models with your data to achieve greater accuracy for your tasks.
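One common way to prepare "your data" for fine-tuning is a JSONL file with one training example per line. The exact schema a given service expects is an assumption here; many fine-tuning APIs accept a single `"text"` field per line.

```python
# Hypothetical sketch: writing a JSONL training file for fine-tuning.
# The one-object-per-line format is standard; the "text" field name
# is an assumption about the expected schema.
import json

examples = [
    {"text": "Q: What is MFU? A: Model FLOPS utilization."},
    {"text": "Q: What interconnect do the clusters use? A: InfiniBand."},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Read it back to confirm one valid JSON object per line.
with open("train.jsonl") as f:
    lines = [json.loads(line) for line in f]
```

Validating the file locally like this catches malformed lines before an upload fails mid-job.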
Together GPU Clusters
Get your own private GPU cluster – with hundreds or thousands of interconnected NVIDIA GPUs – for large-scale training and fine-tuning today.
Use our purpose-built training clusters with H100, H200, and A100 GPUs connected over fast InfiniBand networks. Your cluster comes optimized for distributed training with the accelerated Together Kernel Collection out of the box. You focus on your model, and we'll ensure everything runs smoothly.
01
We offer flexible terms – even with our highest quality hardware. You can commit to just a month or reserve capacity for up to 5 years.
02
H100 Clusters Node Specifications:
- 8x NVIDIA H100 / 80GB / SXM5
- 3.2 Tbps InfiniBand network
- 2x AMD EPYC 9474F CPUs (48 cores / 96 threads, 3.6 GHz)
- 1.5TB ECC DDR5 Memory
- 8x 3.84TB NVMe SSDs
A100 SXM Clusters Node Specifications:
- 8x NVIDIA A100 80GB SXM
- 4120 vCPU Intel Xeon (Sapphire Rapids)
- 960 GB RAM
- 8 x 960GB NVMe storage
- 200 Gbps Ethernet or 3.2 Tbps InfiniBand configurations available
03
We value your time. Clusters are pre-configured for high-speed distributed training, using Slurm and the Together Custom Models stack to get you up and running at lightspeed.
Training-ready clusters – H100, H200, or A100
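A quick back-of-the-envelope from the H100 node spec above: aggregate GPU memory per node, and for a whole cluster. The 16-node cluster size is an assumption for illustration; only the per-node figures come from the spec.

```python
# Arithmetic from the H100 node spec: 8 GPUs per node, 80 GB HBM each.
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 80
NODES = 16  # example cluster size, not from the page

gpu_mem_per_node_gb = GPUS_PER_NODE * HBM_PER_GPU_GB      # 640 GB per node
cluster_gpu_mem_tb = gpu_mem_per_node_gb * NODES / 1000   # 10.24 TB total
```

Sizing a model and its optimizer state against these totals is the first step in planning a distributed training run.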
THE FASTEST CLOUD FOR GEN AI.
BUILT ON LEADING AI RESEARCH.
Innovations
Our research team is behind breakthrough AI models, datasets, and optimizations.
Build, deploy, and scale. All in a single platform.
- 01
Build
Whether prompt engineering, fine-tuning, or training, we are ready to meet your business demands.
- 02
Deploy
Easily integrate your new model into your production application using the Together Inference API.
- 03
Scale
With the fastest performance available and elastic scaling, Together AI is built to scale with your needs as you grow.
Customer Stories
See how we support leading teams around the world. Our customers are creating innovative generative AI applications, faster.