Products Overview

Together Products

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

Inference

Serverless Inference

The fastest way to run open-source models on demand. Powered by cutting-edge inference research. No infrastructure to manage, no long-term commitments.

Batch Inference

Cost-effectively process massive workloads asynchronously. Scale to 30 billion tokens per model with any serverless model or private deployment.

Dedicated Model Inference

Deploy models on dedicated infrastructure. Purpose-built for teams who need speed, control, and the best economics in the market.

Dedicated Container Inference

GPU infrastructure purpose-built for generative media workloads. Deploy video, audio, and image models with performance acceleration powered by Together Research.

Compute

Accelerated Compute

Scale from self-serve instant clusters to thousands of GPUs, all accelerated by Together Kernel Collection.

Sandbox

Fast, secure code sandboxes that spin up full development environments for AI apps and agents at scale.

Managed Storage

High-performance managed storage for AI-native workloads. Object storage and parallel filesystems optimized for AI, with zero egress fees.

Model Shaping

Fine-Tuning

Fine-tune open-source models for production workloads, using the latest research techniques. Improve accuracy, reduce hallucinations, and control behavior — without managing training infrastructure.