Pricing that scales from idea to production
Build
Get started with fast inference, reliability, and no daily rate limits
Get started
Includes:
Free Llama Vision 11B + FLUX.1 [schnell]
$1 credit for all other models
Fully pay as you go, and easily add credits
No daily rate limits; up to 6,000 requests per minute and 2M tokens per minute for LLMs
Deploy on-demand dedicated endpoints (no rate limits)
Monitoring dashboard with 24-hr data
Email and in-app chat support
Scale
Scale production traffic with reserved GPUs and advanced configuration
Contact sales
Includes everything in Build, plus:
Up to 9,000 requests per minute and 5M tokens per minute for LLMs
Premium support
Support via private Slack channel
Monitoring dashboard with 30-day data (coming soon!)
Discounts on monthly reserved dedicated GPU
Advanced dedicated endpoint configuration
99% availability SLA for dedicated endpoints
HIPAA compliance
Enterprise
Private deployments and model optimization at scale
Contact sales
Includes everything in Scale, plus:
Custom rate limits and no token limits
VPC and on-prem deployments
Enterprise-grade security & compliance
Monitoring dashboard with 1 year data (coming soon!)
Continuous model optimization
Dedicated success representative
99.9% availability SLA for dedicated endpoints, with geo-redundancy
Priority access to hardware including H100 & H200 GPUs
Custom regions
Inference pricing
Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models, you pay only for what you use.
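For orientation, the snippet below is a minimal sketch of a serverless request against Together's OpenAI-compatible chat completions endpoint using the `requests` library. It assumes a `TOGETHER_API_KEY` environment variable, and the model string is assumed to be the current Llama 3.1 8B Turbo identifier; check the models page for the exact name.

```python
# Minimal sketch: one chat completion against the serverless Inference API.
# Assumes TOGETHER_API_KEY is set and that the model string below is current;
# billing is per token at the serverless rates listed in the tables that follow.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # assumed model string
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data["usage"])  # prompt_tokens / completion_tokens drive the per-token bill
```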
Serverless Endpoints
Prices are per 1 million tokens. For Chat, Multimodal, Language, and Code models this includes both input and output tokens; for Embedding models only input tokens are counted; for Image models pricing is based on image size and number of steps.
Llama 3.2, Llama 3.1, and Llama 3 models
Model size | Type   | Lite  | Turbo | Reference
3B         | Text   | -     | $0.06 | -
8B         | Text   | $0.10 | $0.18 | $0.20
11B        | Vision | -     | $0.18 | -
70B        | Text   | $0.54 | $0.88 | $0.90
90B        | Vision | -     | $1.20 | -
405B       | Text   | -     | $3.50 | -
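To see how these per-token rates translate into request costs, here is a small illustrative calculation at the 8B Turbo rate from the table above; the token counts are made up for the example.

```python
# Illustrative cost estimate at the Llama 3.1 8B Turbo serverless rate above.
# Chat models bill input and output tokens at the same per-1M-token price.
PRICE_PER_1M_TOKENS = 0.18      # USD, 8B Turbo row in the table above
prompt_tokens = 500             # example values, not from the source
completion_tokens = 300

cost = (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_1M_TOKENS
print(f"${cost:.6f}")           # -> $0.000144 for this 800-token request
```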
Qwen 2.5 models
Model size | Price per 1M tokens
7B         | $0.30
72B        | $1.20
All other Chat, Language, Code, and Moderation models
Model size   | Price per 1M tokens
Up to 4B     | $0.10
4.1B - 8B    | $0.20
8.1B - 21B   | $0.30
21.1B - 41B  | $0.80
41.1B - 80B  | $0.90
80.1B - 110B | $1.80
Mixture-of-Experts models
Model size                     | Price per 1M tokens
Up to 56B total parameters     | $0.60
56.1B - 176B total parameters  | $1.20
176.1B - 480B total parameters | $2.40
FLUX image models
Model            | Price per MP | Images per $1 (at 1 MP)
FLUX.1 [schnell] | $0.0027      | 370
FLUX1.1 [pro]    | $0.04        | 25
FLUX.1 [pro]     | $0.05        | 20
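Because FLUX models are billed per megapixel, the cost of an image depends on its resolution. The sketch below illustrates the arithmetic for a 1024x1024 generation at the FLUX.1 [schnell] rate above; the resolution is just an example.

```python
# Illustrative per-megapixel pricing for FLUX.1 [schnell] (rate from the table above).
# A 1024x1024 image is ~1.05 MP, so it costs slightly more than the 1 MP figure
# the "images per $1" column is based on.
PRICE_PER_MP = 0.0027                      # USD, FLUX.1 [schnell]
width, height = 1024, 1024                 # example resolution
megapixels = width * height / 1_000_000
print(f"{megapixels:.2f} MP -> ${megapixels * PRICE_PER_MP:.5f} per image")
print(f"~{int(1 / PRICE_PER_MP)} images per $1 at exactly 1 MP")
```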
Stability image models
Image size | 25 steps | 50 steps | 75 steps | 100 steps
512x512    | $0.001   | $0.002   | $0.0035  | $0.005
1024x1024  | $0.01    | $0.02    | $0.035   | $0.05
Embedding models
Model size  | Price per 1M tokens
Up to 150M  | $0.008
151M - 350M | $0.016
Rerank models
Model size | Price per 1M tokens
8B         | $0.10
Dedicated endpoints
When hosting your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.
Your fine-tuned models
Hardware type     | Price per minute hosted
1x RTX-6000 48GB  | $0.034
1x L40 48GB       | $0.034
1x L40S 48GB      | $0.048
1x A100 PCIe 80GB | $0.050
1x A100 SXM 40GB  | $0.050
1x A100 SXM 80GB  | $0.054
1x H100 80GB      | $0.098
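To put the per-minute rates above in context, the short sketch below converts them into hourly and daily figures for an endpoint left running continuously; the always-on assumption is illustrative, since endpoints can be stopped at any time.

```python
# Illustrative conversion of per-minute dedicated-endpoint rates to hourly/daily cost.
# Rates are taken from the table above; the 24/7 uptime assumption is an example only.
RATES_PER_MINUTE = {
    "1x A100 PCIe 80GB": 0.050,
    "1x H100 80GB": 0.098,
}
for hardware, per_minute in RATES_PER_MINUTE.items():
    hourly = per_minute * 60
    daily = hourly * 24
    print(f"{hardware}: ${hourly:.2f}/hr, ${daily:.2f}/day if left running 24/7")
```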
Interested in a dedicated endpoint for your own model?
Fine-tuning pricing
Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.
Download checkpoints and final model weights.
View job status and logs through the CLI or Playground.
Deploy a model instantly once it’s fine-tuned.
Try the interactive calculator
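As a rough back-of-the-envelope, billable fine-tuning work grows with dataset size and the number of epochs. The sketch below estimates it that way; the per-token rate is a placeholder (actual prices depend on the model), so use the interactive calculator for real numbers.

```python
# Rough fine-tuning cost estimate: billable tokens scale with dataset size and epochs.
# PRICE_PER_1M_TRAINING_TOKENS is a placeholder, not a published rate; use the
# interactive calculator for the actual per-model price.
dataset_tokens = 5_000_000            # example: ~5M tokens of training data
epochs = 3
PRICE_PER_1M_TRAINING_TOKENS = 0.50   # hypothetical rate, for illustration only

billable_tokens = dataset_tokens * epochs
estimated_cost = billable_tokens / 1_000_000 * PRICE_PER_1M_TRAINING_TOKENS
print(f"{billable_tokens:,} training tokens -> ~${estimated_cost:.2f} at the placeholder rate")
```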
Together GPU Clusters Pricing
Together Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.
Hardware type  | Networking                                                              | Pricing
A100 PCIe 80GB | 200 Gbps non-blocking Ethernet                                          | Starting at $1.30/hr
A100 SXM 80GB  | 200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available | Starting at $1.30/hr
H100 80GB      | 3.2 Tbps InfiniBand                                                     | Starting at $1.99/hr
H200 141GB     | 3.2 Tbps InfiniBand                                                     | Starting at $2.20/hr
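Reading the "starting at" rates above as per-GPU-hour figures (an assumption to confirm with sales), a rough monthly estimate for a small cluster looks like this; the cluster size and uptime are illustrative.

```python
# Illustrative monthly cost for a small GPU cluster, assuming the "starting at"
# rates above are per GPU-hour (an assumption; confirm with sales).
GPU_HOURLY_RATE = 1.99      # USD, H100 80GB "starting at" rate
num_gpus = 16               # example: two 8-GPU nodes
hours_per_month = 730       # ~average month

monthly_cost = GPU_HOURLY_RATE * num_gpus * hours_per_month
print(f"~${monthly_cost:,.2f}/month for {num_gpus} H100s at ${GPU_HOURLY_RATE}/GPU-hr")
```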