

Pricing that scales from idea to production

Build

Get started with fast inference, reliability, and no daily rate limits

Get started

Includes:

Free Llama Vision 11B + FLUX.1 [schnell]

$1 credit for all other models

Fully pay as you go, and easily add credits

No daily rate limits; up to 6,000 requests and 2M tokens per minute for LLMs

Deploy on-demand dedicated endpoints (no rate limits)

Monitoring dashboard with 24-hr data

Email and in-app chat support 

Scale

Scale production traffic with reserved GPUs and advanced configuration

Contact sales

Includes everything in Build, plus:

Up to 9,000 requests per minute and 5M tokens per minute for LLMs

Premium support

Support via private Slack channel

Monitoring dashboard with 30-day data (coming soon!)

Discounts on monthly reserved dedicated GPUs

Advanced dedicated endpoint configuration

99% availability SLA for dedicated endpoints

HIPAA compliance

Enterprise

Private deployments and model optimization at scale

Contact sales

Includes everything in Scale, plus:

Custom rate limits and no token limits

VPC deployment

Enterprise-grade security & compliance

Monitoring dashboard with 1-year data (coming soon!)

Continuous model optimization

Dedicated success representative

99.9% SLA for dedicated endpoints with geo-redundancy

Priority access to hardware including H100 & H200 GPUs

Custom regions

Inference pricing

Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models, you pay only for what you use.
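
For example, a serverless chat request can be made with the Together Python SDK. A minimal sketch; the model ID is illustrative, and the client reads your TOGETHER_API_KEY from the environment:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-72B-Instruct-Turbo",  # illustrative model ID; any serverless model works
        messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
    )

    print(response.choices[0].message.content)
    print(response.usage)  # token counts used for billing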

Serverless Endpoints


Prices are per 1 million tokens. For Chat, Multimodal, Language, and Code models, both input and output tokens are counted; for Embedding models, only input tokens are counted; Image models are priced by image size and number of steps.
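
As a rough illustration of how a per-1M-token rate translates into a per-request cost (the token counts here are hypothetical, and the rate is the Qwen 2 72B price from the table below):

    # Hypothetical request: 1,200 prompt tokens and 350 completion tokens,
    # billed together at a single per-1M-token rate.
    PRICE_PER_1M_TOKENS = 0.90  # USD, e.g. Qwen 2 72B (see the Qwen table below)

    prompt_tokens = 1_200
    completion_tokens = 350

    cost = (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_1M_TOKENS
    print(f"${cost:.6f}")  # $0.001395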

  • Llama 3.2, Llama 3.1, and Llama 3 models

  • Qwen models

    • Model                   Price per 1M tokens
    • Qwen 2 72B              $0.90
    • Qwen 2.5 7B             $0.30
    • Qwen 2.5 14B            $0.80
    • Qwen 2.5 72B            $1.20
    • Qwen 2.5 Coder 32B      $0.80
    • Qwen QwQ 32B Preview    $1.20

  • All other Chat, Language, Code, and Moderation models

    • Model size        Price per 1M tokens
    • Up to 4B          $0.10
    • 4.1B - 8B         $0.20
    • 8.1B - 21B        $0.30
    • 21.1B - 41B       $0.80
    • 41.1B - 80B       $0.90
    • 80.1B - 110B      $1.80

  • Mixture-of-experts

    • Model size                        Price per 1M tokens
    • Up to 56B total parameters        $0.60
    • 56.1B - 176B total parameters     $1.20
    • 176.1B - 480B total parameters    $2.40

  • FLUX Image models

  • Stability Image models

    • Image size    25 steps    50 steps    75 steps    100 steps
    • 512x512       $0.001      $0.002      $0.0035     $0.005
    • 1024x1024     $0.01       $0.02       $0.035      $0.05

  • Embedding models

    • Model size      Price per 1M tokens
    • Up to 150M      $0.008
    • 151M - 350M     $0.016

  • Rerank models

    • Model size      Price per 1M tokens
    • 8B              $0.10
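
Image models are billed per generated image, based on the size and step count shown in the tables above. A minimal sketch of an image request with the Together Python SDK; the model ID and parameters are illustrative:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # Illustrative request: a 1024x1024 image at 25 steps; per the Stability
    # table above, price depends on image size and the number of steps.
    response = client.images.generate(
        model="stabilityai/stable-diffusion-xl-base-1.0",  # illustrative model ID
        prompt="A watercolor painting of a GPU cluster at sunrise",
        width=1024,
        height=1024,
        steps=25,
        n=1,
    )

    print(response.data[0])  # generated image (base64 data or URL, depending on the request)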

Dedicated endpoints

When hosting your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.

  • Your fine-tuned models

    • Hardware type          Price per minute hosted
    • 1x RTX-6000 48GB       $0.034
    • 1x L40 48GB            $0.034
    • 1x L40S 48GB           $0.048
    • 1x A100 PCIe 80GB      $0.050
    • 1x A100 SXM 40GB       $0.050
    • 1x A100 SXM 80GB       $0.054
    • 1x H100 80GB           $0.098
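
As a quick illustration of per-minute billing, using one rate from the table above (the uptime values are hypothetical):

    # Dedicated endpoints bill per minute while the endpoint is running.
    A100_SXM_80GB_PER_MINUTE = 0.054  # USD per minute, from the table above

    def endpoint_cost(price_per_minute: float, minutes_up: float) -> float:
        """Estimated cost of keeping an endpoint up for the given number of minutes."""
        return price_per_minute * minutes_up

    print(endpoint_cost(A100_SXM_80GB_PER_MINUTE, 60))      # 1 hour  -> 3.24
    print(endpoint_cost(A100_SXM_80GB_PER_MINUTE, 8 * 60))  # 8 hours -> 25.92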

Interested in a dedicated endpoint for your own model?

Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.

  • Download checkpoints and final model weights.

  • View job status and logs through CLI or Playgrounds.

  • Deploy a model instantly once it’s fine-tuned.
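
A minimal sketch of that flow with the Together Python SDK; the same steps are available via the CLI and the Playground, and the dataset path, base model, and epoch count below are placeholders:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # Placeholder dataset and base model; the number of epochs affects the final price.
    training_file = client.files.upload(file="my_dataset.jsonl")

    job = client.fine_tuning.create(
        training_file=training_file.id,
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder base model
        n_epochs=3,
    )

    # Check job status; checkpoints and final weights can be downloaded when finished.
    print(client.fine_tuning.retrieve(job.id).status)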

Try the interactive calculator

Together GPU Clusters pricing

Together Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.

  • Hardware types available, networking, and pricing:

  • A100 PCIe 80GB
    200 Gbps non-blocking Ethernet
    Starting at $1.30/hr

  • A100 SXM 80GB
    200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available
    Starting at $1.30/hr

  • H100 80GB
    3.2 Tbps InfiniBand
    Starting at $1.75/hr

  • H200 141GB
    3.2 Tbps InfiniBand
    Starting at $2.09/hr