Pricing

Serverless Inference

Most teams start with serverless inference and move to dedicated endpoints at scale.

Price per 1M tokens

Model	Input	Output
MiniMax M2.5	$0.30 ($0.06 cached)	$1.20
Kimi K2.5	$0.50	$2.80
GLM-5	$1.00	$3.20
Llama 3.3 70B	$0.88	$0.88
Llama 3 8B Instruct Lite	$0.10	$0.10
DeepSeek-R1-0528	$3.00	$7.00
DeepSeek-V3.1	$0.60	$1.70
gpt-oss-120B	$0.15	$0.60
Qwen2.5 7B Instruct Turbo	$0.30	$0.30
Kimi K2 Instruct	$1.00	$3.00
Mistral (7B) Instruct v0.2	$0.20	$0.20
Mistral Small 3	$0.10	$0.30
Gemma 3n E4B Instruct	$0.02	$0.04
Qwen3.5 9B	$0.10	$0.15

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.
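
As a rough illustration of how the per-1M-token prices above translate into a bill, here is a minimal sketch. The function name `serverless_cost` and the `PRICES_PER_1M` dictionary are hypothetical helpers, not part of any official SDK, and the sketch ignores cached-input discounts and any batch pricing.

```python
# Per-1M-token prices in USD, copied from the table above: (input, output).
# Only a few models are listed here for brevity.
PRICES_PER_1M = {
    "Llama 3.3 70B": (0.88, 0.88),
    "DeepSeek-V3.1": (0.60, 1.70),
    "gpt-oss-120B": (0.15, 0.60),
}


def serverless_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumes input and output tokens are billed independently at the listed
    per-1M-token rates, with no cached-input discount applied.
    """
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 2M input tokens and 500k output tokens on DeepSeek-V3.1.
    cost = serverless_cost("DeepSeek-V3.1", 2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions, the input and output sides are simply summed, so output-heavy workloads on models with a higher output price cost proportionally more.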

Model	Price per MP	Price per image	Default steps
FLUX.1 Krea [dev]	$0.025	-	28
FLUX.1 Kontext [pro]	$0.04	-	28
FLUX.1 Kontext [max]	$0.08	-	28
FLUX.2 [flex]	-	$0.03	-
FLUX.2 [dev]	-	$0.0154	-
FLUX.2 [pro]	-	$0.03	-
FLUX.2 [max]	$0.070	-	50
FLUX1.1 [pro]	$0.04	-	-
Nano Banana Pro (Gemini 3 Pro Image)	-	$0.134	-
FLUX.1 [schnell]	$0.0027	-	4
Google Imagen 4.0 Preview	$0.04	-	-
Google Imagen 4.0 Fast	$0.02	-	-
Google Imagen 4.0 Ultra	$0.06	-	-
Gemini Flash Image 2.5 (Nano Banana)	-	$0.039	-
ByteDance Seedream 3.0	$0.018	-	-
ByteDance Seedream 4.0	$0.03	-	-
Qwen Image	$0.0058	-	-
Juggernaut Pro Flux	$0.0049	-	-
Juggernaut Lightning Flux	$0.0017	-	-
HiDream-I1-Full	$0.009	-	-
HiDream-I1-Dev	$0.0045	-	-
HiDream-I1-Fast	$0.0032	-	-
Ideogram 3.0	$0.06	-	-
Dreamshaper	$0.0006	-	-
SD XL	$0.0019	-	-
Stable Diffusion 3	$0.0019	-	-
Wan 2.6 Image	-	$0.03	-
GPT Image 1.5	-	$0.034	-

Prices include default steps shown above. Additional costs apply only when exceeding default steps. See full pricing details →
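
For models priced per megapixel, a back-of-the-envelope estimate might look like the sketch below. It assumes 1 MP means 1,000,000 pixels and that the image uses no more than the default number of steps; the surcharge for extra steps is not specified here, so it is deliberately not modeled. The function name `image_cost_per_mp` is hypothetical.

```python
def image_cost_per_mp(width: int, height: int, price_per_mp: float) -> float:
    """Estimate the USD cost of one image for a per-MP-priced model.

    Assumptions (not confirmed by the pricing table): 1 MP = 1,000,000
    pixels, no rounding of the megapixel count, and steps at or below
    the model's default.
    """
    megapixels = (width * height) / 1_000_000
    return megapixels * price_per_mp


if __name__ == "__main__":
    # FLUX.1 [schnell] at $0.0027 per MP, 1024x1024 output.
    print(f"${image_cost_per_mp(1024, 1024, 0.0027):.6f}")
```

Models listed with a price per image are simpler: the cost is the listed price multiplied by the number of images.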

Price per 1M Characters

Model	Price
Cartesia Sonic-2	$65.00
Cartesia Sonic-3	$65.00
NVIDIA Parakeet TDT 0.6B v3	$0.0015

Price per video

Model	Price
MiniMax 01 Director	$0.28
MiniMax Hailuo 02	$0.49
Google Veo 2.0	$2.50
Google Veo 3.0	$1.60
Google Veo 3.0 + Audio	$3.20
Google Veo 3.0 Fast	$0.80
Google Veo 3.0 Fast + Audio	$1.20
ByteDance Seedance 1.0 Lite	$0.14
ByteDance Seedance 1.0 Pro	$0.57
PixVerse v5	$0.30
Kling 2.1 Master	$0.92
Kling 2.1 Standard	$0.18
Kling 2.1 Pro	$0.32
Kling 2.0 Master	$0.92
Kling 1.6 Standard	$0.19
Kling 1.6 Pro	$0.32
Wan 2.2 I2V	$0.31
Wan 2.2 T2V	$0.66
Vidu 2.0	$0.28
Vidu Q1	$0.22
Sora 2	$0.80

Price per audio minute

Model	Price
Whisper Large v3	$0.0015

Price per 1M tokens

Model	Price
Multilingual e5 large instruct	$0.02

Price per 1M tokens

Model	Price
VirtueGuard Text Lite	$0.20
Llama Guard 4 12B	$0.20

Dedicated Inference

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:

Guaranteed performance (no sharing)
Support for custom models
Autoscaling & traffic spike handling

Hardware Type	Price/hour
1x H100 80GB	$3.99
1x H200 141GB	$5.49
1x B200 180GB	$9.95
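
To get a feel for when a dedicated endpoint starts to pay off against serverless pricing, one can compute a break-even throughput. The sketch below is illustrative only: the helper names are hypothetical, it assumes the model fits on the chosen hardware and can actually sustain the required throughput, and it uses a single blended per-1M-token price (reasonable when input and output cost the same, as with Llama 3.3 70B above).

```python
def monthly_dedicated_cost(price_per_hour: float, hours: float = 24 * 30) -> float:
    """USD cost of keeping one instance running, defaulting to a 30-day month."""
    return price_per_hour * hours


def breakeven_tokens_per_hour(price_per_hour: float, serverless_price_per_1m: float) -> float:
    """Tokens per hour at which serverless spend equals the dedicated hourly price.

    Assumes one blended serverless price for all tokens (hypothetical
    simplification) and ignores autoscaling and idle time.
    """
    return price_per_hour / serverless_price_per_1m * 1_000_000


if __name__ == "__main__":
    h100 = 3.99  # 1x H100 80GB, USD per hour, from the table above
    print(f"Monthly cost: ${monthly_dedicated_cost(h100):.2f}")
    tokens = breakeven_tokens_per_hour(h100, 0.88)
    print(f"Break-even: {tokens / 1_000_000:.2f}M tokens/hour")
```

Below the break-even rate, serverless is cheaper under these assumptions; above it, the fixed hourly price of a dedicated instance wins.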

GPU Clusters

On-demand

Pay-as-you-go GPU capacity, billed by the hour.

Hardware	Hourly
NVIDIA HGX H100	$3.49
NVIDIA HGX H200	$4.19
NVIDIA HGX B200	$7.49

Reserved

Reserve GPU capacity for terms longer than 6 days.

Hardware	1 Week - 1 Month	2 - 3 Months	4 - 6 Months	6+ Months
NVIDIA HGX H100	$2.99	$2.69	$2.55	Contact us
NVIDIA HGX H200	$3.49	$3.19	$2.89	Contact us
NVIDIA HGX B200	$7.15	$6.75	$6.39	Contact us
NVIDIA GB200 NVL72	Contact us	Contact us	Contact us	Contact us
NVIDIA GB300 NVL72	Contact us	Contact us	Contact us	Contact us
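
The discount implied by a reserved term can be read off by comparing it with the on-demand rate. The following sketch copies the listed hourly prices into two dictionaries (`ON_DEMAND` and `RESERVED` are hypothetical names) and reports the percentage saved; tiers marked "Contact us" are omitted because no price is published.

```python
# Hourly prices in USD, copied from the on-demand and reserved tables above.
ON_DEMAND = {
    "NVIDIA HGX H100": 3.49,
    "NVIDIA HGX H200": 4.19,
    "NVIDIA HGX B200": 7.49,
}

RESERVED = {
    "NVIDIA HGX H100": {"1 Week - 1 Month": 2.99, "2 - 3 Months": 2.69, "4 - 6 Months": 2.55},
    "NVIDIA HGX H200": {"1 Week - 1 Month": 3.49, "2 - 3 Months": 3.19, "4 - 6 Months": 2.89},
    "NVIDIA HGX B200": {"1 Week - 1 Month": 7.15, "2 - 3 Months": 6.75, "4 - 6 Months": 6.39},
}


def savings_pct(hardware: str, term: str) -> float:
    """Percentage saved per hour by reserving instead of paying on demand."""
    on_demand = ON_DEMAND[hardware]
    reserved = RESERVED[hardware][term]
    return (on_demand - reserved) / on_demand * 100


if __name__ == "__main__":
    for hardware, terms in RESERVED.items():
        for term in terms:
            print(f"{hardware:16} {term:18} {savings_pct(hardware, term):5.1f}% off")
```

This only compares hourly rates; a reserved term is a commitment for the whole period, so the saving is realized only if the capacity is actually used.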

Sandbox

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Compute costs	Price/Hour
Per vCPU	$0.0446
Per GiB RAM	$0.0149
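
Since Code Sandbox compute is priced per vCPU and per GiB of RAM, the hourly cost of a VM is the sum of both components. A minimal sketch, assuming the two rates are simply additive and billed for the full duration (the function name `sandbox_cost` is hypothetical):

```python
VCPU_PER_HOUR = 0.0446     # USD per vCPU per hour, from the table above
GIB_RAM_PER_HOUR = 0.0149  # USD per GiB of RAM per hour, from the table above


def sandbox_cost(vcpus: int, ram_gib: float, hours: float) -> float:
    """Estimate the USD compute cost of one sandbox VM.

    Assumes vCPU and RAM charges add linearly and that no other
    charges (for example storage) are included.
    """
    hourly = vcpus * VCPU_PER_HOUR + ram_gib * GIB_RAM_PER_HOUR
    return hourly * hours


if __name__ == "__main__":
    # A 4 vCPU / 8 GiB sandbox running for 10 hours.
    print(f"${sandbox_cost(4, 8, 10):.4f}")
```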

Code Interpreter

Execute LLM-generated code securely using our API.

Duration	Price/Session
Session (60 minutes)	$0.03

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Storage type	Price	Unit
Shared Filesystem	$0.16	GiB/month

Fine-Tuning

Train open-source models for real production use.

Per 1M tokens

Size	Supervised Fine-Tuning (LoRA)	Supervised Fine-Tuning (Full Fine-Tuning)	Direct Preference Optimization (LoRA)	Direct Preference Optimization (Full Fine-Tuning)
Up to 16B	$0.48	$0.54	$1.20	$1.35
17B-69B	$1.50	$1.65	$3.75	$4.12
70B-100B	$2.90	$3.20	$7.25	$8.00

Model	Supervised Fine-Tuning (LoRA)	Direct Preference Optimization (LoRA)	Minimum charge
DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-V3, DeepSeek-V3-0324, DeepSeek-V3.1, DeepSeek-V3.1-Base	$10.00	$25.00	$20.00
GLM-4.6, GLM-4.7	$9.00	$22.50	$27.00
gpt-oss-120B	$5.00	$12.50	$6.00
Kimi K2 Thinking, Kimi K2 Instruct-0905, Kimi K2 Instruct, Kimi K2 Base	$15.00	$37.50	$60.00
Llama 4 Maverick, Llama 4 Maverick Instruct	$8.00	$20.00	$16.00
Llama 4 Scout	$3.00	$7.50	$6.00
Qwen3-Coder-480B-A35B-Instruct	$9.00	$22.50	$18.00
Qwen3-235B-A22B, Qwen3-235B-A22B-Instruct-2507	$6.00	$15.00	No min. price
Qwen3.5-122B-A10B	$6.00	$15.00	$10.00
Qwen3.5-397B-A17B	$8.00	$20.00	$22.00

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
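
The billing rule above can be sketched in a few lines. The helper names `billable_tokens` and `fine_tuning_cost` are hypothetical, and the sketch assumes the minimum charge acts as a per-job floor on the computed price, which the table does not state explicitly.

```python
def billable_tokens(train_tokens: int, epochs: int,
                    eval_tokens: int = 0, evaluations: int = 0) -> int:
    """Tokens billed for a job: training tokens times epochs, plus
    evaluation tokens times the number of evaluations."""
    return train_tokens * epochs + eval_tokens * evaluations


def fine_tuning_cost(tokens: int, price_per_1m: float,
                     minimum_charge: float = 0.0) -> float:
    """Estimated USD cost of a job at a per-1M-token price.

    Assumption: the minimum charge, where listed, is a floor applied
    to the whole job.
    """
    return max(tokens / 1_000_000 * price_per_1m, minimum_charge)


if __name__ == "__main__":
    # Supervised LoRA on a model up to 16B ($0.48 per 1M tokens):
    # 10M training tokens for 3 epochs, 1M validation tokens evaluated 3 times.
    tokens = billable_tokens(10_000_000, 3, 1_000_000, 3)
    print(f"{tokens:,} billable tokens, ${fine_tuning_cost(tokens, 0.48):.2f}")

    # Small gpt-oss-120B LoRA job ($5.00 per 1M tokens, $6.00 minimum).
    small = billable_tokens(1_000_000, 1)
    print(f"${fine_tuning_cost(small, 5.00, minimum_charge=6.00):.2f}")
```

In the second case the computed price falls below the listed minimum, so under the floor assumption the minimum charge is what would be billed.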
