Pricing
Serverless Inference
Most teams start with serverless inference and move to dedicated endpoints at scale.
Price per 1M tokens
Model | Input | Output |
|---|---|---|
Llama 4 Maverick | $0.27 | $0.85 |
MiniMax M2.5 | $0.30 ($0.06 cached) | $1.20 |
Kimi K2.5 | $0.50 | $2.80 |
GLM-5 | $1.00 | $3.20 |
Llama 3.3 70B | $0.88 | $0.88 |
Llama 3 8B Instruct Lite | $0.10 | $0.10 |
DeepSeek-R1-0528 | $3.00 | $7.00 |
DeepSeek-V3.1 | $0.60 | $1.70 |
gpt-oss-120B | $0.15 | $0.60 |
Qwen3-Next-80B-A3B-Instruct | $0.15 | $1.50 |
Qwen3 235B A22B Instruct 2507 FP8 | $0.20 | $0.60 |
Qwen3 235B A22B Thinking 2507 FP8 | $0.65 | $3.00 |
Qwen2.5 7B Instruct Turbo | $0.30 | $0.30 |
Kimi K2 Instruct | $1.00 | $3.00 |
GLM-4.5-Air | $0.20 | $1.10 |
Kimi K2 Thinking | $1.20 | $4.00 |
Mistral (7B) Instruct v0.2 | $0.20 | $0.20 |
Mistral Small 3 | $0.10 | $0.30 |
Gemma 3n E4B Instruct | $0.02 | $0.04 |
Qwen3.5 9B | $0.10 | $0.15 |
Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.
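As a rough illustration of how the per-1M-token prices in the text model table translate into the cost of a single request, here is a minimal sketch. The function name `request_cost` and the `cached_tokens` parameter are hypothetical, and the handling of cached input (cached tokens treated as a subset of input tokens and billed at the cached rate) is an assumption, not something the table states.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Llama 4 Maverick": {"input": 0.27, "output": 0.85},
    "MiniMax M2.5": {"input": 0.30, "cached_input": 0.06, "output": 1.20},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper)."""
    price = PRICES[model]
    # Assumption: cached tokens are part of the input and billed at the
    # cached rate; models without a cached rate use the normal input rate.
    cached_rate = price.get("cached_input", price["input"])
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000


llama_cost = request_cost("Llama 4 Maverick", 2_000, 500)
minimax_cost = request_cost("MiniMax M2.5", 10_000, 1_000, cached_tokens=8_000)
print(f"Llama 4 Maverick: ${llama_cost:.6f}")
print(f"MiniMax M2.5:     ${minimax_cost:.6f}")
```

Because input and output are priced separately for most models, output-heavy workloads can cost noticeably more than the input price alone suggests.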
Price per MP
Model | Price per MP | Images per $1 (1MP) | Default steps |
|---|---|---|---|
FLUX.1 Krea [dev] | $0.025 | - | 28 |
FLUX.1 Kontext [pro] | $0.04 | - | 28 |
FLUX.1 Kontext [max] | $0.08 | - | 28 |
FLUX1.1 [pro] | $0.04 | - | - |
FLUX.1 [schnell] | $0.0027 | - | 4 |
Google Imagen 4.0 Preview | $0.04 | - | - |
Google Imagen 4.0 Fast | $0.02 | - | - |
Google Imagen 4.0 Ultra | $0.06 | - | - |
ByteDance Seedream 3.0 | $0.018 | - | - |
ByteDance Seedream 4.0 | $0.03 | - | - |
Qwen Image | $0.0058 | - | - |
Juggernaut Pro Flux | $0.0049 | - | - |
Juggernaut Lightning Flux | $0.0017 | - | - |
HiDream-I1-Full | $0.009 | - | - |
HiDream-I1-Dev | $0.0045 | - | - |
HiDream-I1-Fast | $0.0032 | - | - |
Ideogram 3.0 | $0.06 | - | - |
Dreamshaper | $0.0006 | - | - |
SD XL | $0.0019 | - | - |
Stable Diffusion 3 | $0.0019 | - | - |
Wan 2.6 Image | $0.03 | - | - |
GPT Image 1.5 | $0.034 | - | - |
Prices include the default steps shown above. Additional costs apply only when exceeding the default steps.
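The per-megapixel prices can be turned into a per-image estimate with a short calculation. This is a sketch under stated assumptions: 1 MP is taken as 1,000,000 pixels, cost is assumed to scale linearly with resolution, and the request is assumed to stay within the default steps. The helper names `image_cost` and `images_per_dollar` are hypothetical.

```python
def image_cost(price_per_mp, width, height, n_images=1):
    """Estimate USD cost for n_images at the given resolution.

    Assumes 1 MP = 1,000,000 pixels and linear scaling with pixel count;
    surcharges for exceeding the default steps are not modeled.
    """
    megapixels = width * height / 1_000_000
    return price_per_mp * megapixels * n_images


def images_per_dollar(price_per_mp, width=1000, height=1000):
    """Number of images one dollar buys at the given resolution (1 MP default)."""
    return 1.0 / image_cost(price_per_mp, width, height)


# FLUX.1 [schnell] is listed at $0.0027 per MP in the table above.
schnell = 0.0027
print(f"100 images at 1024x1024: ${image_cost(schnell, 1024, 1024, 100):.4f}")
print(f"Images per $1 at 1 MP:   {images_per_dollar(schnell):.0f}")
```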
Price per 1M Characters
Model | Price |
|---|---|
Cartesia Sonic-2 | $65.00 |
Cartesia Sonic-3 | $65.00 |
Price per video
Model | Price |
|---|---|
MiniMax 01 Director | $0.28 |
MiniMax Hailuo 02 | $0.49 |
Google Veo 2.0 | $2.50 |
Google Veo 3.0 | $1.60 |
Google Veo 3.0 Fast | $0.80 |
PixVerse v5 | $0.30 |
Kling 2.1 Master | $0.92 |
Kling 2.1 Standard | $0.18 |
Kling 2.1 Pro | $0.32 |
Kling 2.0 Master | $0.92 |
Kling 1.6 Standard | $0.19 |
Kling 1.6 Pro | $0.32 |
Wan 2.2 I2V | $0.31 |
Wan 2.2 T2V | $0.66 |
Vidu 2.0 | $0.28 |
Vidu Q1 | $0.22 |
Sora 2 | $0.80 |
Sora 2 Pro | $2.40 |
Price per audio minute
Model | Price |
|---|---|
Whisper Large v3 | $0.0015 |
Price per 1M tokens
Model | Price |
|---|---|
Llama Guard 4 12B | $0.20 |
Dedicated Inference
Deploy models on custom hardware with guaranteed performance and full control.
Single-tenant GPU instances, priced per hour:
Hardware Type | Price/hour |
|---|---|
1x H100 80GB | $3.99 |
1x H200 141GB | $5.49 |
1x B200 180GB | $9.95 |
GPU Clusters
On-demand
Pay-as-you-go GPU capacity, billed hourly.
Hardware | Hourly |
|---|---|
NVIDIA HGX H100 | $3.49 |
NVIDIA HGX H200 | $4.19 |
NVIDIA HGX B200 | $7.49 |
Reserved
Reserve GPU capacity for terms longer than 6 days.
Hardware | 1 Week - 1 Month | 2 - 3 Months | 4 - 6 Months | 6+ Months |
|---|---|---|---|---|
NVIDIA HGX H100 | $2.69 | $2.39 | $2.25 | |
NVIDIA HGX H200 | $3.19 | $2.79 | $2.59 | |
NVIDIA HGX B200 | $5.49 | $4.79 | $4.49 | |
NVIDIA GB200 NVL72 | | | | |
NVIDIA GB300 NVL72 | | | | |
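To compare on-demand and reserved cluster pricing, the hourly rates can be multiplied out over a commitment period. The sketch below assumes the listed rates are per GPU per hour, which the tables do not state explicitly, and uses only the "1 Week - 1 Month" reserved tier. The helper names are hypothetical.

```python
# Hourly rates in USD from the on-demand and reserved tables above.
# Assumption: rates are per GPU per hour.
ON_DEMAND = {"H100": 3.49, "H200": 4.19, "B200": 7.49}
RESERVED_1W_1M = {"H100": 2.69, "H200": 3.19, "B200": 5.49}


def cluster_cost(hourly_rate, gpus, days):
    """Total USD cost of running `gpus` GPUs continuously for `days` days."""
    return hourly_rate * gpus * days * 24


def reserved_savings(gpu, gpus, days):
    """Return (absolute USD savings, fractional savings) of reserved vs on-demand."""
    on_demand = cluster_cost(ON_DEMAND[gpu], gpus, days)
    reserved = cluster_cost(RESERVED_1W_1M[gpu], gpus, days)
    return on_demand - reserved, (on_demand - reserved) / on_demand


saved, fraction = reserved_savings("H100", gpus=8, days=14)
print(f"Reserved saves ${saved:,.2f} ({fraction:.1%}) over 14 days on 8x H100")
```

The fractional saving depends only on the two hourly rates, so it is the same for any GPU count or duration within a tier.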
Sandbox
Code Sandbox
Customize a deployment of VM sandboxes for large development environments.
Compute costs | Price/Hour |
|---|---|
Per vCPU | $0.0446 |
Per GiB RAM | $0.0149 |
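Since sandbox compute is priced per vCPU and per GiB of RAM, the hourly cost of a given VM shape is the sum of the two components. A minimal sketch, with a hypothetical helper name and an example VM size chosen purely for illustration:

```python
# Hourly rates in USD from the table above.
VCPU_PER_HOUR = 0.0446
GIB_RAM_PER_HOUR = 0.0149


def sandbox_cost(vcpus, ram_gib, hours):
    """Estimate USD compute cost of one sandbox VM over `hours` hours."""
    hourly = vcpus * VCPU_PER_HOUR + ram_gib * GIB_RAM_PER_HOUR
    return hourly * hours


# Example shape (an assumption, not a listed plan): 4 vCPUs, 8 GiB RAM.
print(f"Per hour:    ${sandbox_cost(4, 8, 1):.4f}")
print(f"Per 8h day:  ${sandbox_cost(4, 8, 8):.4f}")
```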
Code Interpreter
Execute LLM-generated code securely using our API.
Duration | Price/Session |
|---|---|
Session (60 minutes) | $0.03 |
Storage
High-bandwidth, parallel filesystem colocated with your compute.
Storage costs | Price | Unit |
|---|---|---|
Shared Filesystem | $0.16 | GiB/month |
Fine-Tuning
Train open-source models for real production use.
Per 1M tokens
Size | Supervised Fine-Tuning (LoRA) | Supervised Fine-Tuning (Full) | Direct Preference Optimization (LoRA) | Direct Preference Optimization (Full) |
|---|---|---|---|---|
Up to 16B | $0.48 | $0.54 | $1.20 | $1.35 |
17B-69B | $1.50 | $1.65 | $3.75 | $4.12 |
70B-100B | $2.90 | $3.20 | $7.25 | $8.00 |
Model | Supervised | Direct Preference | Minimum charge |
|---|---|---|---|
DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-V3, DeepSeek-V3-0324, DeepSeek-V3.1, DeepSeek-V3.1-Base | $10.00 | $25.00 | $20.00 |
GLM-4.6, GLM-4.7 | $9.00 | $22.50 | $27.00 |
gpt-oss-120B | $5.00 | $12.50 | $6.00 |
Kimi K2 Thinking, Kimi K2 Instruct-0905, Kimi K2 Instruct, Kimi K2 Base | $15.00 | $37.50 | $60.00 |
Llama 4 Maverick, Llama 4 Maverick Instruct | $8.00 | $20.00 | $16.00 |
Llama 4 Scout | $3.00 | $7.50 | $6.00 |
Qwen3-Coder-480B-A35B-Instruct | $9.00 | $22.50 | $18.00 |
Qwen3-235B-A22B, Qwen3-235B-A22B-Instruct-2507 | $6.00 | $15.00 | No minimum charge |
Price is based on the total number of tokens processed: the tokens in the fine-tuning training dataset (training dataset size × number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size × number of evaluations).
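The token-count rule above can be sketched in a few lines. The function names are hypothetical. Two assumptions are made that the page does not state: the per-model prices in the second table are per 1M tokens like the first, and the minimum charge acts as a floor on the computed price.

```python
def billable_tokens(train_tokens, epochs, eval_tokens=0, evaluations=0):
    """Tokens processed: training size * epochs + validation size * evaluations."""
    return train_tokens * epochs + eval_tokens * evaluations


def fine_tuning_cost(tokens, price_per_million, minimum_charge=0.0):
    """Estimate USD cost of a fine-tuning job.

    Assumption: the minimum charge is applied as a floor on the
    token-based price.
    """
    cost = tokens / 1_000_000 * price_per_million
    return max(cost, minimum_charge)


# Example: LoRA SFT on a model up to 16B ($0.48 per 1M tokens),
# 10M training tokens for 3 epochs, 1M validation tokens evaluated 3 times.
tokens = billable_tokens(10_000_000, 3, eval_tokens=1_000_000, evaluations=3)
print(f"{tokens:,} tokens -> ${fine_tuning_cost(tokens, 0.48):.2f}")

# Example: a small supervised job on gpt-oss-120B ($5.00 per 1M, $6.00 minimum).
small_job = billable_tokens(500_000, 1)
print(f"{small_job:,} tokens -> ${fine_tuning_cost(small_job, 5.00, 6.00):.2f}")
```

Note that epochs multiply the training token count directly, so doubling the number of epochs roughly doubles the training portion of the bill.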