Pricing

Serverless Inference

Most teams start with serverless inference and move to dedicated endpoints at scale.


Price per 1M tokens

Model                               Input                   Output
Llama 4 Maverick                    $0.27                   $0.85
MiniMax M2.5                        $0.30 / $0.06 (cached)  $1.20
Kimi K2.5                           $0.50                   $2.80
GLM-5                               $1.00                   $3.20
Llama 3.3 70B                       $0.88                   $0.88
Llama 3 8B Instruct Lite            $0.10                   $0.10
DeepSeek-R1-0528                    $3.00                   $7.00
DeepSeek-V3.1                       $0.60                   $1.70
gpt-oss-120B                        $0.15                   $0.60
Qwen3-Next-80B-A3B-Instruct         $0.15                   $1.50
Qwen3 235B A22B Instruct 2507 FP8   $0.20                   $0.60
Qwen3 235B A22B Thinking 2507 FP8   $0.65                   $3.00
Qwen2.5 7B Instruct Turbo           $0.30                   $0.30
Kimi K2 Instruct                    $1.00                   $3.00
GLM-4.5-Air                         $0.20                   $1.10
Kimi K2 Thinking                    $1.20                   $4.00
Mistral (7B) Instruct v0.2          $0.20                   $0.20
Mistral Small 3                     $0.10                   $0.30
Gemma 3n E4B Instruct               $0.02                   $0.04
Qwen3.5 9B                          $0.10                   $0.15

Displayed prices refer to the lowest resolution/duration settings. Actual prices might vary.
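
As a rough illustration of how the per-1M-token prices above translate into a bill, the sketch below multiplies input and output token counts by the listed rates. The `estimate_cost` helper and the `PRICES_PER_1M` dictionary are hypothetical names used only for this example. The rates are copied from the table, and the calculation assumes input and output tokens are billed separately at those rates, with no cached-input or batch discounts.

```python
# Hypothetical lookup table: USD per 1M tokens, copied from the serverless table.
PRICES_PER_1M = {
    "Llama 3.3 70B": {"input": 0.88, "output": 0.88},
    "gpt-oss-120B": {"input": 0.15, "output": 0.60},
    "DeepSeek-V3.1": {"input": 0.60, "output": 1.70},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a workload (illustrative helper, not an official API)."""
    price = PRICES_PER_1M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000  # prices are quoted per 1M tokens


if __name__ == "__main__":
    # 2M input tokens and 500K output tokens on gpt-oss-120B
    print(f"${estimate_cost('gpt-oss-120B', 2_000_000, 500_000):.2f}")  # $0.60
```

Because output tokens are often priced higher than input tokens, workloads that generate long responses can cost noticeably more than the input price alone suggests.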


Price per MP

Model                                 Price     Images per $1 (1MP)   Default steps
FLUX.1 Krea [dev]                     $0.025    -                     28
FLUX.1 Kontext [pro]                  $0.04     -                     28
FLUX.1 Kontext [max]                  $0.08     -                     28
FLUX1.1 [pro]                         $0.04     -                     -
FLUX.1 [schnell]                      $0.0027   -                     4
Google Imagen 4.0 Preview             $0.04     -                     -
Google Imagen 4.0 Fast                $0.02     -                     -
Google Imagen 4.0 Ultra               $0.06     -                     -
Gemini Flash Image 2.5 (Nano Banana)  $0.039    -                     -
ByteDance Seedream 3.0                $0.018    -                     -
ByteDance Seedream 4.0                $0.03     -                     -
Qwen Image                            $0.0058   -                     -
Juggernaut Pro Flux                   $0.0049   -                     -
Juggernaut Lightning Flux             $0.0017   -                     -
HiDream-I1-Full                       $0.009    -                     -
HiDream-I1-Dev                        $0.0045   -                     -
HiDream-I1-Fast                       $0.0032   -                     -
Ideogram 3.0                          $0.06     -                     -
Dreamshaper                           $0.0006   -                     -
SD XL                                 $0.0019   -                     -
Stable Diffusion 3                    $0.0019   -                     -
Wan 2.6 Image                         $0.03     -                     -
GPT Image 1.5                         $0.034    -                     -

Prices include default steps shown above. Additional costs apply only when exceeding default steps.

Price per 1M characters

Model              Price
Cartesia Sonic-2   $65.00
Cartesia Sonic-3   $65.00

Price per audio minute

Model              Price
Whisper Large v3   $0.0015

Price per 1M tokens

Model                            Price
Multilingual e5 large instruct   $0.02

Price per 1M tokens

Model                   Price
Mxbai Rerank Large V2   $0.10

Price per 1M tokens

Model                   Price
VirtueGuard Text Lite   $0.20
Llama Guard 4 12B       $0.20

Dedicated Inference

Deploy models on custom hardware with guaranteed performance and full control.

Single-tenant GPU instances with:

  • Guaranteed performance (no sharing)

  • Support for custom models

  • Autoscaling & traffic spike handling

Hardware Type   Price/hour
1x H100 80GB    $3.99
1x H200 141GB   $5.49
1x B200 180GB   $9.95

GPU Clusters

On-demand

Pay-as-you-go GPU capacity, billed by the hour.

Hardware          Hourly
NVIDIA HGX H100   $3.49
NVIDIA HGX H200   $4.19
NVIDIA HGX B200   $7.49

Reserved

Reserve GPU capacity for a duration above 6 days.

Hardware             1 Week - 1 Month   2 - 3 Months   4 - 6 Months   6+ Months
NVIDIA HGX H100      $2.69              $2.39          $2.25
NVIDIA HGX H200      $3.19              $2.79          $2.59
NVIDIA HGX B200      $5.49              $4.79          $4.49
NVIDIA GB200 NVL72
NVIDIA GB300 NVL72
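
To compare on-demand and reserved capacity, the sketch below computes the total cost of a cluster at a given hourly rate. It assumes the listed rates are per GPU per hour, which the tables do not state explicitly, and it uses the NVIDIA HGX H100 on-demand rate ($3.49) and the first reserved rate listed for the same hardware ($2.69). The `cluster_cost` function is a hypothetical helper for illustration only.

```python
def cluster_cost(num_gpus, hours, rate_per_gpu_hour):
    """Total USD cost, ASSUMING the rate applies per GPU per hour."""
    return num_gpus * hours * rate_per_gpu_hour


if __name__ == "__main__":
    gpus = 8
    hours = 7 * 24  # one week, the shortest reserved term

    on_demand = cluster_cost(gpus, hours, 3.49)  # HGX H100, on-demand
    reserved = cluster_cost(gpus, hours, 2.69)   # HGX H100, first reserved rate

    print(f"On-demand: ${on_demand:,.2f}")            # On-demand: $4,690.56
    print(f"Reserved:  ${reserved:,.2f}")             # Reserved:  $3,615.36
    print(f"Savings:   ${on_demand - reserved:,.2f}")  # Savings:   $1,075.20
```

Under these assumptions, reserving pays off only if the capacity would otherwise be used for most of the reserved term, since reserved hours are presumably billed whether or not they are used.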

Sandbox

Code Sandbox

Customize a deployment of VM sandboxes for large development environments.

Compute costs   Price/Hour
Per vCPU        $0.0446
Per GiB RAM     $0.0149
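
As a sketch of how the two compute rates combine, the example below estimates the cost of a single sandbox VM. It assumes vCPU and RAM charges are simply added together and scale linearly with running time; the page does not describe billing granularity. `sandbox_cost` is a hypothetical helper name.

```python
VCPU_PER_HOUR = 0.0446     # USD per vCPU per hour (from the table)
RAM_GIB_PER_HOUR = 0.0149  # USD per GiB of RAM per hour (from the table)


def sandbox_cost(vcpus, ram_gib, hours):
    """Estimated USD cost of one sandbox VM, ASSUMING additive, linear billing."""
    hourly = vcpus * VCPU_PER_HOUR + ram_gib * RAM_GIB_PER_HOUR
    return hourly * hours


if __name__ == "__main__":
    # A 4 vCPU / 8 GiB sandbox running for 10 hours
    print(f"${sandbox_cost(4, 8, 10):.3f}")  # $2.976
```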

Code Interpreter

Execute LLM-generated code securely using our API.

Duration               Price/Session
Session (60 minutes)   $0.03

Storage

High-bandwidth, parallel filesystem colocated with your compute.

Storage costs       Price   Unit
Shared Filesystem   $0.16   GiB/month

Fine-Tuning

Train open-source models for real production use.

Per 1M tokens

                Supervised Fine-Tuning         Direct Preference Optimization
Size            LoRA      Full Fine-Tuning     LoRA      Full Fine-Tuning
Up to 16B       $0.48     $0.54                $1.20     $1.35
17B-69B         $1.50     $1.65                $3.75     $4.12
70B-100B        $2.90     $3.20                $7.25     $8.00

Model                            Supervised Fine-Tuning (LoRA)   Direct Preference Optimization (LoRA)   Minimum charge
DeepSeek-R1                      $10.00                          $25.00                                  $20.00
DeepSeek-R1-0528                 $10.00                          $25.00                                  $20.00
DeepSeek-V3                      $10.00                          $25.00                                  $20.00
DeepSeek-V3-0324                 $10.00                          $25.00                                  $20.00
DeepSeek-V3.1                    $10.00                          $25.00                                  $20.00
DeepSeek-V3.1-Base               $10.00                          $25.00                                  $20.00
GLM-4.6                          $9.00                           $22.50                                  $27.00
GLM-4.7                          $9.00                           $22.50                                  $27.00
gpt-oss-120B                     $5.00                           $12.50                                  $6.00
Kimi K2 Thinking                 $15.00                          $37.50                                  $60.00
Kimi K2 Instruct-0905            $15.00                          $37.50                                  $60.00
Kimi K2 Instruct                 $15.00                          $37.50                                  $60.00
Kimi K2 Base                     $15.00                          $37.50                                  $60.00
Llama 4 Maverick                 $8.00                           $20.00                                  $16.00
Llama 4 Maverick Instruct        $8.00                           $20.00                                  $16.00
Llama 4 Scout                    $3.00                           $7.50                                   $6.00
Qwen3-Coder-480B-A35B-Instruct   $9.00                           $22.50                                  $18.00
Qwen3-235B-A22B                  $6.00                           $15.00                                  No min. price
Qwen3-235B-A22B-Instruct-2507    $6.00                           $15.00                                  No min. price

Price is based on the sum of tokens processed in the fine-tuning training dataset (training dataset size * number of epochs) plus any tokens in the optional evaluation dataset (validation dataset size * number of evaluations).
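
The billing rule above can be sketched as a short calculation. The `fine_tuning_cost` function and its parameter names are hypothetical, chosen for this example. Treating the minimum charge as a per-job floor is an assumption; the page lists a minimum charge for some models but does not say how it is applied.

```python
def fine_tuning_cost(train_tokens, epochs, price_per_1m,
                     eval_tokens=0, evaluations=0, minimum_charge=0.0):
    """Estimate the USD cost of one fine-tuning job (illustrative only)."""
    # Billable tokens: training set size * epochs, plus validation set size * evaluations.
    billable_tokens = train_tokens * epochs + eval_tokens * evaluations
    cost = billable_tokens / 1_000_000 * price_per_1m
    # ASSUMPTION: the minimum charge acts as a floor on the job total.
    return max(cost, minimum_charge)


if __name__ == "__main__":
    # LoRA supervised fine-tuning of a model up to 16B ($0.48 per 1M tokens):
    # 10M training tokens for 3 epochs, plus a 1M-token validation set evaluated 3 times.
    print(f"${fine_tuning_cost(10_000_000, 3, 0.48, 1_000_000, 3):.2f}")  # $15.84

    # gpt-oss-120B LoRA SFT ($5.00 per 1M tokens, $6.00 minimum charge): 1M tokens, 1 epoch.
    print(f"${fine_tuning_cost(1_000_000, 1, 5.00, minimum_charge=6.00):.2f}")  # $6.00
```

Note that the number of epochs multiplies the training token count directly, so doubling the epochs roughly doubles the cost of a job.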
