

Pricing that scales from idea to production

Build

Get started with fast inference, reliability, and no daily rate limits

Get started

Includes:

Free Llama Vision 11B + FLUX.1 [schnell]

$1 credit for all other models

Fully pay as you go, and easily add credits

No daily rate limits; up to 6,000 requests and 2M tokens per minute for LLMs

Deploy on-demand dedicated endpoints (no rate limits)

Monitoring dashboard with 24-hr data

Email and in-app chat support 

Scale

Scale production traffic with reserved GPUs and advanced configuration

Contact sales

Includes everything in Build, plus:

Up to 9,000 requests per minute and 5M tokens per minute for LLMs

Premium support

Support via private Slack channel

Monitoring dashboard with 30-day data (coming soon!)

Discounts on monthly reserved dedicated GPUs

Advanced dedicated endpoint configuration

99% availability SLA for dedicated endpoints

HIPAA compliance

Enterprise

Private deployments and model optimization at scale

Contact sales

Includes everything in Scale, plus:

Custom rate limits and no token limits

VPC deployment

Enterprise-grade security & compliance

Monitoring dashboard with 1-year data (coming soon!)

Continuous model optimization

Dedicated success representative

99.9% SLA for dedicated endpoints with geo-redundancy

Priority access to hardware including H100 & H200 GPUs

Custom regions

Inference pricing

Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Together Inference API. For these models, you pay only for what you use.
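
For example, a serverless chat request can be made with the Together Python SDK. A minimal sketch; the model ID is illustrative, and the client reads your TOGETHER_API_KEY from the environment:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-72B-Instruct-Turbo",  # illustrative model ID; any serverless model works
        messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
    )

    print(response.choices[0].message.content)
    print(response.usage)  # token counts used for billing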

Serverless Endpoints


Prices are per 1 million tokens. For Chat, Multimodal, Language, and Code models, both input and output tokens are counted; for Embedding models, only input tokens are counted; Image models are priced by image size and number of steps.
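
As a rough illustration of how a per-1M-token rate translates into a per-request cost (the token counts here are hypothetical, and the rate is the Qwen 2 72B price from the table below):

    # Hypothetical request: 1,200 prompt tokens and 350 completion tokens,
    # billed together at a single per-1M-token rate.
    PRICE_PER_1M_TOKENS = 0.90  # USD, e.g. Qwen 2 72B (see the Qwen table below)

    prompt_tokens = 1_200
    completion_tokens = 350

    cost = (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_1M_TOKENS
    print(f"${cost:.6f}")  # $0.001395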

  • Llama 3.2, Llama 3.1, and Llama 3 models

  • Qwen models

    • Model                   Price per 1M tokens
    • Qwen 2 72B              $0.90
    • Qwen 2.5 7B             $0.30
    • Qwen 2.5 14B            $0.80
    • Qwen 2.5 72B            $1.20
    • Qwen 2.5 Coder 32B      $0.80
    • Qwen QwQ 32B Preview    $1.20

  • All other Chat, Language, Code, and Moderation models

    • Model size        Price per 1M tokens
    • Up to 4B          $0.10
    • 4.1B - 8B         $0.20
    • 8.1B - 21B        $0.30
    • 21.1B - 41B       $0.80
    • 41.1B - 80B       $0.90
    • 80.1B - 110B      $1.80

  • Mixture-of-experts

    • Model size                        Price per 1M tokens
    • Up to 56B total parameters        $0.60
    • 56.1B - 176B total parameters     $1.20
    • 176.1B - 480B total parameters    $2.40

  • FLUX Image models

  • Stability Image models

    • Image size    25 steps    50 steps    75 steps    100 steps
    • 512x512       $0.001      $0.002      $0.0035     $0.005
    • 1024x1024     $0.01       $0.02       $0.035      $0.05

  • Embedding models

    • Model size      Price per 1M tokens
    • Up to 150M      $0.008
    • 151M - 350M     $0.016

  • Rerank models

    • Model size      Price per 1M tokens
    • 8B              $0.10
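
Image models are billed per generated image, based on the size and step count shown in the tables above. A minimal sketch of an image request with the Together Python SDK; the model ID and parameters are illustrative:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # Illustrative request: a 1024x1024 image at 25 steps; per the Stability
    # table above, price depends on image size and the number of steps.
    response = client.images.generate(
        model="stabilityai/stable-diffusion-xl-base-1.0",  # illustrative model ID
        prompt="A watercolor painting of a GPU cluster at sunrise",
        width=1024,
        height=1024,
        steps=25,
        n=1,
    )

    print(response.data[0])  # generated image (base64 data or URL, depending on the request)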

Dedicated endpoints

When hosting your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned with Together Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.

  • Your fine-tuned models

    • Hardware type          Price per minute hosted
    • 1x RTX-6000 48GB       $0.034
    • 1x L40 48GB            $0.034
    • 1x L40S 48GB           $0.048
    • 1x A100 PCIe 80GB      $0.050
    • 1x A100 SXM 40GB       $0.050
    • 1x A100 SXM 80GB       $0.054
    • 1x H100 80GB           $0.098
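
As a quick illustration of per-minute billing, using one rate from the table above (the uptime values are hypothetical):

    # Dedicated endpoints bill per minute while the endpoint is running.
    A100_SXM_80GB_PER_MINUTE = 0.054  # USD per minute, from the table above

    def endpoint_cost(price_per_minute: float, minutes_up: float) -> float:
        """Estimated cost of keeping an endpoint up for the given number of minutes."""
        return price_per_minute * minutes_up

    print(endpoint_cost(A100_SXM_80GB_PER_MINUTE, 60))      # 1 hour  -> 3.24
    print(endpoint_cost(A100_SXM_80GB_PER_MINUTE, 8 * 60))  # 8 hours -> 25.92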

Interested in a dedicated endpoint for your own model?

Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.

  • Download checkpoints and final model weights.

  • View job status and logs through CLI or Playgrounds.

  • Deploy a model instantly once it’s fine-tuned.
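
A minimal sketch of that flow with the Together Python SDK; the same steps are available via the CLI and the Playground, and the dataset path, base model, and epoch count below are placeholders:

    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # Placeholder dataset and base model; the number of epochs affects the final price.
    training_file = client.files.upload(file="my_dataset.jsonl")

    job = client.fine_tuning.create(
        training_file=training_file.id,
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder base model
        n_epochs=3,
    )

    # Check job status; checkpoints and final weights can be downloaded when finished.
    print(client.fine_tuning.retrieve(job.id).status)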

Try the interactive calculator

Together GPU Clusters pricing

Together Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.

  • Hardware types available, networking, and pricing:

  • A100 PCIe 80GB
    200 Gbps non-blocking Ethernet
    Starting at $1.30/hr

  • A100 SXM 80GB
    200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available
    Starting at $1.30/hr

  • H100 80GB
    3.2 Tbps InfiniBand
    Starting at $1.75/hr

  • H200 141GB
    3.2 Tbps InfiniBand
    Starting at $2.09/hr