DeepSeek-R1 on Together AI

  • Security-first approach

    We host all models in our own data centers, with no data sharing back to DeepSeek. Developers retain full control over their data with opt-out privacy settings.

  • Full R1 model family

    While others may serve only the distilled models, we provide access to both the full R1 model and its distilled variants, so you can test and deploy whichever best suits your needs.

  • Serverless infrastructure

    Our infrastructure is optimized for large-scale models like DeepSeek-R1, providing the high throughput and low latency necessary for production workloads, with the flexibility of pay-per-token pricing.

Run the full DeepSeek-R1 securely on Together AI, paying only per token

DeepSeek-R1 delivers OpenAI-o1-level performance in math, code, and logic—9x cheaper and fully open-source.

Try the full DeepSeek-R1

Introducing the DeepSeek-R1 Distilled Family

The DeepSeek-R1 family of distilled models consists of SOTA open-source models that have been fine-tuned on reasoning traces distilled from DeepSeek-R1, giving them reasoning capabilities. Swapping one in is a one-line change, as shown in the sketch after this list.

  • Llama 70B R1 Distilled

    Llama 70B distilled with reasoning capabilities from DeepSeek-R1. Surpasses GPT-4o with 94.5% on MATH-500 and matches o1-mini on coding.

  • Qwen 14B R1 Distilled

    Qwen 14B distilled with reasoning capabilities from DeepSeek-R1. Outperforms GPT-4o in math and matches o1-mini on coding.

  • Qwen 1.5B R1 Distilled

    Small Qwen 1.5B distilled with reasoning capabilities from DeepSeek-R1. Beats GPT-4o on MATH-500 while being a fraction of the size.
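
Switching between the full model and a distilled variant is a one-line change to the model string. Here is a minimal sketch using the Python SDK; the distilled model identifiers below follow the Hugging Face naming convention and are assumptions, so confirm the exact strings in the Together model catalog:

from together import Together

client = Together()

# Assumed identifiers for the distilled variants (verify against the
# Together model catalog before use):
DISTILLED_MODELS = [
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
]

# Same chat-completions call as the full R1 model; only the model changes.
response = client.chat.completions.create(
    model=DISTILLED_MODELS[0],
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)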

Run DeepSeek-R1 or any model on the fastest endpoints

Use our API to deploy any open-source model on the fastest inference stack available with optimal cost efficiency.

We also offer dedicated endpoints for production traffic and enterprise deployments, all with zero rate limits. Servers are located in North America with complete data privacy controls.

RUN INFERENCE (cURL)

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "repetition_penalty": 1,
    "stop": ["<|end▁of▁sentence|>"],
    "stream": true
  }'

RUN INFERENCE (Python)

from together import Together

client = Together()

# Stream a chat completion from the full DeepSeek-R1 model.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1,
    stop=["<|end▁of▁sentence|>"],
    stream=True,
)

# Print tokens as they arrive; the final chunk may carry no content.
for token in response:
    if token.choices and token.choices[0].delta.content:
        print(token.choices[0].delta.content, end="", flush=True)

RUN INFERENCE (TypeScript)

import Together from "together-ai";

const together = new Together();

// Stream a chat completion from the full DeepSeek-R1 model.
const response = await together.chat.completions.create({
    messages: [{ role: "user", content: "How many primes are below 100?" }],
    model: "deepseek-ai/DeepSeek-R1",
    max_tokens: 512,
    temperature: 0.7,
    top_p: 0.7,
    top_k: 50,
    repetition_penalty: 1,
    stop: ["<|end▁of▁sentence|>"],
    stream: true,
});

// Write tokens as they arrive, without inserting a newline between chunks.
for await (const token of response) {
    process.stdout.write(token.choices[0]?.delta?.content ?? "");
}
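
DeepSeek-R1 emits its chain-of-thought between <think> and </think> tags before the final answer. Here is a minimal Python sketch, assuming that markup, for separating the reasoning trace from the answer in a non-streaming response:

import re

from together import Together

client = Together()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
)

text = response.choices[0].message.content

# R1 wraps its reasoning in <think>...</think> before the final answer.
match = re.match(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
else:
    reasoning, answer = "", text.strip()

print("Answer:", answer)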