DeepSeek-R1 on Together AI

  • Security-first approach

    We host all models in our own data centers, with no data sharing back to DeepSeek. Developers retain full control over their data with opt-out privacy settings.

  • Full R1 model family

    While others may serve only the distilled models, we provide access to both the full R1 model and its distilled variants, so you can test and deploy whichever best suits your needs.

  • Serverless infrastructure

    Our infrastructure is optimized for large-scale models like DeepSeek-R1, providing the high throughput and low latency necessary for production workloads, with the flexibility of pay-per-token pricing.

Run the full DeepSeek-R1 securely on Together AI, paying only per token

DeepSeek-R1 delivers OpenAI-o1-level performance in math, code, and logic—9x cheaper and fully open-source.

Try the full DeepSeek-R1

Introducing the DeepSeek-R1 Distilled Family

The DeepSeek-R1 family of distilled models consists of SOTA open-source models that have been fine-tuned on reasoning traces distilled from DeepSeek-R1, giving them reasoning capabilities. Swapping one in is a one-line change, as shown in the sketch after this list.

  • Llama 70B R1 Distilled

    Llama 70B distilled with reasoning capabilities from DeepSeek-R1. Surpasses GPT-4o with 94.5% on MATH-500 and matches o1-mini on coding.

  • Qwen 14B R1 Distilled

    Qwen 14B distilled with reasoning capabilities from DeepSeek-R1. Outperforms GPT-4o in math and matches o1-mini on coding.

  • Qwen 1.5B R1 Distilled

    Small Qwen 1.5B distilled with reasoning capabilities from DeepSeek-R1. Beats GPT-4o on MATH-500 while being a fraction of the size.
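
Switching between the full model and a distilled variant is a one-line change to the model string. Here is a minimal sketch using the Python SDK; the distilled model identifiers below follow the Hugging Face naming convention and are assumptions, so confirm the exact strings in the Together model catalog:

from together import Together

client = Together()

# Assumed identifiers for the distilled variants (verify against the
# Together model catalog before use):
DISTILLED_MODELS = [
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
]

# Same chat-completions call as the full R1 model; only the model changes.
response = client.chat.completions.create(
    model=DISTILLED_MODELS[0],
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)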

Run DeepSeek-R1 or any model on the fastest endpoints

Use our API to deploy any open-source model on the fastest inference stack available with optimal cost efficiency.

We also offer dedicated endpoints for production traffic and enterprise deployments, all with zero rate limits. Servers are located in North America with complete data privacy controls.

RUN INFERENCE (cURL)

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "repetition_penalty": 1,
    "stop": ["<|end▁of▁sentence|>"],
    "stream": true
  }'

RUN INFERENCE (Python)

from together import Together

client = Together()

# Stream a chat completion from the full DeepSeek-R1 model.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1,
    stop=["<|end▁of▁sentence|>"],
    stream=True,
)

# Print tokens as they arrive; the final chunk may carry no content.
for token in response:
    if token.choices and token.choices[0].delta.content:
        print(token.choices[0].delta.content, end="", flush=True)

RUN INFERENCE (TypeScript)

import Together from "together-ai";

const together = new Together();

// Stream a chat completion from the full DeepSeek-R1 model.
const response = await together.chat.completions.create({
    messages: [{ role: "user", content: "How many primes are below 100?" }],
    model: "deepseek-ai/DeepSeek-R1",
    max_tokens: 512,
    temperature: 0.7,
    top_p: 0.7,
    top_k: 50,
    repetition_penalty: 1,
    stop: ["<|end▁of▁sentence|>"],
    stream: true,
});

// Write tokens as they arrive, without inserting a newline between chunks.
for await (const token of response) {
    process.stdout.write(token.choices[0]?.delta?.content ?? "");
}
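
DeepSeek-R1 emits its chain-of-thought between <think> and </think> tags before the final answer. Here is a minimal Python sketch, assuming that markup, for separating the reasoning trace from the answer in a non-streaming response:

import re

from together import Together

client = Together()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=512,
)

text = response.choices[0].message.content

# R1 wraps its reasoning in <think>...</think> before the final answer.
match = re.match(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
else:
    reasoning, answer = "", text.strip()

print("Answer:", answer)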