
Run any model on the fastest endpoints

Use our API to deploy any open-source model on the fastest inference stack available, with optimal cost efficiency.

Scale into a dedicated deployment at any time, with a custom number of instances for optimal throughput (see the sketch after the examples below).

RUN INFERENCE (cURL)

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-Vision-Free",
    "messages": [{"role": "user", "content": "What are some fun things to do in New York?"}]
  }'

RUN INFERENCE (Python)

from together import Together

# Together() reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[{"role": "user", "content": "What are some fun things to do in New York?"}],
)

print(response.choices[0].message.content)

RUN INFERENCE (TypeScript)

import Together from "together-ai";

const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });

const response = await together.chat.completions.create({
    messages: [{"role": "user", "content": "What are some fun things to do in New York?"}],
    model: "meta-llama/Llama-Vision-Free",
});

console.log(response.choices[0].message.content);
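
All three snippets call the same serverless chat completions endpoint; swapping the model string is all it takes to target a different open-source model.

To scale into a dedicated deployment, endpoints can also be managed over the API. The Python sketch below is illustrative only: the /v1/endpoints path and the request fields (hardware, autoscaling, replica counts) are assumptions modeled on the calls above rather than a confirmed schema, so check the API reference for the exact parameters.

CREATE A DEDICATED ENDPOINT

# A sketch of creating a dedicated deployment over the REST API.
# NOTE: the /v1/endpoints path and the field names below (hardware,
# autoscaling, min_replicas, max_replicas) are illustrative assumptions;
# consult the API reference for the exact schema.
import os

import requests

response = requests.post(
    "https://api.together.xyz/v1/endpoints",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-Vision-Free",
        "hardware": "1x_nvidia_h100_80gb",  # assumed hardware identifier
        "autoscaling": {"min_replicas": 1, "max_replicas": 4},  # assumed shape
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())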