Llama 3.1 Nemotron 70B Instruct
LLM
Custom NVIDIA LLM optimized to enhance the helpfulness and relevance of generated responses to user queries.

API Usage
Endpoint
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
RUN INFERENCE (cURL)
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    "messages": [{"role": "user", "content": "What are some fun things to do in New York?"}]
  }'
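The endpoint returns an OpenAI-compatible JSON body (the content behind the JSON RESPONSE tab on the live page). As a sketch, the assistant's text sits at `choices[0].message.content`; the sample payload below is illustrative only, not an actual model reply:

```python
import json

# Illustrative response shape (not real model output); the
# /v1/chat/completions endpoint returns an OpenAI-compatible body.
raw = '''{
  "id": "example-id",
  "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Visit Central Park, catch a Broadway show, and walk the High Line."
      }
    }
  ]
}'''

reply = json.loads(raw)
# Extract the assistant message from the first choice.
print(reply["choices"][0]["message"]["content"])
```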
RUN INFERENCE (Python)
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    messages=[{"role": "user", "content": "What are some fun things to do in New York?"}],
)
print(response.choices[0].message.content)
RUN INFERENCE (TypeScript)
import Together from "together-ai";

const together = new Together(); // reads TOGETHER_API_KEY from the environment

const response = await together.chat.completions.create({
  model: "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
  messages: [{ role: "user", content: "What are some fun things to do in New York?" }],
});
console.log(response.choices[0].message.content);
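All three snippets above issue the same HTTP call. As a minimal sketch of what the SDK wrappers do under the hood, the request can be assembled with the Python standard library (built here but not sent; the URL and model name are from this page):

```python
import json
import os
import urllib.request

# Same JSON body the cURL, Python and TypeScript examples send.
payload = {
    "model": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
    "messages": [
        {"role": "user", "content": "What are some fun things to do in New York?"}
    ],
}

# Build the POST request; urllib.request.urlopen(req) would send it.
req = urllib.request.Request(
    "https://api.together.xyz/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```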
Model Provider:
NVIDIA
Type:
Chat
Variant:
Nemotron
Parameters:
70B
Deployment:
✔ Serverless
Quantization:
FP16
Context length:
128K
Pricing:
$0.88 per 1M tokens
Looking for production scale? Deploy on a dedicated endpoint
Deploy Llama 3.1 Nemotron 70B Instruct on a dedicated endpoint with custom hardware configuration, as many instances as you need, and auto-scaling.
