Announcing native availability of NVIDIA Nemotron 3 Nano, NVIDIA’s latest reasoning model

Summary

NVIDIA Nemotron 3 Nano, the company’s newest reasoning model, is now available on Together AI, the AI Native Cloud — combining big-model intelligence with small-model efficiency for agentic systems.
Key specs: Hybrid Mamba-Transformer + sparse MoE architecture with ~3B active parameters for fast, high-quality reasoning; fully open weights, data, and training recipes
Optimized on Together AI for high throughput and cost-efficiency
Ideal for specialized tasks, coding assistants, scientific agents, tool-using planners, enterprise context applications, and evaluation/judge models

Agentic and multi-agent systems are rapidly expanding, driving new demand for fast, consistent reasoning models that support many steps, long context, and continuous decision-making. NVIDIA Nemotron 3 Nano on Together AI provides scalable, high-quality reasoning at production speed — empowering AI engineers to build more capable, cost-efficient agentic systems.

Nemotron 3 Nano

Hybrid Mamba–Transformer + sparse MoE architecture

Nemotron 3 Nano uses a hybrid architecture that enables strong reasoning performance, without losing inference efficiency:

Mamba layers help handle long-range dependencies and structured tasks efficiently
Transformer layers provide strong general-purpose reasoning and instruction following
Sparse Mixture-of-Experts activates only ~3B out of 30B parameters per token, improving speed and cost

This architecture makes Nemotron 3 Nano smart enough for complex reasoning, yet fast enough to reduce cost for multi-agent systems.

With a 1M-token context, Nemotron 3 Nano can support long-horizon planning, RAG-heavy pipelines, document and log-scale workloads, and persistent agent memory across sessions.

It includes open weights, open training data, and open training recipes. This makes it suitable across research, enterprise use, and compliant deployments.

Nemotron 3 Nano demonstrates strong performance in coding, math, scientific reasoning, and function calling. Read NVIDIA Nemotron 3 Nano announcement.

NVIDIA Nemotron 3 Nano on Together AI

Together AI is designed for production-scale reasoning and agentic workloads — making it the ideal platform for deploying Nemotron 3 Nano. With a focus on scale, reliability, cost efficiency, and simple APIs, Together AI makes running the model at its full potential easy.

Performance: Together AI delivers production-grade inference with consistently low latency and high throughput, helping Nemotron 3 Nano support fast, multi-step reasoning loops without bottlenecks. Together AI also scales seamlessly across parallel agentic workloads for multi-agent orchestration and tool-use pipelines.
Reliability: Agent applications depend on predictable performance. Together AI delivers reliable performance under traffic spikes, high uptime, and token streaming, helping agent loops remain responsive even during long-context or continuous decision-making tasks.
Cost efficiency: The 3B active parameters per token in Nemotron 3 Nano allow it to run extremely efficiently — and Together AI amplifies that advantage. Engineers benefit from a lower cost-per-agent step, allowing large-scale agent deployments and frequent reasoning loops — without prohibitive inference costs.
Flexibility: Together AI offers simple, developer-friendly APIs — including an OpenAI-compatible interface — allowing teams to adopt Nemotron 3 Nano with minimal code changes. The platform integrates cleanly into multi-agent frameworks, planning systems, and tool-use workflows for frictionless deployment.

“Nemotron 3 Nano brings leading accuracy and efficiency to the open model ecosystem, empowering developers to build specialized agentic AI with unprecedented transparency. By making this model open and available on the Together AI platform, we’re enabling teams to achieve scalable performance and unlock new opportunities across every industry.” — Joey Conway, Senior Director of Generative AI Software, NVIDIA

Use cases

Nemotron 3 Nano is well suited for reasoning-intensive applications across the Together AI ecosystem, including coding assistants & developer tools to build scientific reasoning agents, multi-step tool use & planning agents, and long-context enterprise assistants.

Try Nemotron 3 Nano

Get started with Nemotron 3 Nano on Together AI, and join the community on Discord.