NVIDIA Nemotron™ 3 Nano Omni is now available on the Together AI platform. Representing a meaningful step forward for multimodal AI, Nemotron 3 Nano Omni is a single, open model that reasons across video, images, audio and language. For developers building agentic applications, Together AI Dedicated Inference is the fastest way to get started and scale.
Why run it on Together AI
Together AI, the AI Native Cloud, has been the platform of choice for developers who want fast, affordable, reliable access to the world's best open models for production-scale inference.
Nemotron 3 Nano Omni unifies context across modalities within a single model — a property that is often critical for agents that need deterministic behavior. That means an agent can simultaneously reason across audio inputs (e.g., recordings or transcripts), visual inputs such as screenshots and video, and structured documents — without fragmenting that understanding across separate inference passes.
1. Together AI's Research Optimizations Unlock the Model's Full Architectural Potential
The Nemotron 3 Nano Omni hybrid Mamba-Transformer mixture-of-experts (MoE) architecture activates only ~3B parameters per token out of 30B total, and uses multi-token prediction (MTP) to generate multiple future tokens in a single forward pass. Together AI's stack is powered by frontier AI systems research, which enables high-throughput, cost-efficient, production-grade inference with consistently low latency. Pairing that stack with the highly efficient, highly accurate Nemotron 3 Nano Omni model means faster multimodal reasoning and more intelligence per unit of compute.
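To make the "~3B active out of 30B total" idea concrete, here is a toy sketch of MoE routing: a learned router scores every expert, but each token is processed by only the top-k of them. The expert count, top-k value, and random "router" below are illustrative stand-ins, not the model's real configuration.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 10  # total experts (stand-in for "30B total parameters")
TOP_K = 2         # experts activated per token (stand-in for "~3B active")
DIM = 4           # toy hidden dimension

# A stand-in learned gating function: one scoring vector per expert.
router = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token):
    """Score all experts for one token, keep the top-k, and normalize their gates."""
    scores = [sum(w * x for w, x in zip(r, token)) for r in router]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    return top, softmax([scores[i] for i in top])

token = [random.uniform(-1, 1) for _ in range(DIM)]
experts, gates = route(token)
print("active experts:", experts)
print("gate weights:", [round(g, 3) for g in gates])
print("fraction of experts active per token:", TOP_K / NUM_EXPERTS)
```

Only the selected experts run their feed-forward computation for that token, which is why per-token compute scales with the active parameters rather than the total.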
2. Managed Infrastructure Built for Agentic, Production-Scale Inference Workloads
Agent applications depend on predictable performance. Together AI delivers reliable performance under traffic spikes, high uptime, and token streaming, helping agent loops remain responsive even during long-context or continuous decision-making tasks. Developers can quickly deploy Nemotron 3 Nano Omni on Together AI without managing infrastructure, and scale seamlessly from prototype to production. This fully managed environment eliminates operational overhead so teams can focus on building, not maintaining GPUs.
3. Secure, Production-Ready Platform That Protects Your Data
Together AI offers simple, developer-friendly APIs — making it easy to integrate Nemotron 3 Nano Omni into multi-agent frameworks, planning systems, and multimodal pipelines. Combined with a stable, secure platform, this gives organizations a trustworthy foundation to deploy AI at scale — without trading speed for safety.
Where Nemotron 3 Nano Omni excels
Most production AI systems today handle multimodal inputs through fragmented pipelines: one model for vision, another for audio, another for documents—stitched together with custom orchestration logic. Every seam in that architecture is a potential failure point — added latency, misaligned context, and compounding errors across modalities.
Nemotron 3 Nano Omni eliminates those seams.
Built on a hybrid Transformer-Mamba mixture-of-experts (MoE) design, the 30B A3B model supports up to 256K tokens of shared multimodal input context in a single coherent reasoning loop. An agent can take in audio inputs (e.g., transcripts), visual inputs such as screenshots, and relevant documents together, rather than splitting that understanding across separate inference passes.
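A shared multimodal context means a single request can carry every modality. The sketch below builds one such request payload in the OpenAI-compatible chat format Together AI exposes; the `text` and `image_url` content parts follow that standard, while the audio part's shape and the model id are assumptions — verify both against the Together AI API reference and model catalog.

```python
# Hypothetical model id and audio part shape - check the Together AI docs.
payload = {
    "model": "nvidia/nemotron-3-nano-omni",  # placeholder, not a confirmed id
    "messages": [
        {
            "role": "user",
            "content": [
                # Standard text part: the instruction for the agent.
                {"type": "text",
                 "text": "Summarize the support call and flag any policy violations."},
                # Standard image part: e.g., a screenshot of the user's screen.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/screenshot.png"}},
                # Assumed audio part shape; verify against the API reference.
                {"type": "input_audio",
                 "input_audio": {"data": "<base64-encoded wav>", "format": "wav"}},
            ],
        }
    ],
    "max_tokens": 512,
}

# All three modalities travel in one request, so the model reasons over a
# shared context instead of stitching separate vision/audio/text passes.
print("content parts in one request:", len(payload["messages"][0]["content"]))
```

Contrast this with a fragmented pipeline, where each part would be a separate call to a separate model and the orchestration layer would have to reconcile the outputs.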
The efficiency benefits are significant:
- Reduced need for multi-model pipelines, lowering system complexity
- More efficient multimodal processing across video, audio, and document workloads
- Improved throughput and scalability for long-context, agentic applications
- Flexible deployment with support for FP8 and NVFP4 across NVIDIA Hopper, NVIDIA Blackwell, and more
And it's fully open. Open weights, open data, open post-training recipes. Developers can deploy anywhere — cloud, on-prem, air-gapped — with full data control and no model lock-in.
What you can build
The unification of perception and reasoning in one model opens up use cases that were previously too complex or too costly to productionize:
Customer service agents can reason across call recordings, screen recordings, and policy documents simultaneously — understanding both user intent and system context.
Financial analyst agents can reason across earnings call audio, investor presentation video, scanned chart images, and SEC filings — producing grounded insights rather than surface-level summaries.
Computer use agents see a UI through screen recordings, interpret instructions, and validate actions against constraint documents — all within a single reasoning context.
Any application that previously required assembling a multi-model stack now has a cleaner path to production.
Get started
NVIDIA Nemotron 3 Nano Omni is available on Together AI today.
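A minimal quickstart sketch using the Together Python SDK (`pip install together`). The model id below is a placeholder; look up the exact Nemotron 3 Nano Omni id in the Together AI model catalog before running, and set `TOGETHER_API_KEY` in your environment.

```python
import os

MODEL = "nvidia/nemotron-3-nano-omni"  # hypothetical id - check the model catalog
PROMPT = "In two sentences, what can an omni-modal model do that a text-only model cannot?"

if os.environ.get("TOGETHER_API_KEY"):
    try:
        from together import Together  # requires `pip install together`
    except ImportError:
        print("Install the SDK first: pip install together")
    else:
        client = Together()  # reads TOGETHER_API_KEY from the environment
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=128,
        )
        print(resp.choices[0].message.content)
else:
    print("Set TOGETHER_API_KEY to run this example.")
```

The same client supports streaming responses (`stream=True`), which keeps agent loops responsive during long generations.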