Deploy real-time voice agents for every use case

Build voice agents that sound natural. Combine the best speech-to-text (STT), LLM, and text-to-speech (TTS) models on co-located infrastructure for ultra-low latency and production-scale reliability.

Why Together AI for Voice Agents

The complete voice stack, built for real-time production use.

One platform for every voice use case

Deploy fast, expressive, multilingual, or cloned-voice models for any use case. Access models from MiniMax, Rime, Deepgram, OpenAI, and Cartesia through a single API. Swap configurations and switch models without rebuilding integrations.
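To make "switch models without rebuilding integrations" concrete, here is a minimal sketch of a voice turn in which each stage is looked up by name from a configuration. Everything in it is an assumption for illustration: `VoiceConfig`, `Registry`, `run_turn`, and the stage names such as `stt-a` are hypothetical, the stages are local stand-ins, and none of this is the Together AI API.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Hypothetical stage signatures for illustration; not SDK calls.
Transcriber = Callable[[bytes], str]   # audio in, transcript out
Responder = Callable[[str], str]       # transcript in, reply text out
Synthesizer = Callable[[str], bytes]   # reply text in, audio out


@dataclass(frozen=True)
class VoiceConfig:
    """Which registered model to use at each stage (labels only)."""
    stt: str
    llm: str
    tts: str


@dataclass
class Registry:
    """Maps a model label to the callable that runs it."""
    stt: Dict[str, Transcriber]
    llm: Dict[str, Responder]
    tts: Dict[str, Synthesizer]


def run_turn(audio: bytes, config: VoiceConfig, registry: Registry) -> bytes:
    """Run one conversational turn: STT, then LLM, then TTS."""
    transcript = registry.stt[config.stt](audio)
    reply = registry.llm[config.llm](transcript)
    return registry.tts[config.tts](reply)


# Stand-in stages so the sketch runs without any network access.
registry = Registry(
    stt={"stt-a": lambda audio: audio.decode("utf-8")},
    llm={"llm-a": lambda text: f"You said: {text}"},
    tts={
        "tts-a": lambda text: text.encode("utf-8"),
        "tts-b": lambda text: text.upper().encode("utf-8"),
    },
)

config = VoiceConfig(stt="stt-a", llm="llm-a", tts="tts-a")
print(run_turn(b"hello", config, registry))   # b'You said: hello'

# Swapping the TTS model is a one-field config change; run_turn is untouched.
swapped = replace(config, tts="tts-b")
print(run_turn(b"hello", swapped, registry))  # b'YOU SAID: HELLO'
```

The design choice worth noting is that the calling code depends only on the stage signatures, so changing a model touches configuration rather than integration code.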

Ultra-low latency conversations

Low latency is built into the infrastructure. The entire STT-to-TTS pipeline runs co-located, keeping end-to-end latency under 500 ms so conversations feel instant.
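One way to reason about a 500 ms end-to-end target is to time each stage and compare the sum against the budget. The sketch below assumes a simple sequential pipeline; `timed_pipeline`, `within_budget`, and `fake_stage` are hypothetical helpers, and the sleeps are stand-ins for real model calls, not measured figures.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

BUDGET_MS = 500.0  # end-to-end target quoted in the text above

Stage = Callable[[Any], Any]


def timed_pipeline(stages: List[Tuple[str, Stage]], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order and record each stage's wall-clock time in ms."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return payload, timings


def within_budget(timings: Dict[str, float], budget_ms: float = BUDGET_MS) -> bool:
    """True if the summed stage latencies fit inside the budget."""
    return sum(timings.values()) <= budget_ms


def fake_stage(delay_s: float, fn: Stage) -> Stage:
    """Wrap fn with an artificial delay to imitate a model call (illustrative)."""
    def stage(value: Any) -> Any:
        time.sleep(delay_s)
        return fn(value)
    return stage


stages = [
    ("stt", fake_stage(0.01, lambda audio: audio.decode("utf-8"))),
    ("llm", fake_stage(0.01, lambda text: text + " there")),
    ("tts", fake_stage(0.01, lambda text: text.encode("utf-8"))),
]

result, timings = timed_pipeline(stages, b"hello")
for name, ms in timings.items():
    print(f"{name}: {ms:.1f} ms")
print("within budget:", within_budget(timings))
```

In a real deployment, network hops between stages also count against the budget, which is the cost that co-locating the pipeline is meant to reduce.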

Scales without breaking

Autoscale dynamically to thousands of concurrent calls across 25+ global regions. Dedicated GPU endpoints come with a 99.9% uptime SLA and absorb traffic spikes on pre-warmed capacity.

The complete voice model library

Open-source and proprietary models across the full voice pipeline, on one platform. Switch between models optimized for emotion, pronunciation, code-switching, or cloning, with minimal code changes.

  • Audio: MiniMax Speech 2.8 (new)
  • Transcribe: NVIDIA Nemotron 3.5 ASR (new)
  • Audio: MiniMax Speech 2.6 Turbo
  • Audio: Cartesia Sonic-3
  • Transcribe: Deepgram Flux (new)
  • Transcribe: Whisper Large v3
  • Audio: Deepgram Aura-2
  • Transcribe: Deepgram Nova-3
  • Transcribe: Deepgram Nova-3 Multilingual
  • Transcribe: NVIDIA Parakeet TDT 0.6B v3 (new)
  • Audio: Arcana V3 Turbo
  • Audio: Orpheus TTS
  • Audio: Kokoro-82M TTS
  • Chat: gpt-oss-20B

Have your own model?

Deploy custom containers on Together’s managed GPU infrastructure with automatic scaling, job queues, and built-in observability.

Trusted by teams building voice at scale

  • cost reduction
  • <400 ms p95 model latency
  • Weekly model deployments

"Low latency is especially important for voice because there’s a much higher UX bar. Together helped us push latency down by optimizing our models with techniques like speculative decoding, and they’ve been a reliable production partner — proactive about risks and fast when issues come up."

Max Lu

Head of Research, Decagon