Published 2/4/2026

Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI

High-performance multilingual TTS with native code-switching and real-time latency on dedicated endpoints.

Summary

  • Starting today, two new Rime models are available on Together AI: Rime Arcana V3 Turbo (English–Spanish, performance) and Rime Arcana V3 (11-language switching)
  • Native code-switching that keeps cadence and prosody consistent across language boundaries
  • Rime Arcana V3 Turbo: ~120ms time-to-first-audio on Together AI dedicated endpoints
  • Co-located with LLM and STT workloads, with one API and unified observability

When a caller code-switches mid-sentence, most voice agents lose what makes them sound native. Cadence slips, the response lands like a translation, and trust drops. Teams patch it by routing between language-specific TTS models, but the handoff adds latency and makes voice behavior inconsistent inside the same conversation. Rime's Arcana V3 line is built for that moment: natural code-switching at production speed without turning multilingual into a routing problem.

Starting today, Together AI, the AI Native Cloud, is adding Rime Arcana V3 Turbo and Rime Arcana V3 to the Together Model Library. V3 Turbo delivers English–Spanish code-switching at ~120ms time-to-first-audio on dedicated endpoints, with prosody trained on bilingual speech patterns. V3 expands switching across 11 languages from a single model. Both run co-located with your LLM and STT workloads behind the same API, authentication, and observability surface you already use.

Audio sample: "Hi, thanks for calling customer support. I can help you in multiple languages." (English, German, French, Japanese)

V3 Turbo: Performance for real-time bilingual conversations

~120ms time-to-first-audio

Voice agents need end-to-end latency under 700ms to feel conversational, which means TTS must leave headroom for STT and LLM processing. V3 Turbo hits ~120ms time-to-first-audio on Together AI dedicated endpoints, so when a customer switches from English to Spanish mid-sentence, the agent's bilingual response arrives in stride. Co-locating V3 Turbo with LLM and STT on Together AI keeps the full pipeline (speech recognition through reasoning to synthesis) within that 700ms budget.
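The budget arithmetic above can be sketched in a few lines. The 700ms target and ~120ms TTS figure come from the text; the helper names and the STT and LLM latencies used in the check are hypothetical placeholders, not measured values.

```python
# Figures from the text: 700ms end-to-end target, ~120ms V3 Turbo time-to-first-audio.
BUDGET_MS = 700
TTS_TTFA_MS = 120


def remaining_headroom_ms(budget_ms: int, tts_ms: int) -> int:
    """Time left for STT, LLM, and network overhead once TTS is accounted for."""
    return budget_ms - tts_ms


def fits_budget(stt_ms: int, llm_ms: int,
                tts_ms: int = TTS_TTFA_MS, budget_ms: int = BUDGET_MS) -> bool:
    """True if the summed stage latencies stay within the end-to-end budget."""
    return stt_ms + llm_ms + tts_ms <= budget_ms


print(remaining_headroom_ms(BUDGET_MS, TTS_TTFA_MS))  # 580

# Hypothetical stage latencies, for illustration only.
print(fits_budget(stt_ms=200, llm_ms=350))  # True (670ms total)
```

With ~120ms spent on synthesis, roughly 580ms remains for speech recognition, reasoning, and any network hops between them.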

English–Spanish code-switching trained on native bilingual speech

Bilingual callers mix languages inside a sentence. V3 Turbo is trained on those patterns, including where pauses land and how stress shifts at the boundary. A customer says, "I need help with my account, es que no puedo acceder." V3 Turbo can respond in the same mixed register, with pauses and emphasis that match how bilingual speakers actually talk.

Efficient concurrency for high-volume deployments

V3 Turbo's performance enables higher concurrency per GPU. For contact centers handling thousands of concurrent calls in bilingual markets, that means fewer GPUs are needed to hold production latency when customers code-switch, which lowers total cost of ownership while preserving conversational quality.
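To see how per-GPU concurrency translates into fleet size, here is a minimal capacity sketch. The function name and every number in it are hypothetical; the streams-per-GPU values are illustrative placeholders, not published figures for V3 Turbo.

```python
import math


def gpus_needed(peak_concurrent_calls: int, streams_per_gpu: int) -> int:
    """Smallest GPU count that can serve the peak number of concurrent streams."""
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    return math.ceil(peak_concurrent_calls / streams_per_gpu)


# Hypothetical inputs: 3,000 peak calls at two assumed concurrency levels.
print(gpus_needed(3000, 40))  # 75
print(gpus_needed(3000, 60))  # 50
```

In this made-up case, raising concurrency from 40 to 60 streams per GPU cuts the fleet by a third. Real sizing would need measured concurrency at your target latency.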

V3: Multilingual breadth with code-switching

~160ms time-to-first-audio across 11 languages

V3 reaches ~160ms p50 time-to-first-audio on Together AI dedicated endpoints while supporting code-switching across 11 languages. This keeps multilingual conversations responsive even as the model handles the complexity of natural transitions between any supported language pair.

11 languages with natural transitions

V3 supports 11 languages and can code-switch among them. For example, a customer starts in French, switches to English for a technical term, then returns to French for clarification. V3 handles these transitions while preserving prosody and accent consistency.

Single model for multilingual markets

V3 lets teams consolidate what used to require separate models or vendors per language. Deploy once and serve multilingual customers from a single endpoint without maintaining separate infrastructure per market. When the conversation switches languages, V3 keeps cadence and emphasis natural so the transition does not sound stitched together.
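One way to keep model choice in configuration rather than in routing logic is to pick a single model per deployment from the languages you expect. The sketch below assumes that approach. The model identifier strings and the `pick_model` helper are hypothetical; only the language coverage (V3 Turbo for English–Spanish, V3 for broader switching) comes from this announcement.

```python
# Hypothetical identifiers; check the Model Library for the real model names.
TURBO_MODEL = "rime-arcana-v3-turbo"
V3_MODEL = "rime-arcana-v3"

# V3 Turbo covers English-Spanish code-switching, per the announcement.
TURBO_LANGUAGES = frozenset({"en", "es"})


def pick_model(expected_languages) -> str:
    """Choose one model for a deployment based on the languages callers use."""
    langs = {code.lower() for code in expected_languages}
    if not langs:
        raise ValueError("at least one language code is required")
    # Subset check: every expected language must be covered by V3 Turbo.
    return TURBO_MODEL if langs <= TURBO_LANGUAGES else V3_MODEL


print(pick_model(["en", "es"]))        # rime-arcana-v3-turbo
print(pick_model(["fr", "de", "en"]))  # rime-arcana-v3
```

The choice is made once per deployment, not per utterance, so a mid-sentence language switch never triggers a handoff between models.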


Audio sample: "Hi, nice to meet you!" (English, Spanish, French, German, Portuguese, Arabic, Hebrew, Hindi, Japanese, Tamil)

Use cases

Bilingual metro markets

In bilingual metro markets, customer service calls routinely involve code-switching. Customers start in English, switch to Spanish for culturally specific context, then switch back for confirmation. V3 Turbo handles these transitions at ~120ms time-to-first-audio, so customers stay in the automated flow longer instead of requesting a transfer to a human agent. Together AI dedicated endpoints keep performance consistent even during peak call volume.

Regulated services in bilingual contexts

Banks, healthcare providers, and government services serving bilingual communities need agents that code-switch the way their customers do. A customer calling about a prescription might use English for most of the conversation, but switch to their native language for symptoms or medication names. Natural switching reduces repeats and transfers because callers stop testing the agent's language ability mid-call. Running your full voice stack on Together AI means one compliance review covers LLM, STT, and TTS.

International call centers

Call centers serving multilingual markets handle customers who code-switch across multiple languages in a single call. A business customer in Luxembourg might mix French, German, and English in one conversation. V3 processes these transitions while maintaining flow, and Together AI's unified observability means you can track performance across all languages from a single dashboard.

Production inference on Together AI

Both Rime Arcana V3 models run on Together AI dedicated endpoints with isolated GPU capacity alongside LLM and STT workloads. Together AI offers a broad TTS catalog on a single platform, from open-source models to enterprise-grade proprietary models like Rime, all with unified tooling.

Infrastructure

  • ✔ Dedicated GPU capacity with isolated workloads
  • ✔ 99.9% uptime SLA
  • ✔ SOC 2 Type II, HIPAA ready, PCI compliant
  • ✔ Global data centers
  • ✔ WebSocket streaming support
  • ✔ Zero data retention and full data ownership and control

Developer experience

  • ✔ Same SDKs and authentication as LLM and STT endpoints
  • ✔ Unified pronunciation API across V3 Turbo and V3
  • ✔ Single observability and logging surface for entire voice pipeline
  • ✔ Model selection and swapping via configuration
  • ✔ Professional voice cloning services available
  • ✔ Batch processing for high-volume workflows
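
As a rough illustration of the shared-authentication point above, the sketch below builds a speech request without sending it. It assumes an OpenAI-style speech endpoint at `/v1/audio/speech` with `model`, `input`, and `voice` fields and bearer-token authentication. The model identifier and voice name are hypothetical, so consult the TTS documentation for the actual values and request shape.

```python
import json
import os
import urllib.request

# Assumed endpoint path; confirm against the TTS documentation.
SPEECH_URL = "https://api.together.xyz/v1/audio/speech"


def build_speech_request(text: str, model: str, voice: str,
                         api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for speech synthesis."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "I need help with my account, es que no puedo acceder.",
    model="rime-arcana-v3-turbo",   # hypothetical model identifier
    voice="example-voice",          # hypothetical voice name
    api_key=os.environ.get("TOGETHER_API_KEY", "missing-key"),
)
# Sending is left out so the sketch runs offline:
# audio_bytes = urllib.request.urlopen(req).read()
print(req.get_method(), req.full_url)
```

Under these assumptions, switching between V3 Turbo and V3 only changes the `model` field. The credentials and request shape stay the same.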

Get started

→ Try both models now

→ Read TTS Documentation

Contact Sales for deterministic pronunciation control, dedicated deployment, and volume pricing