
Whisper Large v3

State-of-the-art automatic speech recognition and translation model supporting 99 languages, with a 10-20% error reduction over Whisper Large v2.

About model

Whisper Large v3 is a state-of-the-art automatic speech recognition model trained on over 5 million hours of labeled data. It generalizes well to many datasets and domains in a zero-shot setting, with improved performance across multiple languages, and is suitable for applications that require accurate speech recognition and translation.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    openai/whisper-large-v3

    cURL:

    # audio.mp3 is a placeholder path to a local audio file
    curl -X POST "https://api.together.xyz/v1/audio/transcriptions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -F "file=@audio.mp3" \
      -F "model=openai/whisper-large-v3" \
      -F "language=en" \
      -F "response_format=json" \
      -F "timestamp_granularities=segment"
    
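    Python:

    from together import Together

    # The client reads the API key from the TOGETHER_API_KEY environment variable
    client = Together()

    response = client.audio.transcriptions.create(
        file="audio.mp3",  # placeholder path to a local audio file
        model="openai/whisper-large-v3",
        language="en",
        response_format="json",
        timestamp_granularities="segment"
    )
    print(response.text)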
    
    TypeScript:

    import fs from "fs";
    import Together from "together-ai";

    const together = new Together();

    const response = await together.audio.transcriptions.create({
      file: fs.createReadStream("audio.mp3"), // placeholder path to a local audio file
      model: "openai/whisper-large-v3",
      language: "en",
      response_format: "json",
      timestamp_granularities: "segment"
    });
    console.log(response);
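
    Segment-level timestamps are often turned into subtitles. The sketch below is one way to do that in Python, assuming each segment is a mapping with `start` and `end` in seconds plus a `text` field, as in OpenAI-style verbose transcription output. This page does not confirm those field names, so treat them and the helper names `format_timestamp` and `segments_to_srt` as hypothetical.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from segments.

    Assumption: each segment has 'start', 'end' (seconds) and 'text' keys.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    # SRT entries are separated by a blank line; an empty input yields "".
    return ("\n\n".join(blocks) + "\n") if blocks else ""
```

    If the response object exposes segments under a different shape, only the three key lookups inside the loop need to change.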
    
  • Model card

    Performance Architecture:
    • Whisper Large v3 deployment delivering transcription 15x faster than OpenAI
    • Smart voice activity detection using Silero for precise audio segmentation
    • Intelligent chunking and batching strategies optimized for longer audio files
    • Advanced GPU utilization optimizations maximizing processing efficiency
    • Sub-second processing speeds with dedicated endpoint infrastructure
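
    To make the chunking idea concrete, here is a minimal Python sketch that splits a long recording into fixed-length, slightly overlapping windows. It is not the deployment's actual pipeline, which the list above describes as segmenting on voice activity with Silero. Fixed windows are swapped in here purely for illustration, and the function name `plan_chunks` and its default values are assumptions.

```python
from typing import List, Tuple


def plan_chunks(
    duration_s: float,
    chunk_s: float = 30.0,
    overlap_s: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording.

    Consecutive windows overlap by overlap_s so that a word cut at one
    boundary appears whole in the neighbouring window.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    windows: List[Tuple[float, float]] = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return windows
```

    A voice-activity-based splitter would pick the boundaries at silences instead of fixed offsets, which avoids cutting words in the first place.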

    Technical Capabilities:
    • Enterprise-scale file handling supporting files exceeding 1GB vs competitors' 25MB limits
    • Superior word-level alignment delivering highest quality timestamps available
    • Comprehensive language support across 50+ languages with automatic detection
    • Seamless processing of 30+ minute audio without complex chunking workflows
    • Batch processing capabilities for large async workloads with consistent performance

    Infrastructure Design:
    • Production-ready API design built for real deployment scenarios
    • Reserved GPU capacity for guaranteed processing speeds
    • Cost-effective pricing at $0.0015 per audio minute for high-volume applications
    • Compatible with existing Whisper integrations for minimal migration effort
    • Serverless and dedicated endpoint options for different performance requirements
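
    For budgeting, per-minute pricing can be turned into a quick estimate. The Python sketch below takes the rate as a parameter and defaults to the $0.0015 / min figure shown in this page's Price field. It assumes usage is prorated by the second, which this page does not state, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal


def estimate_cost(audio_seconds: float, price_per_minute: str = "0.0015") -> Decimal:
    """Estimate transcription cost in dollars, rounded to 4 decimal places.

    Assumption: billing is prorated by the second (not confirmed here).
    The rate is passed as a string so Decimal keeps it exact.
    """
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return (minutes * Decimal(price_per_minute)).quantize(Decimal("0.0001"))


# One hour of audio at the default rate.
print(estimate_cost(3600))
```

    Using `Decimal` instead of floats keeps small per-minute rates from accumulating rounding error across large volumes.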

  • Applications & use cases

    High-Speed Processing Applications:
    • Customer support call analysis with rapid post-call insights
    • Meeting transcription delivered quickly after recording completion
    • Medical transcription services with efficient workflow processing
    • Content transcription for accessibility and media creation

    Enterprise Solutions:
    • High-volume call center transcription and analysis workflows
    • Educational platforms with voice-enabled learning and assessment tools
    • Compliance and quality assurance audio documentation for regulated industries
    • Large-scale content processing for media and entertainment companies
    • Corporate training and onboarding with automated audio transcription

    Voice-Enabled Applications:
    • Conversational AI systems requiring accurate speech input processing
    • Voice-controlled interfaces for accessibility and hands-free operation
    • Multilingual communication platforms with translation capabilities
    • Content creation tools for podcasts, videos, and audio content
    • Voice analytics and sentiment analysis for customer experience optimization

    Developer Integration:
    • Foundation layer for voice AI application development
    • Building block for voice-enabled customer support automation
    • Integration component for educational technology platforms
    • Core infrastructure for voice assistant applications
    • API component for adding speech capabilities to existing applications

Model details
  • Model provider
    OpenAI
  • Type
    Transcribe
  • Main use cases
    Speech-to-Text
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    1.55B
  • Price
    $0.0015 / min
  • Input modalities
    Audio
  • Output modalities
    Text
  • Released
    November 7, 2023
  • Last updated
    July 9, 2025
  • Category
    Transcribe