Model Library

Published 10/21/2025

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

Build complete multimodal applications with video, image, and text generation through unified APIs.

What's New

  • New video generation API with models like OpenAI Sora 2, Google Veo 3.0, and Minimax Hailuo for high-quality video creation
  • 40+ new image and video models, including Google's Imagen and Nano Banana, ByteDance SeeDream, and specialized editing tools
  • Complete workflows - Combine text, image, and video generation in single applications without switching providers
  • Same APIs you know - OpenAI-compatible endpoints, unified auth, transparent per-model pricing
  • Available now: Serverless endpoints with enterprise options for scale

Generative media is at the center of a new set of AI-native applications, from AI-powered video editors and personalized gaming experiences to automated marketing content. But building these apps has been complex, with developers having to juggle providers for text, images, and video—each with new SDKs, auth, rate limits, and billing. That fragmentation slows teams, complicates SLAs, and makes scaling a headache.

Today Together AI, the AI Native Cloud, is expanding the Together Model Libary to become your complete generative media infrastructure. Through our strategic partnership with Runware, we're integrating 20+ video models across six providers (including Google Veo 3.0, OpenAI Sora 2, and ByteDance Seedream) plus 15+ image models alongside leading LLMs and voice—spanning the quality-speed-cost spectrum that real applications demand, all accessible through the same fast, reliable APIs you use for text generation.

40+ Models Chosen for Production Workflows

New Video Generation Models

Video generation is new to Together AI. We're starting with models that create 4-30 second videos at various resolutions and styles. Each model optimizes for different needs - realism, motion consistency, or extended length. From quick 10-second clips with Minimax Hailuo to extended 30-second sequences with Kling v2.1, and specialized motion generation with SeeDance. This variety ensures developers can choose the right tool for their specific video generation requirements, from rapid prototyping to production-quality content creation.

Sora 2 Pro

8s

Premium cinematic video generation with native audio and lifelike physics.

$2.40/video (720p/8s)

Try now

Google Veo 3

8s

High-quality video creation with advanced camera movements and scene control.

$1.60/video (720p/8s)

Try now

PixVerse V5

5s

Fast, affordable video generation with smooth motion and multiple artistic styles.

$0.30/video (1080p/5s)

Try now

ByteDance Seedance 1.0 Pro

5s

Top-ranked video generation with multi-shot storytelling and cinematic quality.

$0.57/video (1080p/5s)

Try now

New Image Generation & Editing Models

Together AI's image generation capabilities span the full spectrum of creative and production needs. From photorealistic generation with Google's Imagen to artistic control with models like Nano Banana, developers get access to specialized tools optimized for different use cases without researching individual providers or managing separate integrations.

Gemini Flash Image 2.5 (Nano Banana)

Versatile image creation and editing with natural language control.

$0.039/image

Try now

Google Imagen 4.0 Ultra

Premium image generation with exceptional detail and text rendering.

$0.06/image

Try now

Qwen Image

High-quality image generation with perfect text integration and poster design.

$0.0058/image

Try now

34+ More Models

Complete range of specialized models for every creative and production use case.

From $0.0006/image

Browse all

Build Complete Workflows in One Platform

Combine text, image, and video generation in a single codebase without managing multiple providers. Your existing Together integration gains image editing, creative generation, and video production capabilities.

Here are three types of applications this makes practical to build:

🎮 Media Generation in Gaming

Technical capability: Gaming studios generating environmental assets, character variations, and cutscenes programmatically based on gameplay data.

Platform advantage: Single API call chain from game state to visual assets, enabling real-time content generation without managing multiple inference providers.

🛍️ Dynamic Advertising Creative

Technical capability: E-commerce platforms generating personalized product images, lifestyle shots, and video ads based on user preferences, seasonal trends, and inventory data.

Platform advantage: Real-time creative generation from user data to personalized visuals, enabling dynamic ad optimization without coordinating separate image and video providers.

🧠 Interactive Learning Platforms

Technical capability: Educational applications creating custom visual explanations, interactive diagrams, and personalized video content based on student questions and progress.

Platform advantage: Real-time multimodal responses using the same inference infrastructure, enabling sophisticated personalization without latency penalties from provider switching.

Production Deployment Options

Together AI's generative media capabilities are production-ready with enterprise-grade infrastructure and developer-focused tools.

Performance & Scale

  • ✔ 40+ image and video models
  • ✔ Up to 30-second video generation
  • ✔ Multiple resolution options
  • ✔ Transparent per-model pricing

Infrastructure

  • ✔ Production-grade rate limits
  • ✔ Serverless auto-scaling
  • ✔ Global infrastructure
  • ✔ Enterprise reliability

Developer Experience

  • ✔ OpenAI-compatible APIs
  • ✔ Same SDK as text models
  • ✔ Unified authentication
  • ✔ Single billing platform

Try it Now

If you're already using Together AI for text inference, adding image and video generation works the same way. Same authentication, same SDKs, same billing dashboard. All usage shows up in one place with transparent per-model pricing.

    
    import time
    from together import Together

    client = Together()

    # Create a video generation job
    job = client.videos.create(
        prompt="A serene sunset over the ocean with gentle waves",
        model="minimax/video-01-director",
        width=1366,
        height=768,
    )

    print(f"Job ID: {job.id}")

    # Poll until completion
    while True:
        status = client.videos.retrieve(job.id)
        print(f"Status: {status.status}")

        if status.status == "completed":
            print(f"Video URL: {status.outputs.video_url}")
            break
        elif status.status == "failed":
            print("Video generation failed")
            break

        # Wait before checking again
        time.sleep(5)
    

Try the platform:

{{custom-cta-1}}

Deploy for production:

  • Start with serverless endpoints for development and testing
  • For enterprise deployments and maximum control, contact our Sales team.

The same Together AI platform you use for text inference now handles your complete generative AI stack. No additional integrations, no vendor management overhead, no learning new APIs - just expanded capabilities in the same developer experience you already know.

Ready to dive in?

Follow our step-by-step Quickstart to install, authenticate, and run your first video inference in minutes.