Wan 2.6 Image

Professional image editing with instant serverless access on Together AI.

About model

Wan 2.6 Image is Alibaba's 20B-parameter image transformation model, built for complex image-to-image workflows, multi-reference style transfer, and precise structural edits. On Together AI, creative teams and developers can use Wan 2.6 for high-fidelity, production-ready image editing at scale.

Quickstart guides
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    Wan-AI/Wan2.6-image

    cURL:

    curl -X POST "https://api.together.xyz/v1/images/generations" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Wan-AI/Wan2.6-image",
        "prompt": "Draw an anime style version of this image.",
        "width": 1024,
        "height": 768,
        "steps": 28,
        "n": 1,
        "response_format": "url",
        "image_url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
      }'
    
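    Python:

    from together import Together

    # The client reads TOGETHER_API_KEY from the environment.
    client = Together()

    # Send the source image plus an edit instruction to the model.
    image_completion = client.images.generate(
        model="Wan-AI/Wan2.6-image",
        prompt="Draw an anime style version of this image.",
        width=1024,
        height=768,
        steps=28,
        image_url="https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    )

    # Print the URL of the first generated image.
    print(image_completion.data[0].url)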
    
    
    
    TypeScript:

    import Together from "together-ai";

    // The client reads TOGETHER_API_KEY from the environment.
    const together = new Together();

    async function main() {
      // Send the source image plus an edit instruction to the model.
      const response = await together.images.create({
        model: "Wan-AI/Wan2.6-image",
        prompt: "Draw an anime style version of this image.",
        width: 1024,
        height: 768,
        steps: 28,
        image_url: "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
      });

      // Print the URL of the first generated image.
      console.log(response.data[0].url);
    }

    main().catch(console.error);
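
When calling the endpoint with cURL, the response arrives as raw JSON rather than an SDK object. The sketch below pulls the image URLs out of such a body. It assumes the JSON mirrors what the SDK examples read (`data[0].url`), that is, a top-level `data` list whose items carry a `url` field. The helper name `extract_image_urls` and the sample body are hypothetical and only illustrate that assumed shape.

```python
import json


def extract_image_urls(response_body: str) -> list[str]:
    """Return every image URL found in a JSON response body.

    Assumed shape (inferred from the SDK examples, not confirmed):
    {"data": [{"url": "..."}, ...]}
    """
    payload = json.loads(response_body)
    # Skip items that carry no URL, e.g. if another response_format was used.
    return [item["url"] for item in payload.get("data", []) if item.get("url")]


# Hypothetical response body, used only to exercise the helper.
sample_body = json.dumps(
    {
        "data": [
            {"url": "https://example.com/a.png"},
            {"url": "https://example.com/b.png"},
        ]
    }
)

for url in extract_image_urls(sample_body):
    print(url)
```

Returning a list rather than only the first URL keeps the helper usable when a request asks for more than one image via `n`.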
    
    
  • Model card

    Architecture Overview:
    • Multimodal Diffusion Transformer (MMDiT) architecture with 20 billion parameters
    • Supports 1-3 reference images for multi-source composition and style transfer
    • Generates 1-4 output images per request with flexible aspect ratios and HD resolutions
    • Optional LLM-powered prompt expansion for enhanced detail in complex scenes
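
The 1-4 output limit above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_edit_request` is a hypothetical helper that assembles the same JSON fields used in the cURL example and rejects an out-of-range `n`. Whether the API enforces the same limit on the server, and how it reports a violation, is not stated here.

```python
MODEL_ID = "Wan-AI/Wan2.6-image"
MAX_OUTPUT_IMAGES = 4  # from the model card: 1-4 output images per request


def build_edit_request(
    prompt: str,
    image_url: str,
    n: int = 1,
    width: int = 1024,
    height: int = 768,
    steps: int = 28,
) -> dict:
    """Build a request payload, validating n against the documented range."""
    if not 1 <= n <= MAX_OUTPUT_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_OUTPUT_IMAGES}, got {n}")
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "image_url": image_url,
        "width": width,
        "height": height,
        "steps": steps,
        "n": n,
        "response_format": "url",
    }


payload = build_edit_request(
    "Draw an anime style version of this image.",
    "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    n=4,
)
print(payload["n"])
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the POST request shown in the cURL example.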

    Training Methodology:
    • Optimized to preserve identity, textures, and proportions from reference imagery
    • Deep multilingual and cultural understanding, particularly in Asian contexts
    • Trained for advanced spatial reasoning, ensuring elements follow a logical visual structure

    Performance Characteristics:
    • Multi-subject consistency that maintains visual coherence across characters and environments
    • Precise, structure-preserving edits (e.g., changing lighting or materials without breaking the layout)
    • Reliable text rendering for integrating typography naturally into images

  • Applications & use cases

    Marketing & Brand Design:
    • Professional marketing materials requiring text integration and logical layout
    • Brand asset creation with consistent preservation of logos and visual identity
    • Product visualization and mockups with photorealistic lighting and textures
    • Style transfer for campaign creative maintaining brand consistency across variations

    Design & Production:
    • Photo retouching and enhancement with structure-preserving edits
    • Virtual staging and scene composition combining elements from multiple sources
    • Character design and illustration with multi-subject consistency
    • UI element generation and design iteration with precise control

    Creative Workflows:
    • Style exploration and artistic transformations from reference imagery
    • Iterative design refinement with targeted modifications
    • Cultural and regional content creation leveraging multilingual understanding
    • Production-ready asset generation for diverse creative applications

Model details
  • Model provider
    Alibaba
  • Type
    Image
  • Main use cases
    Image-to-Image
  • Deployment
    Serverless
  • Parameters
    20B
  • Price
    $0.03 / image
  • Input modalities
    Text
    Image
  • Output modalities
    Image
  • Released
    January 28, 2026
  • Category
    Image
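
For budgeting, the listed price of $0.03 per image can be turned into a rough batch estimate. The sketch below assumes every generated image is billed at that flat rate, so a request with `n` outputs costs `n` times the listed price. That billing rule is an assumption, and the helper name `estimate_cost` is hypothetical. `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.03")  # listed price in USD per image


def estimate_cost(num_requests: int, images_per_request: int = 1) -> Decimal:
    """Estimate total cost in USD, assuming flat per-image billing."""
    if num_requests < 0:
        raise ValueError("num_requests must not be negative")
    if images_per_request < 1:
        raise ValueError("images_per_request must be at least 1")
    return PRICE_PER_IMAGE * num_requests * images_per_request


# Example: 250 requests that each ask for 4 output images.
print(f"Estimated cost: ${estimate_cost(250, 4)}")
```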