Chat
Code
LLM

Kimi K2.5

State-of-the-art multimodal thinking agent with vision and Agent Swarm

About model

Kimi K2.5 is Moonshot AI's most capable open-source thinking model: a thinking agent that reasons step by step while dynamically invoking tools. It sets new state-of-the-art records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, dramatically scales multi-step reasoning depth, and maintains stable tool use across 200–300 sequential calls, a breakthrough in long-horizon agency. Native INT4 quantization delivers 2x inference speed.

Humanity's Last Exam (w/ tools)

50.2%

Expert-level multimodal reasoning across 100+ subjects

Tokens (Mixed Visual & Text)

15T

Native multimodal pretraining at scale

Inference Speed-Up

2x

Native INT4 quantization with QAT

Model key capabilities
  • Native Multimodality: Pre-trained on vision-language tokens, excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
  • Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
  • Agent Swarm: Transitions from single-agent scaling to self-directed, coordinated swarm-like execution—decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
  • Production-Ready Efficiency: Native INT4 quantization achieving lossless 2x speed improvements with 256K context window
Performance benchmarks

Kimi K2.5 is compared against related open-source models and competitor closed-source models (Claude Opus 4.6, OpenAI o3, OpenAI o1, GPT-4o) on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified; Kimi K2.5's reported scores appear in the model card below.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    moonshotai/Kimi-K2.5

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "moonshotai/Kimi-K2.5",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="moonshotai/Kimi-K2.5",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'moonshotai/Kimi-K2.5',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
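
    Since the model accepts image input, a multimodal request can be sketched as below. This assumes the endpoint follows the OpenAI-compatible `image_url` content-part format; the image URL is hypothetical, and the API call only runs when a key is configured.

```python
# Sketch: sending an image alongside text, assuming OpenAI-style
# "image_url" content parts are accepted (check the provider's vision docs).
import os

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the UI in this screenshot and generate matching HTML."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/mockup.png"}},  # hypothetical URL
        ],
    }
]

payload = {"model": "moonshotai/Kimi-K2.5", "messages": messages}

# Only call the API when a key is actually configured.
if os.environ.get("TOGETHER_API_KEY"):
    from together import Together
    client = Together()
    response = client.chat.completions.create(**payload)
    print(response.choices[0].message.content)
```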
    
  • Model card

    Architecture Overview:
    • Mixture-of-Experts (MoE) architecture with 1T total parameters and 32B activated parameters
    • 61 layers in total, including 1 dense layer; 384 experts, with 8 selected per token
    • Multi-head Latent Attention (MLA) mechanism with 7168 attention hidden dimension
    • Native vision encoder: MoonViT with 400M parameters for vision-language integration
    • Native INT4 quantization applied to MoE components through Quantization-Aware Training (QAT)
    • 256K context window enabling complex long-horizon multimodal agentic tasks
    • 160K vocabulary size with SwiGLU activation function
    • Unified architecture combining vision and text, instant and thinking modes, conversational and agentic paradigms
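
    The sparse activation above (8 of 384 experts per token, so 32B of 1T parameters active) can be sketched with a toy top-k router. The gating scheme here is illustrative; the real router, expert sizes, and load-balancing details are not specified in this card.

```python
# Toy top-k MoE routing: a gate scores all 384 experts per token and only
# the top 8 are activated, with softmax mixing weights over the selected few.
import math
import random

NUM_EXPERTS, TOP_K = 384, 8
random.seed(0)

def route(gate_logits):
    """Return (expert_id, weight) pairs for the top-k experts of one token."""
    top = sorted(range(NUM_EXPERTS), key=lambda e: gate_logits[e], reverse=True)[:TOP_K]
    # Softmax over only the selected logits yields the mixing weights.
    exps = [math.exp(gate_logits[e]) for e in top]
    z = sum(exps)
    return [(e, x / z) for e, x in zip(top, exps)]

gate_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(gate_logits)
print(len(chosen))  # 8 experts active for this token; the other 376 stay idle
```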

    Training Methodology:
    • Continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base
    • Native multimodal training—pre-trained on vision-language tokens for seamless cross-modal reasoning
    • End-to-end trained to interleave chain-of-thought reasoning with function calls and visual grounding
    • Quantization-Aware Training (QAT) employed for lossless INT4 inference with 2x speed
    • Agent Swarm training—transitions from single-agent scaling to self-directed, coordinated swarm-like execution
    • Specialized training for parallel task decomposition and domain-specific agent instantiation
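
    The INT4 idea behind QAT can be illustrated with a minimal symmetric quantizer: weights are mapped to 16 integer levels in [-8, 7] plus a scale. Real QAT simulates this rounding during training (typically with a straight-through estimator) so the network learns weights that survive it; everything below is a simplified sketch, not Moonshot AI's implementation.

```python
# Minimal symmetric per-tensor INT4 quantization sketch:
# 4 bits give 16 signed levels, so quantized values live in [-8, 7].

def int4_quantize(weights):
    scale = max(abs(w) for w in weights) / 7.0  # map the largest weight to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def int4_dequantize(q, scale):
    return [v * scale for v in q]

w = [0.31, -0.07, 0.92, -0.55]
q, scale = int4_quantize(w)
print(q)                          # integers in [-8, 7]
print(int4_dequantize(q, scale))  # approximate reconstruction of w
```

    The speed-up comes from moving 4-bit instead of 16-bit weights through memory; QAT is what keeps the rounding error from degrading accuracy.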

    Key Capabilities:
    • Native Multimodality: Excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs
    • Coding with Vision: Generates code from visual specifications (UI designs, video workflows) and autonomously chains tools for visual data processing
    • Agent Swarm: Decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents
    • Vision benchmarks: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista, 77.5% CharXiv reasoning

    Performance Characteristics:
    • State-of-the-art 50.2% on Humanity's Last Exam (HLE) with tools across 100+ expert subjects
    • Advanced math and science reasoning: 96.1% AIME 2025, 95.4% HMMT 2025, 81.8% IMO-AnswerBench, 87.4% GPQA Diamond
    • Strong coding capabilities: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual, 85.0% LiveCodeBench v6
    • Agentic search with swarm: 78.4% BrowseComp (swarm mode), 57.5% Seal-0
    • Long-context excellence: 79.3% on AA-LCR (avg@3), 69.4% LongBench-v2 (128K context)
    • 2x generation speed improvement through native INT4 quantization without performance degradation

  • Applications & use cases

    Multimodal Agentic Reasoning:
    • Expert-level reasoning across 100+ subjects achieving 50.2% on Humanity's Last Exam with tools
    • Vision-grounded reasoning: 78.5% MMMU-Pro, 84.2% MathVision, 90.1% MathVista
    • Cross-modal problem solving combining visual understanding with mathematical and logical reasoning
    • PhD-level mathematical problem solving: 96.1% AIME 2025, 95.4% HMMT 2025
    • Dynamic hypothesis generation from visual and textual inputs with evidence verification

    Coding with Vision:
    • Generate code from visual specifications: UI designs, mockups, and video workflows
    • Autonomous tool chaining for visual data processing and analysis
    • Production-level coding: 76.8% SWE-Bench Verified, 73.0% SWE-Bench Multilingual
    • Frontend development from visual designs: fully functional HTML, React, and responsive web applications
    • Video-to-code generation: analyze video workflows and generate implementation code
    • Competitive programming: 85.0% LiveCodeBench v6, 53.6% OJ-Bench

    Agent Swarm Orchestration:
    • Self-directed task decomposition into parallel sub-tasks
    • Dynamically instantiate domain-specific agents for coordinated execution
    • Swarm mode performance: 62.3% BrowseComp, 19.4% WideSearch
    • Complex research workflows with parallel information gathering and synthesis
    • Multi-agent coding projects with specialized sub-agents for different components
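
    The orchestration pattern above can be sketched as a planner that fans sub-tasks out to parallel domain agents and joins their results. `run_sub_agent` is a hypothetical stub; in practice each call would be its own Kimi K2.5 conversation with a domain-specific system prompt and tool set.

```python
# Sketch of swarm-style task decomposition: one (stubbed) domain agent per
# sub-task, executed in parallel, then joined for a synthesis step.
from concurrent.futures import ThreadPoolExecutor

def run_sub_agent(domain, sub_task):
    # Stub standing in for a model call, e.g. client.chat.completions.create(...)
    return f"[{domain}] findings for: {sub_task}"

def swarm(task, plan):
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = [pool.submit(run_sub_agent, d, s) for d, s in plan]
        results = [f.result() for f in futures]
    # A final synthesis pass would normally be another model call over `results`.
    return "\n".join(results)

plan = [("search", "gather recent benchmarks"),
        ("code", "prototype the parser"),
        ("review", "check the prototype against the spec")]
out = swarm("ship a log parser", plan)
print(out)
```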

    Visual Understanding & Analysis:
    • Native image and video understanding with 400M parameter MoonViT encoder
    • Chart and graph reasoning: 77.5% CharXiv reasoning questions
    • Document understanding and visual question answering
    • Scientific visualization analysis and interpretation
    • UI/UX design understanding for code generation

    Agentic Search & Web Reasoning:
    • Goal-directed web-based reasoning with visual content understanding
    • Continuous browsing, searching, and reasoning over multimodal web information
    • 62.3% BrowseComp in swarm mode with coordinated sub-agent exploration
    • Visual content extraction and analysis from web sources

    Long-Horizon Multimodal Workflows:
    • Research automation across text and visual sources
    • Video analysis workflows with tool-augmented reasoning
    • Complex design-to-implementation pipelines
    • Multi-step visual data processing and code generation
    • 79.3% AA-LCR (avg@3), 69.4% LongBench-v2 with 128K context

    Creative & Multimodal Content Generation:
    • Image-grounded creative writing and storytelling
    • Visual analysis and cultural commentary
    • Technical documentation from visual specifications
    • Educational content combining visual and textual explanations

Model specifications
  • Model provider
    Moonshot AI
  • Type
    Chat
    Code
    LLM
  • Main use cases
    Vision
  • Speed
    Medium
  • Intelligence
    Very High
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    1T
  • Activated parameters
    32B
  • Context length
    262K
  • Input price

    $0.50 / 1M tokens

  • Output price

    $2.80 / 1M tokens

  • Input modalities
    Text
    Image
  • Output modalities
    Text
  • Released
    December 31, 2025
  • Last updated
    January 26, 2026
  • Quantization level
    INT4
  • Category
    Chat