Models / Liquid AI
Chat

LFM2 24B A2B

Efficient hybrid model optimized for high-volume multi-agent workflows.

About model

LFM2-24B-A2B is a hybrid mixture-of-experts (MoE) model with 24B total parameters, of which 2.3B are activated per token, designed as the fast inner-loop model for high-volume multi-agent pipelines. Its hybrid architecture combines 30 double-gated LIV convolution blocks with 10 GQA blocks, keeping inference cost low enough to run far more concurrent agents on the same infrastructure. With native function calling, web search, and structured outputs, LFM2-24B-A2B serves as the generation backbone for high-throughput RAG pipelines, supporting 9 languages and a 32,768-token context on Together AI's production infrastructure.

More Agents Per GPU

10x

Massive concurrency at lower cost

Context Length

32K

Extended multi-turn conversations

Languages

9

Global deployment support

Model key capabilities
  • Run 10x More Agents: Minimal active parameters enable massive concurrency at lower cost
  • Built for Agent Loops: Tool calling, web search, structured outputs for multi-step workflows
  • Global Language Support: 9 languages including English, Chinese, Arabic, Japanese, Korean, Spanish
  • Production-Ready: 99.9% SLA, 32K context, available on serverless and dedicated infrastructure
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    LiquidAI/LFM2-24B-A2B

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "LiquidAI/LFM2-24B-A2B",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="LiquidAI/LFM2-24B-A2B",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'LiquidAI/LFM2-24B-A2B',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Hybrid MoE model with 24B total parameters, 2.3B activated per token
    • 40-layer architecture: 30 double-gated LIV convolution blocks + 10 GQA blocks
    • 64 experts per MoE block with top-4 routing, first 2 layers dense
    • Hidden dimension: 2,048 with expert intermediate size: 1,536
    • 32,768 token context length for extended workflows
    • 65,536 vocabulary size for efficient tokenization
    • Minimal active parameters enabling massive agent concurrency
    • Designed as fast inner-loop model in multi-step agent pipelines
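The figures above allow a rough back-of-envelope check of the expert parameter bookkeeping. The sketch below assumes a gated (SwiGLU-style) FFN with three weight matrices per expert and ignores attention, convolution, router, and embedding parameters, so it is an illustration of how sparse MoE activation works rather than an exact reproduction of the model.

```python
# Back-of-envelope MoE parameter estimate from the figures in the model card.
# Assumes a SwiGLU-style gated FFN (3 weight matrices per expert); attention,
# convolution, router, and embedding parameters are ignored for simplicity.

hidden = 2048           # hidden dimension
inter = 1536            # expert intermediate size
experts = 64            # experts per MoE block
top_k = 4               # experts activated per token
layers = 40             # total layers
dense_layers = 2        # first 2 layers are dense (no MoE routing)

moe_layers = layers - dense_layers
per_expert = 3 * hidden * inter           # gate, up, and down projections

total_expert_params = moe_layers * experts * per_expert
active_expert_params = moe_layers * top_k * per_expert

print(f"params per expert:    {per_expert / 1e6:.1f}M")
print(f"total expert params:  {total_expert_params / 1e9:.1f}B")
print(f"active expert params: {active_expert_params / 1e9:.2f}B")
```

Roughly 23B of expert weights plus the dense layers, attention/convolution blocks, and embeddings lands near the reported 23.8B total, and roughly 1.4B of active expert weights plus those shared components is consistent with the reported 2.3B active per token.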

    Training Methodology:
    • Trained on 17T tokens (pre-training ongoing)
    • General-purpose instruct model without reasoning traces
    • Optimized for fast inference in high-volume multi-agent systems
    • 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese

    Performance Characteristics:
    • Cost-effective efficiency: 24B MoE with only 2.3B active parameters per token
    • Native function calling for tool orchestration in agent workflows
    • Web search integration for retrieval-augmented generation
    • Structured outputs for reliable data extraction and formatting
    • Fast inner-loop performance optimized for multi-step pipelines
    • High-throughput inference enabling massive concurrent workloads
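Structured outputs are requested by attaching a JSON schema to the chat completion call. The sketch below shows the general payload shape; the `response_format` field follows Together's JSON-mode convention but its exact field names should be verified against the current API docs, and `city_schema` is a hypothetical schema invented for this example.

```python
# Sketch of a structured-output request payload. The response_format shape
# is an assumption based on Together's JSON mode; verify the field names
# against the current API docs. The schema itself is hypothetical.

city_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "attractions": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["city", "attractions"],
}

payload = {
    "model": "LiquidAI/LFM2-24B-A2B",
    "messages": [
        {"role": "user", "content": "List three attractions in New York as JSON."}
    ],
    # Assumed JSON-mode shape: a schema-constrained response format.
    "response_format": {"type": "json_object", "schema": city_schema},
}

# The payload would be sent with client.chat.completions.create(**payload)
# (Together Python SDK) or POSTed to /v1/chat/completions.
```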

  • Applications & use cases

    High-Volume Multi-Agent Pipelines:
    • Optimized as fast inner-loop model for multi-step agent workflows at scale
    • Native function calling for tool orchestration and API integration
    • Structured outputs for reliable data extraction between agent steps
    • Minimal active parameters (2.3B) enabling massive concurrent agent execution
    • 32K context supporting extended multi-turn agent conversations
    • Cost-effective inference for high-throughput production deployments
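Tool orchestration in these pipelines uses the OpenAI-compatible `tools` parameter on the chat completions endpoint. The sketch below builds a minimal request; `get_weather` and its parameters are hypothetical, and the network call itself is shown only as a comment since it requires an API key.

```python
# Minimal tool-calling request for an agent loop. The tools array uses the
# OpenAI-compatible function schema accepted by /v1/chat/completions.
# get_weather is a hypothetical tool defined for illustration only.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

request = {
    "model": "LiquidAI/LFM2-24B-A2B",
    "messages": [{"role": "user", "content": "What's the weather in New York?"}],
    "tools": tools,
}

# With the Together SDK this would be sent as:
#   client.chat.completions.create(**request)
# The agent loop then executes any tool_calls in the response, appends the
# results as "tool" messages, and calls the model again.
```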

    High-Throughput RAG Pipelines:
    • Generation backbone optimized for production-scale retrieval-augmented setups
    • Web search integration for real-time information retrieval
    • Structured outputs for consistent formatting of retrieved data
    • Efficient tokenization with 65,536 vocabulary size
    • Fast inference enabling low-latency, high-volume RAG responses
    • Cost-effective scaling for enterprise RAG deployments
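In a RAG pipeline the retrieved passages must fit inside the 32,768-token context alongside the question and the generation budget. The helper below is a hedged sketch using a crude 4-characters-per-token heuristic; production code should count tokens with the model's real tokenizer, and `fit_chunks` is a name invented for this example.

```python
# Sketch: pack ranked retrieved chunks into a prompt under a token budget.
# Uses a rough 4-chars-per-token heuristic instead of the real tokenizer,
# which production code should use. fit_chunks is illustrative only.

CONTEXT_LIMIT = 32_768      # model context length in tokens
RESERVED = 2_048            # tokens reserved for the answer + system prompt

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_chunks(chunks: list[str], question: str) -> list[str]:
    """Keep retrieved chunks, in ranked order, that fit the context budget."""
    budget = CONTEXT_LIMIT - RESERVED - approx_tokens(question)
    kept, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

chunks = ["passage " * 500, "another passage " * 500, "short note"]
kept = fit_chunks(chunks, "What do the passages say?")
```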

    Production Agentic Tool Use:
    • Native function calling for seamless tool integration at scale
    • Web search capabilities for autonomous information gathering
    • Structured outputs ensuring reliable tool response parsing
    • Fast inner-loop performance for high-throughput agent operations
    • Multi-language support (9 languages) for global deployment
    • Minimal active parameters reducing inference costs

    Cost-Effective Inference at Scale:
    • 24B total parameters with only 2.3B active, lowering per-token inference cost
    • Run more concurrent agents on same infrastructure
    • Hybrid architecture optimized for production efficiency
    • Minimal memory footprint via sparse MoE activation
    • High-volume deployment without proportional cost increases

    Multilingual Production Applications:
    • 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese
    • Cross-lingual agent workflows and tool calling
    • Multilingual RAG pipelines with consistent performance
    • Global deployment with regional language support
    • Cost-effective scaling across international markets

Model details
  • Model provider
    Liquid AI
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    23.8B
  • Context length
    32K
  • Input price

    $0.03 / 1M tokens

  • Output price

    $0.12 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
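The listed serverless prices translate into per-request cost as follows. This sketch simply multiplies token counts by the listed rates; the example token counts are made up for illustration.

```python
# Cost estimate from the listed serverless prices for LFM2-24B-A2B.
INPUT_PRICE = 0.03 / 1_000_000    # $ per input token ($0.03 / 1M)
OUTPUT_PRICE = 0.12 / 1_000_000   # $ per output token ($0.12 / 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical example: a RAG request with 8K tokens of retrieved context
# and a 1K-token answer.
cost = request_cost(8_000, 1_000)
```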