Models / Liquid AI
Chat

LFM2 24B A2B

Efficient hybrid model optimized for high-volume multi-agent workflows.

About model

LFM2-24B-A2B is a hybrid mixture-of-experts (MoE) model with 24B total parameters, of which 2.3B are activated per token, designed as the fast inner-loop model for high-volume multi-agent pipelines. Its hybrid architecture combines 30 double-gated LIV convolution blocks with 10 GQA blocks, keeping inference cost low enough to run far more concurrent agents on the same infrastructure. With native function calling, web search, and structured outputs, LFM2-24B-A2B serves as the generation backbone for high-throughput RAG pipelines, supporting 9 languages and a 32,768-token context on Together AI's production infrastructure.

More Agents Per GPU

10x

Massive concurrency at lower cost

Context Length

32K

Extended multi-turn conversations

Languages

9

Global deployment support

Model key capabilities
  • Run 10x More Agents: Minimal active parameters enable massive concurrency at lower cost
  • Built for Agent Loops: Tool calling, web search, structured outputs for multi-step workflows
  • Global Language Support: 9 languages including English, Chinese, Arabic, Japanese, Korean, Spanish
  • Production-Ready: 99.9% SLA, 32K context, available on serverless and dedicated infrastructure
  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    LiquidAI/LFM2-24B-A2B

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "LiquidAI/LFM2-24B-A2B",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="LiquidAI/LFM2-24B-A2B",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'LiquidAI/LFM2-24B-A2B',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Hybrid MoE model with 24B total parameters, 2.3B activated per token
    • 40-layer architecture: 30 double-gated LIV convolution blocks + 10 GQA blocks
    • 64 experts per MoE block with top-4 routing, first 2 layers dense
    • Hidden dimension: 2,048 with expert intermediate size: 1,536
    • 32,768 token context length for extended workflows
    • 65,536 vocabulary size for efficient tokenization
    • Minimal active parameters enabling massive agent concurrency
    • Designed as fast inner-loop model in multi-step agent pipelines
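The figures above allow a rough back-of-envelope check of the expert parameter bookkeeping. The sketch below assumes a gated (SwiGLU-style) FFN with three weight matrices per expert and ignores attention, convolution, router, and embedding parameters, so it is an illustration of how sparse MoE activation works rather than an exact reproduction of the model.

```python
# Back-of-envelope MoE parameter estimate from the figures in the model card.
# Assumes a SwiGLU-style gated FFN (3 weight matrices per expert); attention,
# convolution, router, and embedding parameters are ignored for simplicity.

hidden = 2048           # hidden dimension
inter = 1536            # expert intermediate size
experts = 64            # experts per MoE block
top_k = 4               # experts activated per token
layers = 40             # total layers
dense_layers = 2        # first 2 layers are dense (no MoE routing)

moe_layers = layers - dense_layers
per_expert = 3 * hidden * inter           # gate, up, and down projections

total_expert_params = moe_layers * experts * per_expert
active_expert_params = moe_layers * top_k * per_expert

print(f"params per expert:    {per_expert / 1e6:.1f}M")
print(f"total expert params:  {total_expert_params / 1e9:.1f}B")
print(f"active expert params: {active_expert_params / 1e9:.2f}B")
```

Roughly 23B of expert weights plus the dense layers, attention/convolution blocks, and embeddings lands near the reported 23.8B total, and roughly 1.4B of active expert weights plus those shared components is consistent with the reported 2.3B active per token.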

    Training Methodology:
    • Trained on 17T tokens (pre-training ongoing)
    • General-purpose instruct model without reasoning traces
    • Optimized for fast inference in high-volume multi-agent systems
    • 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese

    Performance Characteristics:
    • Cost-effective efficiency: 24B MoE with only 2.3B active parameters per token
    • Native function calling for tool orchestration in agent workflows
    • Web search integration for retrieval-augmented generation
    • Structured outputs for reliable data extraction and formatting
    • Fast inner-loop performance optimized for multi-step pipelines
    • High-throughput inference enabling massive concurrent workloads
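Structured outputs are requested by attaching a JSON schema to the chat completion call. The sketch below shows the general payload shape; the `response_format` field follows Together's JSON-mode convention but its exact field names should be verified against the current API docs, and `city_schema` is a hypothetical schema invented for this example.

```python
# Sketch of a structured-output request payload. The response_format shape
# is an assumption based on Together's JSON mode; verify the field names
# against the current API docs. The schema itself is hypothetical.

city_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "attractions": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["city", "attractions"],
}

payload = {
    "model": "LiquidAI/LFM2-24B-A2B",
    "messages": [
        {"role": "user", "content": "List three attractions in New York as JSON."}
    ],
    # Assumed JSON-mode shape: a schema-constrained response format.
    "response_format": {"type": "json_object", "schema": city_schema},
}

# The payload would be sent with client.chat.completions.create(**payload)
# (Together Python SDK) or POSTed to /v1/chat/completions.
```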

  • Applications & use cases

    High-Volume Multi-Agent Pipelines:
    • Optimized as fast inner-loop model for multi-step agent workflows at scale
    • Native function calling for tool orchestration and API integration
    • Structured outputs for reliable data extraction between agent steps
    • Minimal active parameters (2.3B) enabling massive concurrent agent execution
    • 32K context supporting extended multi-turn agent conversations
    • Cost-effective inference for high-throughput production deployments
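Tool orchestration in these pipelines uses the OpenAI-compatible `tools` parameter on the chat completions endpoint. The sketch below builds a minimal request; `get_weather` and its parameters are hypothetical, and the network call itself is shown only as a comment since it requires an API key.

```python
# Minimal tool-calling request for an agent loop. The tools array uses the
# OpenAI-compatible function schema accepted by /v1/chat/completions.
# get_weather is a hypothetical tool defined for illustration only.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

request = {
    "model": "LiquidAI/LFM2-24B-A2B",
    "messages": [{"role": "user", "content": "What's the weather in New York?"}],
    "tools": tools,
}

# With the Together SDK this would be sent as:
#   client.chat.completions.create(**request)
# The agent loop then executes any tool_calls in the response, appends the
# results as "tool" messages, and calls the model again.
```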

    High-Throughput RAG Pipelines:
    • Generation backbone optimized for production-scale retrieval-augmented setups
    • Web search integration for real-time information retrieval
    • Structured outputs for consistent formatting of retrieved data
    • Efficient tokenization with 65,536 vocabulary size
    • Fast inference enabling low-latency, high-volume RAG responses
    • Cost-effective scaling for enterprise RAG deployments
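In a RAG pipeline the retrieved passages must fit inside the 32,768-token context alongside the question and the generation budget. The helper below is a hedged sketch using a crude 4-characters-per-token heuristic; production code should count tokens with the model's real tokenizer, and `fit_chunks` is a name invented for this example.

```python
# Sketch: pack ranked retrieved chunks into a prompt under a token budget.
# Uses a rough 4-chars-per-token heuristic instead of the real tokenizer,
# which production code should use. fit_chunks is illustrative only.

CONTEXT_LIMIT = 32_768      # model context length in tokens
RESERVED = 2_048            # tokens reserved for the answer + system prompt

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_chunks(chunks: list[str], question: str) -> list[str]:
    """Keep retrieved chunks, in ranked order, that fit the context budget."""
    budget = CONTEXT_LIMIT - RESERVED - approx_tokens(question)
    kept, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

chunks = ["passage " * 500, "another passage " * 500, "short note"]
kept = fit_chunks(chunks, "What do the passages say?")
```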

    Production Agentic Tool Use:
    • Native function calling for seamless tool integration at scale
    • Web search capabilities for autonomous information gathering
    • Structured outputs ensuring reliable tool response parsing
    • Fast inner-loop performance for high-throughput agent operations
    • Multi-language support (9 languages) for global deployment
    • Minimal active parameters reducing inference costs

    Cost-Effective Inference at Scale:
    • 24B total parameters with only 2.3B active, lowering per-token inference cost
    • Run more concurrent agents on same infrastructure
    • Hybrid architecture optimized for production efficiency
    • Minimal memory footprint via sparse MoE activation
    • High-volume deployment without proportional cost increases

    Multilingual Production Applications:
    • 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese
    • Cross-lingual agent workflows and tool calling
    • Multilingual RAG pipelines with consistent performance
    • Global deployment with regional language support
    • Cost-effective scaling across international markets

Model details
  • Model provider
    Liquid AI
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    23.8B
  • Context length
    32K
  • Input price

    $0.03 / 1M tokens

  • Output price

    $0.12 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
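The listed serverless prices translate into per-request cost as follows. This sketch simply multiplies token counts by the listed rates; the example token counts are made up for illustration.

```python
# Cost estimate from the listed serverless prices for LFM2-24B-A2B.
INPUT_PRICE = 0.03 / 1_000_000    # $ per input token ($0.03 / 1M)
OUTPUT_PRICE = 0.12 / 1_000_000   # $ per output token ($0.12 / 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical example: a RAG request with 8K tokens of retrieved context
# and a 1K-token answer.
cost = request_cost(8_000, 1_000)
```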