Together Research

Foundational research for production AI

Our research areas

  • Inference

    Design and optimization of production inference systems, spanning scheduling, batching, and hardware–software co-design for reliable high throughput; a minimal batching sketch follows this list.

    Read papers
  • Kernels

    Development of high-performance GPU kernels for training and inference, optimizing memory, attention, and custom operators at production scale.

    Read papers
  • Model Shaping

    Advancement of post-training methods like fine-tuning, distillation, and quantization to shape efficient, controllable model behavior.

    Read papers
  • Agents

    Studies of long-horizon reasoning and decision-making, focusing on tool use, multi-step planning, and reinforcement learning for reliable agentic systems.

    Read papers
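
To make the inference item above concrete: continuous batching admits new requests into an in-flight batch the moment finished requests leave, rather than draining a whole static batch first. The sketch below is a minimal, framework-free illustration under assumed interfaces (`Request`, `step`, and `serve` are hypothetical names, not Together's serving engine):

```python
# Minimal continuous-batching scheduler (illustrative sketch only).
# Real serving engines also manage KV-cache memory, preemption,
# prefill/decode phases, and GPU streams; none of that is modeled here.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:                      # hypothetical request record
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

def step(batch):
    """Decode one token for every in-flight request (model call stubbed)."""
    for req in batch:
        req.generated.append("<tok>")  # stand-in for a model forward pass

def serve(requests, max_batch_size=8):
    waiting, running, done = deque(requests), [], []
    while waiting or running:
        # Admit new requests as soon as slots free up, instead of
        # waiting for the whole batch to finish (static batching).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        step(running)
        still_running = []
        for req in running:
            bucket = done if len(req.generated) >= req.max_new_tokens else still_running
            bucket.append(req)
        running = still_running
    return done
```

The point of the structure is that `step` runs on whatever is currently in flight, so short requests exit and free their slot immediately instead of stalling behind the longest request in their batch.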

Recognized research

Papers accepted at top conferences

Spotlight · ICLR

ThunderKittens: Simple, Fast, and Adorable AI Kernels

Benjamin F. Spector, Simran Arora, Aaryan Singhal, Daniel Y. Fu, Christopher Ré

Outstanding Paper · COLM

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Best Paper · ICML HAET

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

MLSys

CDLM: Consistency Diffusion Language Models for Faster Sampling

Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami

MLSys

Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training

Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang

MLSys

Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost

Haojun Xia, Xiaoxia Wu, Jisen Li, Robert Wu, Junxiong Wang, Jue Wang, Chenxi Li, Aman Singhal, Alay Dilipbhai Shah, Alpay Ariyak, Donglin Zhuang, Zhongzhu Zhou, Ben Athiwaratkun, Zhen Zheng, Shuaiwen Leon Song

MLSys

ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels

Simran Arora

ICLR

FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models

Max Ryabinin

ICLR

When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework

Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang

ICLR

Speculative Speculative Decoding

Tanishq Kumar, Tri Dao, Avner May

Spotlight · NeurIPS Datasets and Benchmarks

RedPajama: an Open Dataset for Training Large Language Models

Maurice Weber, Daniel Y. Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang

ICML ME-FoMo

Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

Spotlight · ICML

Simple linear attention language models balance the recall-throughput tradeoff

Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

Oral · NeurIPS

Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

Dan Fu, Simran Arora, Jessica Grogan, Isys Johnson, Evan Sabri Eyuboglu, Armin Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

Oral · ICML

Hyena Hierarchy: Towards Larger Convolutional Language Models

Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

Oral · ICLR

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Dan Fu, Tri Dao, Khaled Saab, Armin Thomas, Atri Rudra, Christopher Ré

Oral · ICML

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen

Oral · ICML

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
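
Several of the papers above build on speculative decoding: a small draft model proposes a few tokens and the larger target model verifies them, so the target advances multiple tokens per verification. Here is a greedy-matching sketch under assumed interfaces; `draft` and `target` are hypothetical callables returning a greedy next token, and this generic loop is not any specific paper's method (the entries above study distribution-aware and RL-training variants beyond it):

```python
# Greedy speculative decoding (illustrative sketch only). A real system
# verifies all k proposals in ONE batched target forward pass; we call
# the target once per prefix here purely for readability.
def speculative_decode(target, draft, tokens, k=4, steps=32):
    tokens = list(tokens)
    for _ in range(steps):
        base = len(tokens)
        # 1. Cheap draft model proposes k tokens autoregressively.
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft(proposal))
        # 2. Target model's greedy choice after each proposed prefix.
        target_preds = [target(proposal[: base + i]) for i in range(k + 1)]
        # 3. Keep the longest prefix where draft and target agree, then
        #    append the target's own token (correction or bonus token).
        accepted = 0
        while accepted < k and proposal[base + accepted] == target_preds[accepted]:
            accepted += 1
        tokens = proposal[: base + accepted] + [target_preds[accepted]]
    return tokens
```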

Key open-source projects

  • FlashAttention

    IO-aware exact attention, universally adopted

  • Flash Decoding

    8× faster long-context token generation

  • Mixture of Agents

    Open models working together beat GPT-4o (sketched after this list)

  • Dragonfly

    Tiny 8B model beats Med-Gemini on every benchmark

  • RedPajama Datasets

    100T+ tokens powering 500+ models

  • DeepCoder

    First open model to match o3-mini on code

  • Open Deep Research

    Open-source multi-model deep research agent

  • Open Data Scientist Agent

    Autonomous agent tops Adyen's real-world benchmark
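
The Mixture of Agents project above layers models: several proposers answer independently, later layers refine with each other's drafts in context, and a final aggregator synthesizes one response. A provider-agnostic sketch of that structure follows; `ask(model, prompt)` is a hypothetical chat-completion helper, not Together's API, and the prompts are deliberately simplified:

```python
# Mixture-of-Agents sketch (illustrative only; not the paper's exact prompts).
def mixture_of_agents(ask, question, proposers, aggregator, layers=2):
    # Layer 1: every proposer answers independently.
    drafts = [ask(m, question) for m in proposers]
    # Middle layers: proposers refine, seeing each other's drafts.
    for _ in range(layers - 1):
        context = "\n\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
        prompt = f"{question}\n\nPrevious drafts:\n{context}\n\nImprove on them."
        drafts = [ask(m, prompt) for m in proposers]
    # Final layer: one aggregator synthesizes a single answer.
    context = "\n\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
    return ask(aggregator, f"{question}\n\nCandidate answers:\n{context}\n\n"
                           "Synthesize the best single answer.")
```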

In the spotlight

Featured talks and conference presentations by our researchers

Conference

At Slush 2025, Together AI VP of Kernels Dan Fu dives into building, using, and managing AI agents.

Research team

Researchers and engineers pushing the boundaries of AI

Ce Zhang

Founder & CTO

Chris Ré

Founder

Tri Dao

Founder & Chief Scientist

Percy Liang

Founder

Ben Athiwaratkun

Core ML

Dan Fu

Kernels

James Zou

Frontier Agents

Leon Song

Core ML

Max Ryabinin

Model Shaping

Simran Arora

Kernels

Yineng Zhang

Inference