
36K NVIDIA GB200 GPUs, coming in Q1 2025.


Together Inference

The best combination of performance, accuracy, and cost at production scale, so you don't have to compromise.

SPEED RELATIVE TO VLLM

4x FASTER

LLAMA-3 8B AT FULL PRECISION

400 TOKENS/SEC

COST RELATIVE TO GPT-4o

11X LOWER COST

Why Together Inference

Cutting edge research, models that fit your needs, and flexible deployment options.

200+ OPEN MODELS

Get $5 in free credits on any model, or use the Llama-Vision-Free and FLUX.1-schnell-Free models free of charge.

Deploy as Serverless & Dedicated Endpoints, or in your own VPC using Together Enterprise.


Together Fine-tuning

Fine-tune leading open-source models with your data to achieve greater accuracy for your tasks.

together files upload acme_corp_customer_support.jsonl
  
{
  "filename": "acme_corp_customer_support.jsonl",
  "id": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
  "object": "file"
}
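The uploaded file is JSON Lines (JSONL): one JSON object per line. A minimal sketch of writing and validating such a file before upload (the `text` field layout here is an illustrative assumption, not necessarily Together's required schema):

```python
import json

# Hypothetical training examples; Together's expected schema may differ.
examples = [
    {"text": "<human>: How do I reset my password?\n<bot>: Visit the account settings page."},
    {"text": "<human>: Where is my order?\n<bot>: You can track it from your order history."},
]

# Write one JSON object per line (JSONL).
with open("acme_corp_customer_support.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line must parse as a JSON object with a "text" field.
with open("acme_corp_customer_support.jsonl") as f:
    for line in f:
        record = json.loads(line)
        assert "text" in record
```

Validating locally first avoids paying for a fine-tuning run that fails on a malformed line.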
  
  
together finetune create --training-file file-aab9997e-bca8-4b7e-a720-e820e682a10a \
  --model togethercomputer/RedPajama-INCITE-7B-Chat

together finetune create --training-file $FILE_ID \
  --model $MODEL_NAME \
  --wandb-api-key $WANDB_API_KEY \
  --n-epochs 10 \
  --n-checkpoints 5 \
  --batch-size 8 \
  --learning-rate 0.0003
{
    "training_file": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
    "model_output_name": "username/togethercomputer/llama-2-13b-chat",
    "model_output_path": "s3://together/finetune/63e2b89da6382c4d75d5ef22/username/togethercomputer/llama-2-13b-chat",
    "suffix": "Llama-2-13b 1",
    "model": "togethercomputer/llama-2-13b-chat",
    "n_epochs": 4,
    "batch_size": 128,
    "learning_rate": 1e-06,
    "checkpoint_steps": 2,
    "created_at": 1687982945,
    "updated_at": 1687982945,
    "status": "pending",
    "id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
    "epochs_completed": 3,
    "events": [
        {
            "object": "fine-tune-event",
            "created_at": 1687982945,
            "message": "Fine tune request created",
            "type": "JOB_PENDING"
        }
    ],
    "queue_depth": 0,
    "wandb_project_name": "Llama-2-13b Fine-tuned 1"
}
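The response above is plain JSON, so job state can be checked programmatically. A sketch that extracts the status and latest event from such a response; the set of terminal-state names is an assumption, not a documented list:

```python
import json

# Abridged response from `together finetune create` (fields follow the example above).
response = json.loads("""
{
    "status": "pending",
    "id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
    "n_epochs": 4,
    "events": [
        {"object": "fine-tune-event",
         "message": "Fine tune request created",
         "type": "JOB_PENDING"}
    ]
}
""")

# Assumed terminal states; check the docs for the authoritative list.
TERMINAL = {"completed", "error", "cancelled"}

def summarize(resp):
    """One-line summary: job id, status, latest event, and whether it is finished."""
    latest = resp["events"][-1]["message"] if resp["events"] else "no events"
    done = resp["status"] in TERMINAL
    return f"{resp['id']}: {resp['status']} ({latest}), terminal={done}"

print(summarize(response))
```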

Together GPU Clusters

Get your own private GPU cluster – with hundreds or thousands of interconnected NVIDIA GPUs – for large-scale training and fine-tuning today.

Use our purpose-built training clusters with H100, H200, and A100 GPUs connected over fast InfiniBand networks. Your cluster comes optimized for distributed training, with the accelerated Together Kernel Collection out of the box. You focus on your model, and we’ll ensure everything runs smoothly.

  • 01

    We offer flexible terms – even with our highest quality hardware. You can commit to just a month or reserve capacity for up to 5 years.

  • 02

    H100 Cluster Node Specifications:

    - 8x NVIDIA H100 / 80GB / SXM5
    - 3.2 Tbps InfiniBand network
    - 2x AMD EPYC 9474F CPUs (48 cores / 96 threads, 3.6 GHz)
    - 1.5TB ECC DDR5 memory
    - 8x 3.84TB NVMe SSDs

    A100 SXM Cluster Node Specifications:

    - 8x NVIDIA A100 80GB SXM
    - 4120 vCPU Intel Xeon (Sapphire Rapids)
    - 960 GB RAM
    - 8x 960GB NVMe storage
    - 200 Gbps Ethernet or 3200 Gbps InfiniBand configurations available

  • 03

    We value your time. Clusters are pre-configured for high-speed distributed training, using Slurm and the Together Custom Models stack to get you up and running at lightspeed.
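As a rough sanity check, the H100 node specification above implies these per-node totals (back-of-the-envelope arithmetic, not official figures):

```python
# Per-node totals for the H100 spec listed above.
gpus_per_node = 8
hbm_per_gpu_gb = 80          # H100 SXM5, from the spec list
nvme_per_drive_tb = 3.84
nvme_drives = 8

total_hbm_gb = gpus_per_node * hbm_per_gpu_gb      # aggregate GPU memory per node
total_nvme_tb = nvme_drives * nvme_per_drive_tb    # aggregate local NVMe per node

print(f"{total_hbm_gb} GB HBM, {total_nvme_tb:.2f} TB NVMe per node")
```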

Training-ready clusters – H100, H200, or A100

Reserve your cluster today

THE FASTEST CLOUD FOR GEN AI.

BUILT ON LEADING AI RESEARCH.


Innovations

Our research team is behind breakthrough AI models, datasets, and optimizations.

Build, deploy, and scale. All in a single platform.

  • 01

    Build

    Whether prompt engineering, fine-tuning, or training, we are ready to meet your business demands.

  • 02

    Deploy

    Easily integrate your new model into your production application using the Together Inference API.

  • 03

    Scale

    With the fastest performance available and elastic scaling, Together AI is built to scale with your needs as you grow.
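For the Deploy step, the Together Inference API follows the familiar OpenAI-style chat-completions shape. A minimal sketch of assembling a request body; the endpoint path and model name here are assumptions to verify against the API reference:

```python
import json

# Hypothetical endpoint; verify against the Together API reference.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(model, user_message, max_tokens=256):
    """Assemble the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_request("meta-llama/Llama-3-8b-chat-hf", "Hello!")
print(json.dumps(body, indent=2))

# Send with any HTTP client, e.g.:
#   requests.post(API_URL, json=body,
#                 headers={"Authorization": f"Bearer {API_KEY}"})
```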

Customer Stories

See how we support leading teams around the world. Our customers are creating innovative generative AI applications, faster.

Pika creates the next gen text-to-video models on Together GPU Clusters

Nexusflow uses Together GPU Clusters to build cybersecurity models

Arcee builds domain adaptive language models with Together Custom Models

Start building yours here →