Deploy Google's Gemma 3 models on Together AI. Lightweight, state-of-the-art open models built from the same technology that powers Gemini.
Why Google on Together AI?
Designed for production workloads that need consistent performance and operational control.
Gemini technology, open and deployable
Gemma models are built from the same research and architecture that powers Gemini. Deploy frontier AI you can fine-tune, own, and run without restrictions.
The best open model on a single GPU
Gemma 3 27B is the most capable open model that fits on a single NVIDIA H100 — with multimodal vision, unparalleled multilingual support, and a 128K context window.
From cloud to edge, one model family
Sizes from 270M to 27B run everywhere — cloud infrastructure, workstations, and mobile. SOC 2 Type II certified and HIPAA compliant on Together AI's US-based infrastructure.
Meet the Google family
Explore top-performing models across text, image, video, code, and voice.
Deployment options
Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.
Real-time
A fully managed inference API that automatically scales with request volume.
Best for
Batch
Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% less cost.
Best for
Dedicated Model Inference
An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.
Best for
Dedicated Container Inference
Run inference with your own engine and model on fully-managed, scalable infrastructure.
Best for