
Together AI launches Llama 3.2 APIs for vision, lightweight models & Llama Stack: powering rapid development of multimodal agentic apps

September 25, 2024

By Together AI

Today marks a major milestone in open source AI with the launch of Llama 3.2 vision and lightweight models, and the release of Llama Stack. We are thrilled to partner with Meta to integrate these models, and to be one of Llama Stack’s first API Providers.

Here’s the TLDR on what we’re launching today:

  • Free Llama 3.2 11B Vision Model - Developers can now use Llama 3.2's vision model for free through our Llama-Vision-Free multimodal model. It’s a powerful way to experiment with multimodal AI without any upfront cost 🎉
  • Vision Models (11B, 90B): Together Turbo endpoints for Llama 3.2 vision models provide exceptional speed and accuracy, optimized for tasks like image captioning, visual question answering, and image-text retrieval. Perfect for high-demand production applications with scalable, enterprise-ready performance.
  • Lightweight Model (3B): Designed for faster inference with reduced resource consumption, the Together Turbo endpoint for the 3B model is ideal for applications requiring high performance at lower cost, maintaining efficiency without sacrificing speed or accuracy.
  • New Llama Stack APIs on Together AI - Together AI is one of the first API providers for Llama Stack, which standardizes the components required for building agentic, retrieval-augmented generation (RAG), and conversational applications. We encourage you to explore the Llama Stack repo on GitHub and integrate Meta’s example apps using Together AI’s APIs to accelerate your AI development.
  • Napkins.dev demo app - We’re excited to showcase Napkins.dev, an open source demo app that uses Llama 3.2 vision models to generate code from wireframes, sketches, or screenshots. This tool demonstrates how quickly and easily Llama 3.2 can be used to bring app ideas from concept to code.

Both long-established technology companies and startups use Llama on Together AI:

“At Mozilla, we appreciate how quickly we were able to get up and running with Together AI and Llama. Together AI’s inference engine’s performance is much faster than other Llama providers’, the OpenAI-compatible integration is straightforward and well-documented, and the cost is very reasonable. As Mozilla has been committed to open source since its inception, it's particularly exciting for us to build with companies and models that share our commitment to open innovation and research.” - Javaun Moradi, Sr. Manager, Innovation Studio at Mozilla

“Millions of software engineers use Blackbox's coding agents to transform the way they build and ship products today. We've been working with Together AI for the past 6 months and using Llama for synthetic data generation. Together AI's product and infrastructure is world class and the support from the team is exceptional!” - Robert Rizk, Co-Founder and COO, Blackbox

Together Enterprise Platform offers infrastructure control, data privacy, and model ownership, empowering businesses with the most stringent requirements to deploy Llama models in Together Cloud, VPC or on-prem, with confidence.

🦙 Explore the Full Range of Llama 3.2 Models

Llama 3.2 offers a versatile range of models designed for both multimodal image and text processing and lightweight applications. Llama 3.2 models support 128K context length, 1120x1120 images, and multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Whether you’re looking to experiment with the free 11B model or need enterprise-grade performance with the 90B model, our endpoints provide the tools to build AI applications that meet your needs.

🆓 Llama-Vision-Free (image + text)
Free through the end of the year, this high-quality 11B model endpoint is ideal for development, experimentation, and personal non-commercial applications. It provides a free and easy way for developers to explore multimodal AI capabilities. Try Llama-Vision-Free now →
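
Calling the free endpoint is a one-line change from any other chat completion. Here is a minimal sketch with our Python SDK (the prompt and image URL are placeholders; double-check the exact model string in the playground before running it):

from together import Together

# The client reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
  model="meta-llama/Llama-Vision-Free",  # free vision endpoint (illustrative model ID)
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one sentence."},  # placeholder prompt
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
      ],
    }
  ],
  max_tokens=128,
)
print(response.choices[0].message.content)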

👁️ Llama-3.2-11B-Vision-Instruct-Turbo (image + text)
Optimized for multimodal use, the 11B model endpoint strikes a balance between performance and cost, making it a great fit for production applications such as image captioning and visual search. Try Llama-3.2-11B-Vision-Instruct-Turbo now →

🔍 Llama-3.2-90B-Vision-Instruct-Turbo (image + text)
The most accurate and reliable option, the 90B model endpoint is designed for high-stakes enterprise use cases, delivering superior performance in precision-demanding tasks like healthcare imaging, legal document analysis, and financial reporting. Try Llama-3.2-90B-Vision-Instruct-Turbo now →

⚡ Llama-3.2-3B-Instruct-Turbo (text only)
A versatile model endpoint ideal for agentic applications, offering the speed and efficiency needed for real-time AI agents while being lightweight enough for certain edge or mobile environments when required. Try Llama-3.2-3B-Instruct-Turbo now →
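
For real-time agents, streaming the response token by token keeps perceived latency low. A minimal sketch with our Python SDK, using the OpenAI-compatible streaming interface (the prompt and parameters are placeholders):

from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Stream a text-only completion from the lightweight 3B model.
stream = client.chat.completions.create(
  model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
  messages=[{"role": "user", "content": "List three checks to run before deploying a web app."}],
  max_tokens=200,
  stream=True,
)
for chunk in stream:
  # Each chunk carries an incremental piece of the assistant's reply.
  print(chunk.choices[0].delta.content or "", end="", flush=True)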

Try any of these models on our playground now, or contact our team to discuss your enterprise deployment needs.

👓 Advancing Multimodal AI in the Enterprise, Atop an Open Source Foundation

The Llama 3.2 vision models (11B and 90B parameters) offer powerful multimodal capabilities for image and text processing. When paired with the Together Platform – including the new enterprise capabilities we announced last week – the combination has the potential to unlock powerful real-world use cases like:

Multimodal Use Cases

  • Interactive Agents: Build AI agents that respond to both text and image inputs, providing a richer user experience.
  • Image Captioning: Generate high-quality image descriptions for e-commerce, content creation, and digital accessibility.
  • Visual Search: Allow users to search via images, enhancing search efficiency in e-commerce and retail.
  • Document Intelligence: Analyze documents with both text and visuals, such as legal contracts and financial reports.

Industry-Specific Applications

Together AI’s Llama 3.2 endpoints unlock new opportunities across industries:

  • Healthcare: Accelerate medical image analysis, improving diagnostic accuracy and patient care.
  • Retail & E-Commerce: Revolutionize shopping experiences with image and text-based searches and personalized recommendations.
  • Finance & Legal: Speed up workflows by analyzing graphical and textual content, optimizing contract reviews and audits.
  • Education & Training: Create interactive educational tools that process both text and visuals, enhancing engagement.

Example Multimodal Prompts
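
As one illustration, a multimodal prompt pairs an instruction with an image, which can be supplied inline as a base64 data URL when the file is not hosted publicly (handy for scanned documents or screenshots). A rough sketch with our Python SDK; the filename, prompt, and model choice are placeholders, and we assume the endpoint accepts data URLs in the OpenAI-compatible image_url format:

import base64
from together import Together

client = Together()

# Encode a local image so it can be sent inline as a data URL.
with open("contract_page.png", "rb") as f:
  image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
  model="meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "List the parties, dates, and obligations on this contract page."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
      ],
    }
  ],
  max_tokens=400,
)
print(response.choices[0].message.content)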

🤖 Building agentic systems with Llama Stack and Together AI

The Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market.

We’re excited to announce that Together AI is one of the first Llama Stack API providers.

Using Llama Stack with Together AI as your API provider enables the rapid creation of agentic systems and conversational apps that utilize retrieval-augmented generation (RAG). Together AI's Llama Stack distribution is coming soon with these endpoints (a usage sketch follows the list):

  1. Llama Stack Inference API (Llama 3.1 + 3.2 with Together AI)
  2. Llama Stack Safety APIs (LlamaGuard 3.1 + 3.2 with Together AI)
  3. Llama Stack Memory APIs (integration with a vector database)
  4. Llama Stack Agent API (using the three APIs above)
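
As a rough sketch of what calling the Inference API might look like once a distribution is running (the llama-stack-client package, base URL, model identifier, and response fields below are assumptions; the interfaces are still evolving, so treat the Llama Stack repo as the source of truth):

from llama_stack_client import LlamaStackClient

# Point the client at a running Llama Stack distribution (placeholder URL and port).
client = LlamaStackClient(base_url="http://localhost:5000")

# Inference request; with the Together AI distribution this would be
# served by Llama running on Together's endpoints.
response = client.inference.chat_completion(
  model="Llama3.2-11B-Vision-Instruct",  # placeholder model identifier
  messages=[{"role": "user", "content": "Outline a plan for a three-step research agent."}],
)
print(response.completion_message.content)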

Meta has examples in the Llama Stack Apps repo. We can’t wait to see what you build!

👨‍💻 A New Example App: Generate Code from Sketches, Wireframes, and Screenshots with napkins.dev 

napkins.dev is an open source example app that uses the Llama 3.2 vision models to generate code from images 🤯

napkins.dev takes a sketch, wireframe, or a screenshot as input, and then generates React code for it using Llama 3.2 vision 90B.
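
The core pattern is simple: send the image alongside an instruction asking for code. A simplified sketch of the idea using our Python SDK (this is an illustration, not napkins.dev's actual implementation; the wireframe URL is a placeholder):

from together import Together

client = Together()

# Ask the 90B vision model to turn a wireframe screenshot into React code.
response = client.chat.completions.create(
  model="meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Turn this wireframe into a single React component styled with Tailwind CSS. Return only the code."},
        {"type": "image_url", "image_url": {"url": "https://example.com/wireframe.png"}},  # placeholder URL
      ],
    }
  ],
  max_tokens=2048,
)
print(response.choices[0].message.content)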

Try it for free at https://www.napkins.dev, or check out the GitHub repo to see how it works and run your own version.

🏁 Get Started with Llama 3.2 and Llama Stack on Together AI

At Together AI, we believe open source LLMs are the practical choice when deciding what foundation model to build upon — especially for enterprises and startups, where infrastructure control, model ownership, and cost savings are critical. With Llama on Together AI, businesses gain the ability to own and customize their models, while maintaining high performance without concerns about data privacy or the high costs that are inherent when using closed platforms. Enterprises will soon be able to fine-tune Llama 3.2 vision models on Together AI, further customizing them for specific tasks, while also benefiting from model ownership and portability that open source models such as Llama provide.

We invite you to explore Llama 3.2 on our playground, or use our Python SDK to quickly integrate Llama models into your applications:

from together import Together

# The client reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Send a text question and an image URL to the 90B vision model in a single request.
response = client.chat.completions.create(
  model="meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What sort of animal is in this picture? What is its usual diet? What area is the animal native to? And isn’t there some AI model that’s related to the image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/LLama.jpg/444px-LLama.jpg?20050123205659",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

# Print the model's reply.
print(response.choices[0].message.content)

With Llama 3.2 and Together AI as an API Provider for Llama Stack, it’s never been easier to build, fine-tune, and scale multimodal AI applications tailored to your specific needs. Contact us to discuss your enterprise AI needs.
