Vision

NIM Llama 3.2 90B Vision Instruct

NVIDIA NIM for GPU accelerated Llama 3.2 90B Vision Instruct inference through OpenAI compatible APIs.

About model

NIM Llama 3.2 90B Vision Instruct processes visual and text-based inputs to generate human-like responses. It is the largest Llama vision model, ideal for applications requiring advanced vision understanding. Suitable for developers and researchers needing high-performance vision capabilities.

To run this model, you first need to deploy it on a Dedicated Endpoint.

Quickstart guides

Vision

Quickstart: How to Do OCR

RAG

Building a RAG Workflow

Related models

Model provider
Meta
Type
Vision
Main use cases
Vision
Deployment
On-Demand Dedicated
Monthly Reserved
Parameters
90B
Context length
128K
Input modalities
Text
Image
Output modalities
Text

Released
September 24, 2024
Last updated
August 26, 2025
External link
Provider docs
Category
Vision

Quickstart docs

Deploy model