Input
from together import Together

client = Together()  # API key via api_key param or TOGETHER_API_KEY env var

# Query image with Llama 4 Maverick model
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What can you see in this image?"},
            {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}}
        ]
    }]
)
print(response.choices[0].message.content)
Output
The image depicts a serene landscape of Yosemite National Park, featuring a river flowing through a valley surrounded by towering cliffs and lush greenery.
* **River:**
  * The river is calm and peaceful, with clear water that reflects the surrounding scenery.
  * It flows gently from the bottom-left corner to the center-right of the image.
  * The riverbank is lined with rocks and grasses, adding to the natural beauty of the scene.
* **Cliffs:**
  * The cliffs are massive and imposing, rising steeply from the valley floor.
  * They are composed of light-colored rock, possibly granite, and feature vertical striations.
  * The cliffs are covered in trees and shrubs, which adds to their rugged charm.
* **Trees and Vegetation:**
  * The valley is densely forested, with tall trees growing along the riverbanks and on the cliffsides.
  * The trees are a mix of evergreen and deciduous species, with some displaying vibrant green foliage.
  * Grasses and shrubs grow in the foreground, adding texture and color to the scene.
* **Sky:**
  * The sky is a brilliant blue, with only a few white clouds scattered across it.
  * The sun appears to be shining from the right side of the image, casting a warm glow over the scene.
In summary, the image presents a breathtaking view of Yosemite National Park, showcasing the natural beauty of the valley and its surroundings. The calm river, towering cliffs, and lush vegetation all contribute to a sense of serenity and wonder.
Function Calling
Input
import os
import json
import openai

client = openai.OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ['TOGETHER_API_KEY'],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": [
                            "celsius",
                            "fahrenheit"
                        ]
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls."},
    {"role": "user", "content": "What is the current temperature of New York, San Francisco and Chicago?"}
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
print(json.dumps(response.choices[0].message.model_dump()['tool_calls'], indent=2))
Output
[
  {
    "id": "call_1p75qwks0etzfy1g6noxvsgs",
    "function": {
      "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  },
  {
    "id": "call_aqjfgn65d0c280fjd3pbzpc6",
    "function": {
      "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  },
  {
    "id": "call_rsg8muko8hymb4brkycu3dm5",
    "function": {
      "arguments": "{\"location\":\"Chicago, IL\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  }
]
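The tool calls above only name the function and its arguments; the calling code still has to run them and hand the results back to the model. The following is a minimal sketch of that step, not an official pattern from the SDK. It assumes a hypothetical local stub `get_current_weather` that returns placeholder data, a hypothetical helper `run_tool_calls`, and tool calls in the dictionary shape printed above.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stub: replace with a real weather lookup."""
    return {"location": location, "unit": unit, "temperature": None}


# Map tool names (as declared in `tools`) to local callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and build one `tool` message per call."""
    tool_messages = []
    for call in tool_calls:
        func = AVAILABLE_FUNCTIONS[call["function"]["name"]]
        # `arguments` arrives as a JSON-encoded string, not a dict.
        kwargs = json.loads(call["function"]["arguments"])
        result = func(**kwargs)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages


# One entry copied from the output above, used here as sample input.
example_calls = [{
    "id": "call_1p75qwks0etzfy1g6noxvsgs",
    "function": {
        "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}",
        "name": "get_current_weather",
    },
    "type": "function",
}]

print(json.dumps(run_tool_calls(example_calls), indent=2))
```

In the usual OpenAI-compatible flow, you would append the assistant message containing the tool calls and then these `tool` messages to `messages`, and call `client.chat.completions.create` again so the model can answer using the results.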
Query models with multiple images
Currently, this model supports up to 5 images as input.
Input
# Multi-modal message with multiple images
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two images."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png"
                }
            }
        ]
    }]
)
print(response.choices[0].message.content)
Output
The first image is a collage of multiple identical landscape photos showing a natural scene with rocks, trees, and a stream under a blue sky. The second image is a screenshot of a mobile app interface, specifically the navigation menu of the Canva app, which includes icons for Home, DMs (Direct Messages), Activity, Later, Canvases, and More.
### Comparison:
1. **Content**:
   - The first image focuses on a natural landscape.
   - The second image shows a digital interface from an app.
2. **Purpose**:
   - The first image could be used for showcasing nature, design elements in graphic work, or as a background.
   - The second image represents the functionality and layout of the Canva app's navigation system.
3. **Visual Style**:
   - The first image has vibrant colors and realistic textures typical of outdoor photography.
   - The second image uses flat design icons with a simple color palette suited for user interface design.
4. **Context**:
   - The first image is likely intended for artistic or environmental contexts.
   - The second image is relevant to digital design and app usability discussions.
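Because the `content` list in the multi-image request takes one `image_url` entry per image, the message can be assembled from a plain list of URLs. The sketch below assumes a hypothetical helper, `build_image_message`, which is not part of the Together SDK. It also checks the five-image limit stated in this section on the client side, and the constant `MAX_IMAGES` is an assumption that should be updated if the limit changes.

```python
MAX_IMAGES = 5  # limit stated in this section for this model; may change


def build_image_message(prompt, image_urls, max_images=MAX_IMAGES):
    """Return a user message with one text part and one part per image URL."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > max_images:
        raise ValueError(
            f"got {len(image_urls)} images; at most {max_images} are supported"
        )
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}


message = build_image_message(
    "Compare these two images.",
    [
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png",
    ],
)
print(len(message["content"]))  # 3: one text part plus two image parts
```

The returned dictionary can be passed as `messages=[message]` to `client.chat.completions.create`. Validating the count locally surfaces an oversized request before it is sent.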