Announcing v1 of our Python SDK
The Together AI Python SDK is officially out of beta with the v1 release! It provides OpenAI-compatible APIs to:
- Run inference on chat, language, code, moderation, and image models
- Fine-tune models (including Llama 3) with your own data
- Generate embeddings from text for RAG applications
v1 comes with several improvements, including a new, more intuitive, fully OpenAI-compatible API, async support, messages support, more thorough tests, and better error handling. Upgrade to v1 by running pip install --upgrade together.
Chat Completions
To use any of the 60+ chat models we support, you can run the following code:
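Here is a minimal sketch of a chat completion call. The model name is illustrative, the client reads TOGETHER_API_KEY from the environment, and the API call is guarded so the snippet is a no-op when no key is configured:

```python
import os

# OpenAI-style message payload: a list of role/content turns.
messages = [{"role": "user", "content": "What are some fun things to do in New York?"}]

if os.environ.get("TOGETHER_API_KEY"):  # only call the API when a key is configured
    from together import Together

    client = Together()
    response = client.chat.completions.create(
        model="meta-llama/Llama-3-8b-chat-hf",  # illustrative; any supported chat model works
        messages=messages,
    )
    print(response.choices[0].message.content)
```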
Streaming
To stream back a response, simply specify stream=True.
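A sketch of the streaming variant (same assumptions as above: illustrative model name, API call skipped when TOGETHER_API_KEY is not set):

```python
import os

messages = [{"role": "user", "content": "Write a haiku about the ocean."}]

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    # stream=True yields incremental chunks instead of one final response
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3-8b-chat-hf",  # illustrative model name
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        # each chunk carries a delta with the next piece of the message
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```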
Completions
To run completions on our code and language models, do the following:
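A sketch of a raw completion, which takes a prompt string rather than chat messages (model name illustrative; call guarded on TOGETHER_API_KEY):

```python
import os

# Code and language models continue raw text rather than chat turns.
prompt = "def fibonacci(n):"

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    response = client.completions.create(
        model="codellama/CodeLlama-70b-Python-hf",  # illustrative model name
        prompt=prompt,
        max_tokens=128,
    )
    print(response.choices[0].text)
```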
Image Models
To use our image models, run the following:
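A sketch of an image generation call (model name illustrative; call guarded on TOGETHER_API_KEY):

```python
import os

prompt = "An astronaut riding a horse on Mars, photorealistic"

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    response = client.images.generate(
        model="stabilityai/stable-diffusion-xl-base-1.0",  # illustrative model name
        prompt=prompt,
        n=1,  # number of images to generate
    )
    # each entry in response.data carries one generated image
    print(len(response.data))
```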
Embeddings
To generate embeddings with any of our embedding models, do the following:
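A sketch of embedding a batch of texts, e.g. for a RAG index (model name illustrative; call guarded on TOGETHER_API_KEY):

```python
import os

texts = [
    "Our solar system has eight planets.",
    "RAG retrieves relevant context before generation.",
]

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    response = client.embeddings.create(
        model="togethercomputer/m2-bert-80M-8k-retrieval",  # illustrative embedding model
        input=texts,
    )
    vectors = [item.embedding for item in response.data]
    print(len(vectors), len(vectors[0]))  # number of texts, embedding dimension
```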
Async Support
We now have async support! Here’s what that looks like for chat completions:
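A sketch of the async client, using AsyncTogether with the same call shape as the sync client (model name illustrative; the event loop only runs when TOGETHER_API_KEY is set):

```python
import asyncio
import os

messages = [{"role": "user", "content": "Name three uses of async IO."}]

async def main() -> None:
    from together import AsyncTogether

    async_client = AsyncTogether()
    # same method names as the sync client, but awaitable
    response = await async_client.chat.completions.create(
        model="meta-llama/Llama-3-8b-chat-hf",  # illustrative model name
        messages=messages,
    )
    print(response.choices[0].message.content)

if os.environ.get("TOGETHER_API_KEY"):
    asyncio.run(main())
```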
See this example for async support with completions.
Fine-tuning
We also provide the ability to fine-tune models through our SDK or CLI, including the newly released Llama 3 models. Simply upload a file in JSONL format and create a fine-tuning job as seen in the code below:
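A sketch of the two steps above: write training data as JSONL, upload it, and start a fine-tuning job. The record format, file name, and base model are illustrative, and the API calls are guarded so they only run when TOGETHER_API_KEY is set:

```python
import json
import os

# Step 1: write training data in JSONL format, one JSON record per line.
samples = [{"text": "<s>[INST] What is the capital of France? [/INST] Paris</s>"}]
with open("train.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

if os.environ.get("TOGETHER_API_KEY"):
    from together import Together

    client = Together()
    # Step 2: upload the file, then create a fine-tuning job against it.
    uploaded = client.files.upload(file="train.jsonl")
    job = client.fine_tuning.create(
        training_file=uploaded.id,            # id of the uploaded file
        model="meta-llama/Meta-Llama-3-8B",   # illustrative base model
    )
    print(job.id)
```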
For more about fine-tuning, including data formats, check out our finetuning docs.
Learn more in our documentation and Python library on GitHub. We're also actively working on a similar TypeScript SDK that will be out in the coming weeks!
Q: Should I use the RedPajama-V2 Dataset out of the box?
RedPajama-V2 is conceptualized as a pool of data that serves as a foundation for creating high quality datasets. The dataset is thus not intended to be used out of the box and, depending on the application, data should be filtered out using the quality signals that accompany the data. With this dataset, we take the view that the optimal filtering of data is dependent on the intended use. Our goal is to provide all the signals and tooling that enables this.