
Published 12/11/2023

Can you feel the MoE? Mixtral is available at over 100 tokens per second on the Together Platform!

Today, Mistral released Mixtral 8x7B, a high-quality sparse mixture of experts model (SMoE) with open weights.

Mixtral-8x7b-32kseqlen and DiscoLM-mixtral-8x7b-v2 are now live on our inference platform! We have optimized the Together Inference Engine for Mixtral, and it is available at up to 100 tokens/s for $0.0006/1K tokens, which is, to our knowledge, the fastest performance at the lowest price!

Chat with it in our playground:

Try Now

Or use this code snippet: 

curl -X POST https://api.together.xyz/inference \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer $TOGETHER_API_KEY"\
      -d '{
      "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
      "max_tokens": 512,
      "prompt": "<|im_start|>user\nTell me about San Francisco<|im_end|>\n<|im_start|>assistant",
      "temperature": 0.7,
      "top_p": 0.7,
      "top_k": 50,
      "repetition_penalty": 1,
      "stream_tokens": true,
      "stop": [
        "<|im_end|>",
        "<|im_start|>"
      ]
    }'
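If you prefer Python, here is a sketch of the same request using only the standard library. The payload mirrors the curl call above, except that `stream_tokens` is turned off so the whole completion arrives as a single JSON response (parsing the streaming format is omitted for brevity):

```python
import json
import os
import urllib.request

# Mirror of the curl request above; stream_tokens is disabled here so the
# whole completion arrives as one JSON body. A sketch, not official client code.
payload = {
    "model": "DiscoResearch/DiscoLM-mixtral-8x7b-v2",
    "max_tokens": 512,
    "prompt": "<|im_start|>user\nTell me about San Francisco<|im_end|>\n"
              "<|im_start|>assistant",
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "repetition_penalty": 1,
    "stream_tokens": False,
    "stop": ["<|im_end|>", "<|im_start|>"],
}

def send(api_key: str) -> dict:
    """POST the inference request and return the parsed JSON body."""
    req = urllib.request.Request(
        "https://api.together.xyz/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

# Example: result = send(os.environ["TOGETHER_API_KEY"])
```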

More on Mixtral

Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral...

  • Handles a context of 32k tokens.
  • Handles English, French, Italian, German and Spanish.
  • Shows strong performance in code generation.
  • Can be finetuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.
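To build intuition for the "sparse mixture of experts" idea, here is a toy sketch of top-2 expert routing, the general mechanism behind an SMoE layer. All sizes and names here are illustrative, not Mixtral's actual implementation:

```python
# Toy sketch of sparse top-2 expert routing (the general SMoE mechanism).
# Sizes and weights are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # Mixtral has 8 experts per layer
TOP_K = 2         # only 2 experts are active per token
D_MODEL = 16      # toy hidden size

# Each "expert" stands in for a feed-forward block; here just a matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_layer(x):
    """Route one token vector through its top-2 experts."""
    logits = x @ router                    # router score for each expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the 2 best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Weighted sum of the chosen experts' outputs; the remaining experts
    # are skipped entirely, which is what makes the layer "sparse".
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
print(out.shape)  # (16,)
```

Because only 2 of the 8 experts run for each token, the layer computes far less than a dense model with the same total parameter count.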

Transitioning from OpenAI?

Here’s how simple it is to switch from OpenAI to Together’s Mixtral serverless endpoint:


import openai
import os

client = openai.OpenAI(
    api_key=os.environ.get("TOGETHER_API_KEY"),
    base_url='https://api.together.xyz',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
           "role": "user",
           "content": "Tell me about San Francisco",
        }
    ],
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

print(chat_completion.choices[0].message.content)

Simply add your TOGETHER_API_KEY (which you can find here), change the base URL to https://api.together.xyz, change the model name to one of our 100+ open-source models, and you'll be off to the races!