Announcing Together Custom Models. Build a state-of-the-art LLM with Together AI — and own the model.
Training your own state-of-the-art LLM enables you to achieve the highest accuracy and adaptability to your tasks, with the best price-performance tradeoff for your production applications.
In many cases, your proprietary data is not well represented in leading foundation models and you can achieve significantly higher accuracy with your own custom model that is smaller, faster, and more efficient at scale. However, training a large AI model can be a daunting task. It requires significant computing power and deep experience with the multiple stages of building large foundation models.
Together Custom Models is an end-to-end solution for data science and AI teams to build powerful models from data design to evaluation. At Together AI, we have built technology, systems, and expert teams that specialize in building foundation models to meet your demanding business requirements. Together Custom Models will help you with each stage of the process:
- Data discovery & optimization
- Model selection & training recipe
- Training
- Tuning & alignment
- Evaluation
Together Custom Models runs on Together GPU Clusters — state-of-the-art clusters with NVIDIA H100 and A100 GPUs running on fast InfiniBand networks. And, with Together Custom Models, we are committed to making each customer successful, so our team of expert researchers is available to work with you every step of the way.
Model Ownership
The models you build with Together Custom Models are yours. You retain full ownership of the model that is created, all checkpoints are delivered to you, and you can run your model wherever you please. Of course, we aim to make Together Inference the best place to host your model for the fastest performance and best cost efficiency.
Stages
In this post, we will walk you through each stage of Together Custom Models in more detail. You can customize the solution to include only the stages you need. And we are excited to share our first customer story of using Together Custom Models: Arcee.ai!
01. Data discovery & optimization
Bring your own full dataset or combine your data with powerful open-source datasets like RedPajama-v2. With Together Custom Models, your training dataset is tailored to your model requirements using state-of-the-art techniques like data quality signals, DSIR, and DoReMi.
RedPajama-v2 is a unique dataset with 30T tokens that comes with 40 quality signals in 5 categories, such as natural language characteristics and toxicity. This means you can boost your model's quality by filtering and weighting training data with these signals, or by selecting slices of RedPajama-v2 that meet your model's needs. Additionally, Together Custom Models can leverage advanced data selection tools like DSIR to efficiently train your model. DSIR estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights.
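For intuition, here is a minimal, self-contained sketch of DSIR-style data selection. The hashed n-gram features, the bag-of-n-grams proxy distributions, and the Gumbel-top-k resampling step are illustrative simplifications of the published method, not the exact implementation used in Together Custom Models.

```python
import hashlib
import numpy as np

def hashed_ngrams(text, buckets=10_000, n=2):
    """Map a document into hashed n-gram counts (a cheap, reduced feature space)."""
    tokens = text.lower().split()
    counts = np.zeros(buckets)
    for i in range(len(tokens) - n + 1):
        h = int(hashlib.md5(" ".join(tokens[i:i + n]).encode()).hexdigest(), 16)
        counts[h % buckets] += 1
    return counts

def fit_feature_dist(docs, buckets=10_000):
    """Fit a smoothed bag-of-n-grams distribution over the hashed feature space."""
    total = np.ones(buckets)  # add-one smoothing
    for doc in docs:
        total += hashed_ngrams(doc, buckets)
    return total / total.sum()

def dsir_select(raw_docs, target_docs, k, buckets=10_000, seed=0):
    """Select k raw documents by importance resampling toward the target distribution."""
    log_ratio = np.log(fit_feature_dist(target_docs, buckets)) - np.log(fit_feature_dist(raw_docs, buckets))
    # Log importance weight of each raw document under the n-gram proxy model.
    log_w = np.array([hashed_ngrams(doc, buckets) @ log_ratio for doc in raw_docs])
    rng = np.random.default_rng(seed)
    # Gumbel-top-k: sample k documents without replacement, proportional to exp(log_w).
    keys = log_w + rng.gumbel(size=len(raw_docs))
    return [raw_docs[i] for i in np.argsort(-keys)[:k]]
```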
Another crucial data step is determining the optimal mixture of your datasets to reach high model quality efficiently. We leverage methods like DoReMi, an algorithm that finds the optimal weighting of datasets using Distributionally Robust Optimization. The DoReMi authors showed that a model trained with the optimized data mixture reaches baseline downstream accuracy 2.6x faster than one trained with the default domain weights from The Pile.
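The core of DoReMi is an exponentiated-gradient update that upweights domains where a small proxy model lags behind a fixed reference model. The sketch below shows one such update on toy numbers; the learning rate, smoothing, and losses are placeholders rather than values from the paper.

```python
import numpy as np

def doremi_update(domain_weights, proxy_losses, reference_losses, lr=1.0, smoothing=1e-3):
    """One DoReMi-style domain-weight update via exponentiated gradient ascent."""
    excess = np.maximum(proxy_losses - reference_losses, 0.0)  # clipped excess loss per domain
    w = domain_weights * np.exp(lr * excess)                   # upweight domains the proxy finds hard
    w = w / w.sum()
    uniform = np.ones_like(w) / len(w)
    w = (1 - smoothing) * w + smoothing * uniform              # smooth toward uniform for stability
    return w / w.sum()

# Toy usage: three domains, where the second is harder for the proxy than for the reference.
weights = np.array([0.5, 0.3, 0.2])
weights = doremi_update(weights,
                        proxy_losses=np.array([2.1, 3.0, 1.8]),
                        reference_losses=np.array([2.0, 2.4, 1.9]))
print(weights)  # the second domain's weight increases
```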
02. Selecting model architecture, hyperparameters, and training strategy
Building a successful machine learning model requires a lot of expertise and knowledge. This is where leveraging our know-how can be incredibly helpful!
Custom Tokenizer – Build your own tokenizer tailored to your data. You can also skip this step and use your pre-trained tokenizer or publicly available tokenizers.
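As a rough illustration of this step, the sketch below trains a byte-level BPE tokenizer from scratch with the open-source Hugging Face `tokenizers` library; the vocabulary size, special tokens, and corpus file are placeholders you would tailor to your own data.

```python
# Requires: pip install tokenizers
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Byte-level BPE tokenizer trained from scratch on your own corpus.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                        # placeholder; size it to your data
    special_tokens=["<unk>", "<s>", "</s>"],
)

tokenizer.train(files=["my_corpus.txt"], trainer=trainer)  # "my_corpus.txt" is a placeholder
tokenizer.save("custom_tokenizer.json")

print(tokenizer.encode("Custom tokenizers adapt the vocabulary to your domain.").tokens)
```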
Architecture selection – Whether you are looking for a Transformer-based architecture like BERT or GPT, or something else, we will help you select the right architecture for your needs. We are also the builders of Hyena and Monarch Mixer, new model architectures that are sub-quadratic in sequence length, enable longer context, and provide significant performance advantages.
Training recipe – Together Custom Models includes a number of proven training recipes, from conversational chat (RedPajama-INCITE Chat) and instruction-tuning (RedPajama-INCITE Instruct) to long-context optimization (LLaMA-2-7B-32K-Instruct), and more. The expert Together Research team is here to share its extensive experience in building successful models and to help you select the right model architecture and training recipe. Moreover, we can help you find the optimal model size, quantization, and training duration using scaling laws customized to your needs and budget.
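As a back-of-the-envelope example of how scaling laws inform these choices, the sketch below uses the widely cited Chinchilla-style rules of thumb (roughly 20 training tokens per parameter and about 6 FLOPs per parameter per token). These constants come from the public scaling-law literature and stand in for the customized fits we build for each engagement.

```python
def chinchilla_estimate(n_params, tokens_per_param=20, flops_per_token_per_param=6):
    """
    Rough compute-optimal estimate in the spirit of the Chinchilla scaling laws:
    train on ~20 tokens per parameter, at ~6 FLOPs per parameter per token.
    """
    tokens = tokens_per_param * n_params
    flops = flops_per_token_per_param * n_params * tokens
    return tokens, flops

for n_params in (1e9, 7e9, 70e9):
    tokens, flops = chinchilla_estimate(n_params)
    print(f"{n_params / 1e9:>4.0f}B params -> ~{tokens / 1e9:.0f}B tokens, ~{flops:.2e} training FLOPs")
```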
03. Training
Distributed training is an essential part of training a large AI model at scale. However, managing and optimizing distributed training jobs can be challenging, especially when working with large datasets and complex models. Together Custom Models schedules, orchestrates, and optimizes your training jobs over any number of GPUs, making it easy for you to manage and scale your distributed training jobs. Just provide training and model configs, or use the configs produced in the previous steps. All you need to do is monitor the training progress in W&B; Together Custom Models takes care of everything else. Have your own code-level customizations? Not a problem. We can work with you to integrate them into Together Custom Models.
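Together Custom Models handles this orchestration for you, but for intuition, here is a minimal, generic PyTorch data-parallel loop of the kind such a system schedules and scales across GPUs. It is a standalone sketch launched with `torchrun`, not a reflection of our internal training stack.

```python
# Minimal data-parallel loop; launch with:  torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()      # stand-in for a real transformer
    model = DDP(model, device_ids=[local_rank])     # gradients are all-reduced across GPUs
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(8, 4096, device="cuda")     # stand-in for a real sharded data loader
        loss = model(x).pow(2).mean()
        loss.backward()                             # gradient synchronization happens here
        opt.step()
        opt.zero_grad()
        if dist.get_rank() == 0 and step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")  # or log to W&B

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```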
Our stack leverages state-of-the-art techniques like FlashAttention-2 and CocktailSGD to deliver fast, reliable performance for your training jobs. Compared to a standard attention implementation in PyTorch, FlashAttention-2 can be up to 9x faster! By training with Together Custom Models, you can focus on building and training your models, while we take care of the rest.
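To make the difference concrete, the sketch below compares a naive PyTorch attention implementation, which materializes the full sequence-length-squared score matrix, against the fused `flash_attn_func` kernel from the open-source `flash-attn` package. It assumes an NVIDIA GPU, fp16 tensors, and the library's (batch, seqlen, heads, head_dim) layout.

```python
# Requires: pip install flash-attn  (and an NVIDIA GPU with fp16/bf16 support)
import torch
from flash_attn import flash_attn_func

batch, seqlen, heads, head_dim = 2, 2048, 16, 64
q, k, v = (torch.randn(batch, seqlen, heads, head_dim,
                       device="cuda", dtype=torch.float16) for _ in range(3))

def naive_attention(q, k, v):
    """Standard attention: materializes the full (seqlen x seqlen) score matrix."""
    q_, k_, v_ = (t.transpose(1, 2) for t in (q, k, v))        # (batch, heads, seqlen, dim)
    scores = q_ @ k_.transpose(-2, -1) / head_dim ** 0.5        # O(seqlen^2) memory
    causal = torch.triu(torch.ones(seqlen, seqlen, device="cuda", dtype=torch.bool), 1)
    scores = scores.masked_fill(causal, float("-inf"))
    return (scores.softmax(dim=-1) @ v_).transpose(1, 2)

out_naive = naive_attention(q, k, v)
out_flash = flash_attn_func(q, k, v, causal=True)               # fused, IO-aware kernel
print("max difference:", (out_naive - out_flash).abs().max().item())
```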
04. Tuning & alignment
Ensuring that a large language model (LLM) is aligned with specific downstream tasks and goals is a crucial aspect of developing a safe, reliable, and high-quality model. By aligning an LLM with your objectives, you can enhance its overall quality and performance on specific tasks.
Together Custom Models provides a range of tools and techniques, from instruction-tuning and reinforcement learning from human feedback (RLHF) to long-context fine-tuning, so that you can further customize your model. Our extensive experience in this field, including models such as RedPajama-INCITE Instruct and LLaMA-2-7B-32K-Instruct, will guide you to successful model development.
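Mechanically, instruction tuning comes down to training only on the response tokens. The sketch below formats an (instruction, response) pair with a Hugging Face-style tokenizer and masks the prompt out of the labels; the prompt template and the -100 ignore index follow common open-source conventions and stand in for whichever recipe your model actually uses.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are excluded from the cross-entropy loss

def build_example(tokenizer, instruction, response, max_len=2048):
    """Tokenize an (instruction, response) pair and mask the prompt out of the labels."""
    # The template below is a common open-source convention, not a specific Together recipe.
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)
    response_ids = tokenizer.encode(response, add_special_tokens=False) + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),  # loss is computed only on the response tokens
    }
```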
05. Evaluation
We don’t just stop after training your model. We evaluate your final model on public benchmarks such as HELM and the LM Evaluation Harness, as well as on your own custom benchmarks or validation sets. All open models in the HELM benchmark are run by Together!
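For example, you can reproduce the public-benchmark portion of an evaluation yourself with the open-source LM Evaluation Harness; the snippet below uses its v0.4-style Python API with a placeholder model path and task list, and is a generic illustration rather than our internal evaluation pipeline.

```python
# Requires: pip install lm_eval  (EleutherAI's lm-evaluation-harness, v0.4+)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                                # evaluate a Hugging Face checkpoint
    model_args="pretrained=./my-custom-model,dtype=bfloat16",  # placeholder path to your model
    tasks=["hellaswag", "arc_challenge", "mmlu"],              # public benchmark tasks
    num_fewshot=0,
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```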
Customer Story: Building a patent model with Together Custom Models – Arcee.ai
To give you a better sense of what it’s like to build your model with Together Custom Models, we’d like to share Arcee’s story.
Arcee is a growing startup in the LLM space building domain-adaptive language models for organizations. Using Together Custom Models, Arcee is building an LLM with a domain-specific dataset.
"Our relationship with Together AI has yielded remarkable achievements, including state-of-the-art models. These models are specialized, grounded, and laser-focused on specific verticals and use cases. Working with Together AI helped us dramatically accelerate development."
— Mark McQuade, CEO of Arcee
- Training over 4B tokens
- 7B parameter model
Get started today
Contact us to build your own model with Together Custom Models!
- 20% lower cost
- 4x faster training
- 117x network compression
Q: Should I use the RedPajama-V2 Dataset out of the box?
RedPajama-V2 is conceptualized as a pool of data that serves as a foundation for creating high-quality datasets. The dataset is thus not intended to be used out of the box; depending on the application, data should be filtered using the quality signals that accompany it. With this dataset, we take the view that the optimal filtering of data depends on the intended use. Our goal is to provide all the signals and tooling that enable this.
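As a hedged illustration of that workflow, the sketch below streams a RedPajama-V2 sample from the Hugging Face Hub and keeps only documents passing a couple of quality-signal thresholds. The config name, column names, signal keys, and thresholds are assumptions based on the public dataset card, so check the actual schema before relying on them.

```python
# The config name, column names, signal keys, and thresholds below are assumptions
# based on the public dataset card -- verify them against the actual schema.
import json
from datasets import load_dataset

ds = load_dataset("togethercomputer/RedPajama-Data-V2",
                  name="sample", split="train", streaming=True)

def keep(example):
    signals = json.loads(example["quality_signals"])
    # Each signal is a list of (start, end, score) spans; take the document-level score.
    word_count = signals["rps_doc_word_count"][0][2]
    perplexity = signals["ccnet_perplexity"][0][2]
    return word_count >= 200 and perplexity <= 500  # illustrative thresholds only

filtered = (ex["raw_content"] for ex in ds if keep(ex))
print(next(filtered)[:200])
```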