Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models

The RedPajama project aims to create a set of leading open-source models and to rigorously understand the ingredients that yield good performance. A few weeks ago, we released the RedPajama base dataset, based on the LLaMA paper, which has galvanized the open-source community. The 5-terabyte dataset has been downloaded hundreds of times and used to train models such as MPT, OpenLLaMA, and OpenAlpaca. Today we are excited to release the RedPajama-INCITE models, including instruction-tuned and chat versions.
Today’s release includes our first models trained on the RedPajama base dataset: a 3 billion parameter and a 7 billion parameter base model that aim to replicate the LLaMA recipe as closely as possible. In addition, we are releasing fully open-source instruction-tuned and chat models. Our key takeaways:
- The 3B model is the strongest in its class, and its small size makes it extremely fast and accessible (it even runs on an RTX 2070, released over 5 years ago).
- The instruction-tuned versions of the models achieve strong performance on HELM benchmarks. As expected, on HELM the instruction-tuned 7B model scores 3 points higher than the base LLaMA model. We recommend these models for downstream applications involving few-shot prompting, entity extraction, classification, or summarization.
- The 7B model, which is 80% through training, already outperforms the Pythia 7B model, demonstrating the importance of a larger dataset and the value of the RedPajama base dataset.
- Based on our observations, we see a clear path to a better version of the RedPajama dataset, which we will release in the coming weeks and which we expect will enable models that go beyond the quality of LLaMA 7B. We plan to build models at larger scale with this new dataset.
- We expect differences between LLaMA 7B and our replication, which we investigate below.
The biggest takeaway is the demonstration that performant LLMs can be built quickly by the open-source community. This work builds on top of our 1.2 trillion token RedPajama dataset, EleutherAI’s Pythia training code, FlashAttention from Stanford and Together, the HELM benchmarks from Stanford CRFM, and generous support from EleutherAI & LAION for compute time on the Summit supercomputer within the INCITE program award “Scalable Foundation Models for Transferable Generalist AI”. We believe these kinds of open collaborations, at larger scales, will be behind the best AI systems of the future.
“The RedPajama 3B model is the strongest model in its class and brings a performant large language model to a wide variety of hardware.”
Today’s release includes the following models, all released under the permissive Apache 2.0 license, which allows use in both research and commercial applications:
- RedPajama-INCITE-Base-3B-v1
- RedPajama-INCITE-Instruct-3B-v1
- RedPajama-INCITE-Chat-3B-v1
- RedPajama-INCITE-Base-7B-v0.1 (preview)
- RedPajama-INCITE-Chat-7B-v0.1 (preview)
- RedPajama-INCITE-Instruct-7B-v0.1 (preview)
In only a few weeks, the support, suggestions, and feedback for RedPajama from the open-source community have been incredible. Based on what we have learned, we are already starting the next version of the RedPajama base dataset, which will be nearly twice the size of the original v1 dataset. Thank you for your support, feedback, and suggestions!

During RedPajama model training we have shared regular updates, and both the 3B and 7B models have now been trained on 800 billion tokens. We are excited to see that the 3B model has stabilized at this point, while the 7B model continues to improve as it completes training to 1 trillion tokens.
3B RedPajama Models
RedPajama-INCITE-Base-3B-v1 is trained on the RedPajama v1 dataset, with the same architecture as the popular Pythia model suite. We chose to start with the Pythia architecture to understand the value of training on the much larger RedPajama dataset relative to the current leading open-source dataset, the Pile. Training on Summit leveraged the DeeperSpeed codebase developed by EleutherAI.
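As a quick illustration of how accessible the 3B model is (including the consumer-GPU point in the takeaways above), here is a minimal sketch of loading the base checkpoint with Hugging Face transformers and sampling a short completion. The repository id is an assumption about where the weights are hosted; adjust it to wherever you obtain the checkpoint.

```python
# Minimal sketch: load the 3B base model and generate a short completion.
# The repository id below assumes the checkpoint is published on the
# Hugging Face Hub under the "togethercomputer" organization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-Base-3B-v1"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~6 GB of weights in fp16, small enough for an 8 GB card such as the RTX 2070
).to("cuda")

inputs = tokenizer("The RedPajama dataset is", return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```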
We are excited to see that at 800B tokens, RedPajama-INCITE-Base-3B-v1 has better few-shot performance (measured in HELM as the average score over 16 core scenarios) and better zero-shot performance (measured in EleutherAI’s LM evaluation harness) than open models of similar size, including the well-regarded GPT-Neo and Pythia-2.8B (trained on 420B and 300B tokens of the Pile, respectively). On HELM, it outperforms these models by 3-5 points. On a subset of tasks from lm-evaluation-harness, it outperforms these open models by 2-7 points.
Additionally, we are excited to release an instruction-tuned version of this 3B model, RedPajama-INCITE-Instruct-3B-v1, trained following Together’s GPT-JT recipe, with any data that appears in HELM benchmarks removed to ensure there is no contamination with respect to HELM. This model shows excellent performance on few-shot tasks, even approaching the quality of LLaMA 7B despite being a much smaller model, as shown in the results below:
Few Shot Results on HELM Core Scenarios
The base model also performs well on zero-shot tasks, as measured using EleutherAI’s language model evaluation harness:
(Zero Shot) Results on a subset of lm-evaluation-harness, following LLM Worksheet’s selection of tasks & metrics. We did not run coqa because of an error described in this issue.
Results on a subset of lm-evaluation-harness, with tasks selected from those used to evaluate Pythia and GPT-J.
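For readers who want to reproduce zero-shot numbers of this kind, the sketch below shows one way to score a checkpoint with EleutherAI’s lm-evaluation-harness through its Python API. The task list is illustrative rather than the exact subset we report, and the `hf-causal` model type and `simple_evaluate` arguments reflect common harness versions at the time of writing; treat these details as assumptions and check the documentation for your installed version.

```python
# Minimal sketch: zero-shot evaluation with EleutherAI's lm-evaluation-harness.
# The task selection below is illustrative, not the exact subset reported above.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # Hugging Face causal-LM backend (name may differ across harness versions)
    model_args="pretrained=togethercomputer/RedPajama-INCITE-Base-3B-v1",
    tasks=["lambada_openai", "hellaswag", "winogrande", "piqa", "arc_easy", "arc_challenge"],
    num_fewshot=0,
    batch_size=8,
    device="cuda:0",
)
print(results["results"])  # per-task metrics, e.g. accuracy
```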

RedPajama-INCITE-Chat-3B-v1 is an open-source chat model built on RedPajama-INCITE-Base-3B-v1 and fine-tuned on the OASST1 dataset from Open Assistant and the Dolly v2.0 dataset from Databricks. We mix the two datasets equally and fine-tune for 3 epochs.
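Here is a minimal sketch of querying the chat model with Hugging Face transformers. The `<human>:` / `<bot>:` turn markers are an assumption about the conversational format used during fine-tuning, and the repository id and sampling settings are likewise illustrative; see the model card for the exact recommended usage.

```python
# Minimal sketch: query the 3B chat model. The "<human>:"/"<bot>:" turn markers
# are an assumption about the conversational format used during fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "<human>: What are the main ingredients of a good open-source LLM?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Strip the prompt tokens so only the model's reply is printed.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```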
Evaluating chat models is a challenging task, and we are in the process of conducting more quantitative evaluation based on human and community feedback; we are excited to share these results soon! Nevertheless, here are some examples comparing the behavior of different chat models. We see that in many examples, RedPajama-INCITE-Chat-3B-v1 produces responses of similar quality to Open Assistant, as reported in their paper.
RedPajama 3B chat model responses on example queries from the Open Assistant paper.
Following are some additional examples comparing RedPajama 3B to the Pythia 2.8B model fine-tuned on the OASST1 and Dolly v2.0 datasets.
Preview of RedPajama 7B
The 7B model is still training (currently at 800B tokens), and the training loss continues to decrease consistently, so we will keep training to 1T tokens. Nevertheless, this checkpoint is already useful and interesting to build on, and it can help the community better understand our training process. Therefore, we are releasing three intermediate checkpoints as a “preview” of the final models.
- RedPajama-INCITE-Base-7B-v0.1 is a base model trained on 800B tokens
- RedPajama-INCITE-Chat-7B-v0.1 is its chat counterpart, fine-tuned on the Dolly v2.0 and Open Assistant datasets
- RedPajama-INCITE-Instruct-7B-v0.1 is instruction-tuned for few-shot applications. We follow the GPT-JT recipe but eliminate all datasets that overlap with the HELM benchmark.
Each of these checkpoints is released under the Apache 2.0 license. Even at 800B tokens, we already see promising results. On HELM, the base model outperforms open models such as GPT-J and Pythia-6.9B by 0.5-2.2 points, and on EleutherAI’s lm-evaluation-harness, it outperforms these models by 1-3 points on average.
We also see that, compared with LLaMA 7B, there is still a quality gap of 4.3 points on HELM at this point in training. For few-shot applications (like those in HELM), the instruction-tuned model (RedPajama-INCITE-Instruct-7B-v0.1) improves significantly over the base model. We hope that some of this gap can be closed as we train for more iterations.
(Few Shot) Results on HELM Core Scenarios
The base model also performs well on zero-shot tasks, as measured using EleutherAI’s language model evaluation harness:
(Zero Shot) Results on a subset of lm-evaluation-harness, following LLM Worksheet’s selection of tasks & metrics. We did not run coqa because of an error described in this issue. LLaMA numbers marked with * are taken directly from LLM Worksheet because we ran into the following issue.
Results on a subset of lm-evaluation-harness, with tasks selected from those used to evaluate Pythia and GPT-J.

Moving Forward: RedPajama v2 with 2T Tokens
We have learned a lot from the community and are working on building RedPajama v2 with 2 trillion tokens, taking a systematic approach:
- We measured the validation loss of different models on different slices of the Pile (for each slice, we selected the first 5K passages); a minimal sketch of this measurement appears after this list. We see that RedPajama lags behind on many slices of the Pile, especially those that are not directly included in the RedPajama dataset. Motivated by this, we plan to mix the Pile into RedPajama to form a more diverse dataset with even more tokens.
- And we need more code! Another immediate to-do is to mix in data from the Stack and enrich the GitHub slice of RedPajama, which currently contains only 59 billion tokens.
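For concreteness, the sketch below shows roughly how a per-slice validation loss can be computed: average token-level cross-entropy of a causal LM over a handful of passages from one slice. The checkpoint id is an assumption, and the toy `passages` list stands in for the first 5K documents of an actual Pile slice.

```python
# Rough sketch: validation loss (mean cross-entropy per token) of a causal LM
# over a list of passages, as used to compare models across dataset slices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def slice_validation_loss(model, tokenizer, passages, device="cuda", max_length=2048):
    """Average token-level cross-entropy over a list of text passages."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in passages:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(device)
            n_tokens = enc["input_ids"].shape[1]
            if n_tokens < 2:
                continue  # need at least two tokens to form a prediction target
            out = model(**enc, labels=enc["input_ids"])  # .loss is mean cross-entropy over predicted tokens
            total_loss += out.loss.item() * (n_tokens - 1)
            total_tokens += n_tokens - 1
    return total_loss / max(total_tokens, 1)

model_id = "togethercomputer/RedPajama-INCITE-Base-3B-v1"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# In practice, `passages` would be the first 5K documents of a given Pile slice;
# two toy strings are used here just to keep the sketch self-contained.
passages = [
    "The quick brown fox jumps over the lazy dog.",
    "Attention mechanisms let a model weigh different parts of its input.",
]
print("validation loss:", slice_validation_loss(model, tokenizer, passages))
```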
With all of these improvements, we are aiming for a 2T-token RedPajama v2 dataset. Next week we will begin a series of runs to understand the right data mixture and start training new models on RedPajama v2.
Acknowledgements
The training of the first collection of RedPajama-INCITE models was performed on 3,072 V100 GPUs provided as part of the INCITE compute grant on the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). This grant was awarded to the AAI CERC lab at Université de Montréal, LAION, and EleutherAI in fall 2022 for their collaborative project on Scalable Foundation Models for Transferable Generalist AI.
We are grateful to all the project team members who helped build the RedPajama dataset and supported training, including Ontocord.ai, ETH DS3Lab, the AAI CERC Lab at Université de Montréal, the Stanford Center for Research on Foundation Models (CRFM), the Stanford Hazy Research group, LAION, and EleutherAI. We also thank Quentin Anthony (EleutherAI and INCITE project team) for sharing the GPT-NeoX model architecture and training code.
We also appreciate the work done by the growing open-source AI community that made this project possible. That includes:
- Meta AI — Their inspiring work on LLaMA shows a concrete path towards building strong language models, and it is the original source for our dataset replication.
- EleutherAI — This project is built on the work of the great team at EleutherAI, including the source code they provided for training GPT-NeoX.
- INCITE project team — Their work on GPT-NeoX adaptation to Summit during early 2023 enabled distributed training that scaled efficiently to thousands of Summit GPUs, and ensured smooth training of the models.
- This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. We are grateful for the invaluable support provided to us by the OLCF leadership and by the OLCF liaison for the INCITE project.