
The RedPajama project aims to create a set of leading open-source models and to rigorously understand the ingredients that yield good performance. A few weeks ago we released the RedPajama base dataset, based on the LLaMA paper, which has galvanized the open-source community. The 5 terabyte dataset has been downloaded hundreds of times and used to train models such as MPT, OpenLLaMA, and OpenAlpaca. Today we are excited to release RedPajama-INCITE models, including instruction-tuned and chat versions.
Today’s release includes our first models trained on the RedPajama base dataset: 3B and 7B parameter base models that aim to replicate the LLaMA recipe as closely as possible. In addition, we are releasing fully open-source instruction-tuned and chat models. Our key takeaways:
- The 3B model is the strongest in its class, and its small size makes it extremely fast and accessible (it even runs on an RTX 2070, released over 5 years ago); a minimal inference sketch follows this list.
- The instruction-tuned versions of the models achieve strong performance on HELM benchmarks. As expected, on HELM the instruction-tuned 7B model scores 3 points higher than the base LLaMA 7B model. We recommend using these models for downstream few-shot applications such as entity extraction, classification, and summarization tasks.
- The 7B model, which is 80% of the way through training, already outperforms the Pythia 7B model, demonstrating the importance of a bigger dataset and the value of the RedPajama base dataset.
- Based on our observations, we see a clear path to a better version of the RedPajama dataset, which we will release in the coming weeks and which we expect to enable models that go beyond the quality of LLaMA 7B. We plan to build models at larger scale with this new dataset.
- We expect some differences between LLaMA 7B and our replication, which we investigate below.
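To make the accessibility point concrete, here is a minimal inference sketch using Hugging Face transformers. The Hub model id and generation settings are our assumptions for this example, not part of the release itself; in fp16 the 3B weights take roughly 6 GB, which fits on an 8 GB card like the RTX 2070.

```python
# Minimal sketch: running the 3B base model in fp16 on a single consumer GPU.
# The Hub id below is an assumption; substitute the id from the official
# model card if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "togethercomputer/RedPajama-INCITE-Base-3B-v1"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # fp16 keeps the 3B weights under ~6 GB of VRAM
).to("cuda")

inputs = tokenizer("The RedPajama project aims to", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```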
The biggest takeaway is the demonstration that performant LLMs can be built quickly by the open-source community. This work builds on top of our 1.2 trillion token RedPajama dataset, EleutherAI’s Pythia training code, FlashAttention from Stanford and Together, the HELM benchmarks from Stanford CRFM, and generous support from EleutherAI and LAION for compute time on the Summit supercomputer within the INCITE program award “Scalable Foundation Models for Transferable Generalist AI”. We believe these kinds of open collaborations, at larger scales, will be behind the best AI systems of the future.
“RedPajama 3B model is the strongest model in its class and brings a performant large language model to a wide variety of hardware.”
Today’s release includes the following models, all released under the permissive Apache 2.0 license, allowing use in both research and commercial applications.
In only a few weeks, the support, suggestions, and feedback for RedPajama from the open-source community have been incredible. Based on what we have learned, we are already starting on the next version of the RedPajama base dataset, which will be nearly twice the size of the original v1 dataset. Thank you for your support, feedback, and suggestions!

During RedPajama model training we have shared regular updates, and both the 3B and 7B models have now been trained on 800 billion tokens. We are excited to see that the 3B model's quality has stabilized at 800 billion tokens, while the 7B model continues to improve as it completes training to 1 trillion tokens.
3B RedPajama Models
RedPajama-INCITE-Base-3B-v1 is trained over the RedPajama v1 dataset, with the same architecture as the popular Pythia model suite. We chose to start with the Pythia architecture to understand the value of training with the much larger RedPajama dataset with respect to the current leading open-source dataset, the Pile. Training on Summit leveraged the DeeperSpeed codebase developed by EleutherAI.
We are excited to see that at 800B tokens, RedPajama-INCITE-Base-3B-v1 has better few-shot performance (measured in HELM, as the average score over 16 core scenarios) and better zero-shot performance (measured in EleutherAI’s LM evaluation harness) compared with open models of similar size, including the well-regarded GPT-Neo and Pythia-2.8B (trained on 420B and 300B tokens of the Pile, respectively). On HELM, it outperforms these models by 3-5 points; on a subset of tasks from lm-evaluation-harness, it outperforms them by 2-7 points.
Additionally, we are excited to release an instruction-tuned version of this 3B model, RedPajama-INCITE-Instruct-3B-v1, trained following Together’s GPT-JT recipe while excluding any data that appears in HELM benchmarks, to ensure there is no contamination with respect to HELM. This model shows excellent performance on few-shot tasks, even approaching the quality of LLaMA 7B in a much smaller model, as shown in the results below:
Few Shot Results on HELM Core Scenarios
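As a concrete illustration of the few-shot use we recommend, here is a minimal sketch of prompting the instruction-tuned model for classification; the Hub model id, prompt wording, and decoding settings are our assumptions for this example.

```python
# Hypothetical few-shot sentiment-classification prompt for the instruct model.
# The Hub id is an assumption; check the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "togethercomputer/RedPajama-INCITE-Instruct-3B-v1"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Few-shot prompt: two labeled examples, then the query to classify.
prompt = (
    "Review: The food was wonderful.\nSentiment: positive\n\n"
    "Review: Service was slow and the room was cold.\nSentiment: negative\n\n"
    "Review: I would absolutely come back again.\nSentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
# Decode only the newly generated tokens (the predicted label).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```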
The base model also performs well on zero-shot tasks, as measured using EleutherAI’s language model evaluation harness:
(Zero Shot) Results on a subset of lm-evaluation-harness, following the LLM Worksheet’s selection of tasks and metrics. We did not run coqa due to an error, as described in this issue.
Results on a subset of lm-evaluation-harness, with tasks selected from those used to evaluate Pythia and GPT-J.
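For readers who want to reproduce the zero-shot numbers, a sketch along these lines should work, assuming the pre-1.0 lm-evaluation-harness Python API; the model type string, task names, and Hub id are assumptions that may vary across harness versions.

```python
# Sketch: zero-shot evaluation with EleutherAI's lm-evaluation-harness
# (pre-1.0 API; the "hf-causal" model type and task names are assumptions
# that may differ across harness versions).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=togethercomputer/RedPajama-INCITE-Base-3B-v1",  # assumed Hub id
    tasks=["lambada_openai", "hellaswag", "winogrande", "piqa"],
    num_fewshot=0,  # zero-shot, matching the numbers reported above
)
print(results["results"])
```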

RedPajama-INCITE-Chat-3B-v1 is an open-source chat model constructed from RedPajama-INCITE-Base-3B-v1 and fine-tuned on the OASST1 dataset from Open Assistant and the Dolly v2.0 dataset from Databricks. We mix the two datasets equally and fine-tune for 3 epochs.
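As a minimal sketch of the equal mix described above (our illustration of one way to implement it, not the exact training pipeline), the Hugging Face datasets library can interleave the two sources with equal probability; the Hub dataset ids are assumptions.

```python
# Sketch: a 50/50 mix of OASST1 and Dolly v2.0 for chat fine-tuning.
# Hub dataset ids and the interleaving strategy are assumptions, not
# the exact pipeline used for the released model.
from datasets import load_dataset, interleave_datasets

oasst1 = load_dataset("OpenAssistant/oasst1", split="train")
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# Sample from each source with equal probability; a real pipeline would first
# map both datasets to a shared conversational text format.
mixed = interleave_datasets([oasst1, dolly], probabilities=[0.5, 0.5], seed=42)
print(mixed)
```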
Evaluating chat models is a challenging task, and we are in the process of conducting more quantitative evaluation based on human and community feedback; we are excited to share those results soon! Nevertheless, here are some examples comparing the behavior of different chat models. We see that in many examples, RedPajama-INCITE-Chat-3B-v1 produces responses of similar quality to Open Assistant, as reported in their paper.
RedPajama 3B chat model responses on example queries from the Open Assistant paper.
Below are some additional examples comparing RedPajama 3B to a Pythia 2.8B model tuned on the OASST1 and Dolly v2.0 datasets.
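To try the chat model yourself, the sketch below uses the `<human>:`/`<bot>:` turn format that, to our knowledge, the chat models expect; the Hub id and decoding settings are assumptions for this example, so verify both against the model card.

```python
# Sketch: chatting with the 3B chat model using <human>/<bot> turns.
# The turn format and Hub id are assumptions; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")

prompt = "<human>: Explain what a large language model is in two sentences.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
# Print only the model's reply, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```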
