Building your own RAG application using Together AI and LangChain
Together AI provides the fastest cloud platform for building and running generative AI. Today we are launching the Together Embeddings endpoint. As part of a series of blog posts about this release, we are excited to show that you can build your own powerful RAG-based application right from the Together platform with LangChain.
What is Retrieval Augmented Generation (RAG)?
Retrieval Augmented Generation (RAG) (original paper, Lewis et al.) leverages both generative models and retrieval models for knowledge-intensive tasks. It improves generative AI applications by providing up-to-date information and domain-specific data from external data sources during response generation, reducing the risk of hallucinations and significantly improving performance and accuracy.
Building a RAG system is cost- and data-efficient: it does not require the technical expertise needed to train a model, while keeping the advantages mentioned above. Note that you can still fine-tune an embedding or generative model to improve the quality of your RAG solution even further! Check out the Together fine-tuning API to get started.
Quickstart
To build RAG, you first need to create a vector store by indexing your source documents with an embedding model of your choice. The LangChain libraries provide the necessary tools for loading documents, splitting them into small chunks that fit the context window of the embedding model you select, and storing their embeddings in a vector store. LangChain supports numerous vector stores; see the complete list of supported vector stores here. Once your vector store is ready, you retrieve the data most relevant to your query with a Retriever, using the same embedding model that created the vector store. Finally, you augment your prompt with the retrieved information and obtain the final output from a generative model.
Below you will find an example of how you can incorporate the latest knowledge into your RAG application using the Together API and LangChain, so that a generative model can respond with the correct information.
First, install the following packages:
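One plausible set for this walkthrough, assuming FAISS as the vector store and the `langchain-together` integration package (the exact list may differ depending on the vector store you choose):

```
pip install --upgrade langchain langchain-community langchain-together faiss-cpu
```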
Set the environment variable for your API key. You can find the Together API key under the settings tab in the Together Playground.
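One way to set it from Python (the LangChain Together integration reads `TOGETHER_API_KEY` from the environment):

```python
import os

# Paste your key from the settings tab in the Together Playground.
os.environ["TOGETHER_API_KEY"] = "..."
```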
Now we will provide some recent information about Together AI and ask the model “What are some recent highlights of Together AI?”
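A minimal sketch of the full flow, assuming the `langchain-together` integration and FAISS; the document text, model names, and chunk sizes below are illustrative, not the exact values from the original post:

```python
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_together import ChatTogether, TogetherEmbeddings

# Recent facts the base model is unlikely to know (illustrative content).
docs = [
    Document(page_content=(
        "Together AI launched the Together Embeddings endpoint, which serves "
        "open-source embedding models for building RAG applications."
    )),
]

# Split the documents into chunks that fit the embedding model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
splits = splitter.split_documents(docs)

# Index the chunks in a FAISS vector store using a Together embedding model.
embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Retrieve the chunks most relevant to the question and augment the prompt.
question = "What are some recent highlights of Together AI?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))

llm = ChatTogether(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
answer = llm.invoke(
    f"Answer the question using only this context:\n\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```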
The answer is correct and incorporates the context provided above! For comparison, if you just use the mistralai/Mixtral-8x7B-Instruct-v0.1 model for the same question, “What are some recent highlights of Together AI?”, the answer is mostly inaccurate and does not include recent information.
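To reproduce that no-RAG baseline, you can query the chat model directly, without any retrieved context (again assuming the `langchain-together` integration):

```python
from langchain_together import ChatTogether

# Ask the model directly, with no retrieved context in the prompt.
llm = ChatTogether(model="mistralai/Mixtral-8x7B-Instruct-v0.1")
print(llm.invoke("What are some recent highlights of Together AI?").content)
```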
Conclusion
The above example demonstrates how to build a RAG system using Together AI and LangChain. By leveraging the power of these tools, you can create a generative application that provides accurate and up-to-date responses by retrieving relevant data from your vector store.
As you continue to explore the capabilities of Together APIs and LangChain, we encourage you to experiment with different use cases and applications. We are excited to see the innovative solutions that you will build using these powerful tools.
Thank you for following along with this tutorial!