Introducing Together Rerank API and exclusive access to Salesforce LlamaRank model for enhanced enterprise search

August 26, 2024

Together AI

Today, we're excited to announce two significant advancements for developers building enterprise search and Retrieval Augmented Generation (RAG) systems: our new serverless Together Rerank API, along with exclusive access to LlamaRank, a cutting-edge reranker model developed by Salesforce AI Research. LlamaRank has demonstrated superior performance to other rerank models, including Cohere Rerank v3 and Mistral-7B QLM. This collaboration with Salesforce AI Research brings advanced document ranking capabilities to developers, and empowers them to enhance the accuracy and efficiency of information retrieval in both RAG and traditional search systems.

As the exclusive launch partner for the proprietary LlamaRank model, Together’s Platform now offers developers and businesses the flexibility to build and manage their entire generative AI lifecycle, from training and fine-tuning to inference, using both open and proprietary models. We're thrilled to provide customers with the best performance, accuracy, and cost for their generative AI workloads while allowing them to retain ownership of their models and keep their data secure.

The highlights include:

New Together Rerank API: A serverless endpoint that enables seamless integration of supported reranker models into your enterprise applications with just a few lines of code. Read the quickstart to get started.
Exclusive access to LlamaRank: A state-of-the-art reranker model from Salesforce AI Research that outperforms leading competitors like Cohere Rerank.
Enhanced search relevance: Rerankers improve the relevance of search results and reduce costs by filtering out irrelevant documents before they are passed to LLMs during Retrieval Augmented Generation (RAG).
Long document support: Handles documents up to 8,000 tokens in length.
Support for semi-structured data: Search over semi-structured data such as JSON, email, tables, and code.

What is a reranker model?

A reranker is a specialized model that improves search relevancy by reassessing and reordering a set of documents based on their relevance to a given query. It takes a query and a set of text inputs (called 'documents'), and returns a relevancy score for each document relative to the given query.

Let's consider a technical support scenario for a software company. A user submits the following query: "How do I reset my password in the admin panel?"

The initial search retrieves the following documents from the company's knowledge base:


Document 1: "To change your username in the admin panel, go to 'Settings' and select 'Edit Profile'. You can update your username there."

Document 2: "Password requirements: Must be at least 8 characters long, contain one uppercase letter, one lowercase letter, one number, and one special character."

Document 3: "If you've forgotten your password, click on 'Forgot Password' on the login screen. You'll receive an email with instructions to reset your password."

Document 4: "To reset your password in the admin panel: 1) Log in to the admin panel. 2) Go to 'Account Settings'. 3) Click on 'Change Password'. 4) Enter your current password and your new password twice. 5) Click 'Save Changes'."

Document 5: "Regular system maintenance is scheduled for the first Sunday of each month. The admin panel may be unavailable during this time."

The reranker would process these documents and the query, then return relevancy scores for each document:


Document 4 - Relevance Score: 0.95
"To reset your password in the admin panel: 1) Log in to the admin panel. 2) Go to 'Account Settings'. 3) Click on 'Change Password'. 4) Enter your current password and your new password twice. 5) Click 'Save Changes'."

Document 3 - Relevance Score: 0.75
"If you've forgotten your password, click on 'Forgot Password' on the login screen. You'll receive an email with instructions to reset your password."

Document 2 - Relevance Score: 0.40
"Password requirements: Must be at least 8 characters long, contain one uppercase letter, one lowercase letter, one number, and one special character."

Document 1 - Relevance Score: 0.15
"To change your username in the admin panel, go to 'Settings' and select 'Edit Profile'. You can update your username there."

Document 5 - Relevance Score: 0.05
"Regular system maintenance is scheduled for the first Sunday of each month. The admin panel may be unavailable during this time."

In this example, the reranker correctly identifies Document 4 as the most relevant, providing direct instructions for resetting the password in the admin panel.
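The input and output shape described above can be sketched in a few lines of Python. This is only an illustration of the interface: `toy_score` is a hypothetical word-overlap scorer standing in for a real reranker model, and it will not reproduce the illustrative scores listed above.

```python
import re


def toy_score(query: str, document: str) -> float:
    """Stand-in scorer: fraction of distinct query words that appear in the document."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    doc_words = set(re.findall(r"[a-z]+", document.lower()))
    return len(query_words & doc_words) / len(query_words)


def rerank(query, documents, score_fn=toy_score):
    """Score each document against the query; return (index, score) pairs, best first."""
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = "How do I reset my password in the admin panel?"
documents = [
    "To change your username in the admin panel, go to 'Settings' and select 'Edit Profile'. You can update your username there.",
    "Password requirements: Must be at least 8 characters long, contain one uppercase letter, one lowercase letter, one number, and one special character.",
    "If you've forgotten your password, click on 'Forgot Password' on the login screen. You'll receive an email with instructions to reset your password.",
    "To reset your password in the admin panel: 1) Log in to the admin panel. 2) Go to 'Account Settings'. 3) Click on 'Change Password'. 4) Enter your current password and your new password twice. 5) Click 'Save Changes'.",
    "Regular system maintenance is scheduled for the first Sunday of each month. The admin panel may be unavailable during this time.",
]

ranking = rerank(query, documents)
for index, score in ranking:
    print(f"Document {index + 1}: {score:.2f}")
```

Even this crude scorer places Document 4 first, but it orders the remaining documents differently from the example above, because word overlap cannot tell a password-reset answer from a document that merely mentions the admin panel. That gap is what a learned reranker is meant to close.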

How reranking improves search and RAG

A reranker is a critical component in modern search and RAG systems. It acts as a quality filter, reassessing and reordering initially retrieved documents based on their relevance to a given query. In RAG pipelines, the reranking step sits between the initial retrieval step and the final generation phase, enhancing the quality of information fed into language models.

Rerankers significantly boost search accuracy and reduce the likelihood of hallucinations in AI-generated responses by improving result relevance. They're particularly valuable in enterprise settings, where large volumes of data exist in different formats and search accuracy is crucial for decision-making.

We've observed two main use cases for rerankers in production:

1. Higher quality search results: Rerankers are added to existing search systems as a second-pass quality filter to improve the relevancy of results.

For example, in e-commerce or retail a reranker can:

Improve product discovery by reranking based on user preferences, popularity, and relevance to search terms.
Enhance user experience by presenting the most relevant items first, potentially increasing conversion rates.

2. More relevant and efficient enterprise RAG systems: Adding rerankers to RAG systems increases the relevancy of results and reduces costs by minimizing the processing of irrelevant documents, all without adding noticeable latency. This enables enterprises to unlock value from their large quantities of proprietary and semi-structured data across many different teams, like customer support, legal, HR, finance, and more.

Example use cases:

Enterprise RAG applications: Improve information retrieval from diverse internal documents.
Knowledge bases: Enhance search functionality for company wikis and documentation.
Customer support search: Quickly find relevant solutions to customer queries.
Code search: Efficiently locate relevant code snippets or documentation in large codebases.

By leveraging rerankers, organizations can enhance the user experience of their RAG applications through more relevant results, while also cutting costs by reducing the number of documents (and therefore tokens) that need to be passed to language models for generation.
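To make the cost argument concrete, here is a minimal sketch of where the reranking step sits in a RAG pipeline. The `retrieve_all` and `keyword_score` functions and the tiny corpus are hypothetical stand-ins for a real retriever and a real reranker, and word counts are used as a very rough proxy for tokens.

```python
from typing import Callable, List


def build_context(
    query: str,
    retrieve: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Retrieve candidates, rerank them, and keep only the top_n for the LLM prompt."""
    candidates = retrieve(query)
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


def rough_token_count(texts: List[str]) -> int:
    """Very rough proxy for tokens: whitespace-separated words."""
    return sum(len(text.split()) for text in texts)


# Hypothetical corpus and stand-in components, for illustration only.
corpus = [
    "reset password from account settings",
    "monthly maintenance window schedule",
    "password must contain eight characters",
    "office holiday calendar",
]


def retrieve_all(query: str) -> List[str]:
    """Stand-in first-stage retriever that returns every document."""
    return list(corpus)


def keyword_score(query: str, doc: str) -> float:
    """Stand-in reranker: number of query words present in the document."""
    doc_words = doc.split()
    return float(sum(word in doc_words for word in query.split()))


context = build_context("reset password", retrieve_all, keyword_score, top_n=2)
print(context)
print(rough_token_count(corpus), "->", rough_token_count(context))
```

Only the documents in `context` would be placed in the generation prompt, so the language model processes fewer tokens than it would if every retrieved candidate were passed through.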

Salesforce LlamaRank: A more accurate enterprise reranker model

The Salesforce AI Research team recently released LlamaRank, a state-of-the-art reranker model that, averaged across the four benchmarks reported below, is more accurate than top models including Cohere Rerank v3 and Mistral-7B QLM. LlamaRank excels at ranking both general documents and code, making it useful for many enterprise applications.

LlamaRank is a fine-tuned version of Llama3-8B-Instruct. It was trained using data synthesized from larger Llama-3 models, as well as human-labeled data from Salesforce in-house data analysts. The training dataset included topic-based search, document and news QA, code QA, and other types of enterprise-relevant retrieval data.

The expert data annotation team at Salesforce provided iterative feedback to refine relevance scoring, a technique called Reinforcement Learning from Human Feedback (RLHF), which encodes Salesforce’s expertise in enterprise data and search. During inference, a numeric relevance score is computed based on the predicted token probabilities from the model. Inference is fast because the model only needs to output a single token for each document.
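The exact scoring formula is not described here. One common way LLM-based rerankers turn a single predicted token into a score is to compare the model's logits for a "relevant" token and a "not relevant" token; the sketch below assumes that approach, and the function name and two-token setup are illustrative rather than LlamaRank's documented method.

```python
import math


def relevance_from_logits(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the candidate tokens; returns the probability of 'yes'."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(yes_logit, no_logit)
    yes = math.exp(yes_logit - peak)
    no = math.exp(no_logit - peak)
    return yes / (yes + no)


# Made-up logits for three (query, document) pairs.
for yes_logit, no_logit in [(2.0, -1.0), (0.0, 0.0), (-3.0, 1.0)]:
    print(f"{relevance_from_logits(yes_logit, no_logit):.3f}")
```

Because the score comes from one forward pass and one token position, no text generation loop is needed, which is consistent with the speed claim above.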

Notably, LlamaRank supports an 8K token document size, allowing for more comprehensive document analysis.

Salesforce evaluated LlamaRank on four public datasets:

SQuAD: A well-established question-answering dataset based on Wikipedia
TriviaQA: A question-answering dataset focusing on trivia-style questions from general web data
Neural Code Search (NCS): A code search dataset curated by Facebook
TrailheadQA: A collection of publicly available Trailhead documents and questions from corresponding quizzes

Model	Avg	SQuAD	TriviaQA	NCS	TrailheadQA
SFR LlamaRank	92.9%	99.3%	92.0%	81.8%	98.6%
Cohere Rerank V3	91.2%	98.6%	92.6%	74.9%	98.6%
Mistral-7B QLM	83.3%	87.3%	88.0%	60.1%	97.7%
Embeddings Only	73.2%	93.2%	88.3%	18.2%	93.2%

Method: For every dataset, the number of returned documents was fixed at 8, with 64 input documents for the general datasets and 256 for the code dataset. For more on the methodology, see the Salesforce AI Research blog post.
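The method note does not name the metric behind the percentages. Assuming a hit-rate-style measure (did at least one relevant document appear among the 8 returned?), which is an assumption on our part, the computation could be sketched as follows. The function name and the sample IDs are hypothetical.

```python
from typing import List, Set


def hit_rate_at_k(ranked_ids: List[List[int]], relevant_ids: List[Set[int]], k: int = 8) -> float:
    """Fraction of queries with at least one relevant document in the top k results."""
    hits = 0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        if any(doc_id in relevant for doc_id in ranked[:k]):
            hits += 1
    return hits / len(ranked_ids)


# Two made-up queries: the first has a relevant document in its top 2, the second does not.
ranked = [[1, 2, 3], [4, 5, 6]]
relevant = [{2}, {6}]
print(hit_rate_at_k(ranked, relevant, k=2))
```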

Key advantages of LlamaRank:

Superior performance in code domain: LlamaRank shows marked improvement for code search compared to other rerankers.
Larger document size: LlamaRank supports an 8K max document size, compared to 4K for Cohere Rerank.
Linear scoring calibration: LlamaRank produces linear and calibrated scores across all (doc, query) pairs, making it easier to interpret relevancy scores.

Together Rerank API

To streamline building RAG apps, Together AI is announcing a new Rerank API endpoint that lets you integrate supported reranker models into your enterprise applications. It takes a query and a set of documents, and returns a relevancy score and ordering index for each document. It can also limit its response to the n most relevant documents.

Today marks the launch of our Rerank API, featuring Salesforce's LlamaRank as our inaugural model. While LlamaRank sets a high bar for performance, our roadmap includes plans to incorporate additional reranker models in the future. Importantly, our Rerank endpoint is compatible with Cohere Rerank, enabling seamless integration and easy experimentation with models like LlamaRank for your RAG applications.

Key features of the Together Rerank API include:

Flagship support for LlamaRank, Salesforce’s reranker model
Support for JSON and tabular data
Long 8K context per document
Low latency for fast search queries
Can be implemented with just a few lines of code
Compatibility with Cohere's Rerank API

Cohere Rerank API compatibility

The Together Rerank endpoint is compatible with Cohere Rerank, so if you’ve already built your applications following the Cohere Rerank API you can easily test the Together Rerank API by updating the API key, model and URL.


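import os

import cohere

# Read the Together API key from the environment rather than hard-coding it.
TOGETHER_API_KEY = os.environ["TOGETHER_API_KEY"]

# Point the Cohere client at Together's endpoint.
co = cohere.Client(
    base_url="https://api.together.xyz/v1",
    api_key=TOGETHER_API_KEY,
)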
docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
]
response = co.rerank(
    model="Salesforce/Llama-Rank-V1",
    query="What is the capital of the United States?",
    documents=docs,
    top_n=3,
)
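To use the response, you would typically map each result back to its source document. The sketch below assumes the Cohere-style response shape, where `response.results` is a list of items carrying `index` and `relevance_score`; treat those field names as an assumption and check the API reference. The `fake_results` values are invented so the example runs without a network call.

```python
from types import SimpleNamespace


def ranked_documents(results, docs):
    """Pair each result with its original document text, highest score first."""
    ordered = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return [(r.index, r.relevance_score, docs[r.index]) for r in ordered]


docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

# Stand-in for response.results; the scores here are made up.
fake_results = [
    SimpleNamespace(index=2, relevance_score=0.10),
    SimpleNamespace(index=1, relevance_score=0.95),
    SimpleNamespace(index=0, relevance_score=0.30),
]

for index, score, text in ranked_documents(fake_results, docs):
    print(f"{score:.2f}  [{index}]  {text}")
```

With a live response, the call would be `ranked_documents(response.results, docs)`.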

Searching over semi-structured data (JSON, emails)

Together Rerank supports searching over semi-structured data represented as JSON, allowing users to pass JSON documents directly from sources like Elasticsearch or MongoDB. By specifying the rank_fields parameter, users can control which fields the model considers for ranking, as well as the order in which they are considered.

The following example shows a request for an email search:


from together import Together

client = Together()

query = "Which pricing did we get from Oracle?"

documents = [
    {
        "from": "Paul Doe ",
        "to": ["Steve ", "lisa@example.com"],
        "date": "2024-03-27",
        "subject": "Follow-up",
        "text": "We are happy to give you the following pricing for your project.",
    },
    {
        "from": "John McGill ",
        "to": ["Steve "],
        "date": "2024-03-28",
        "subject": "Missing Information",
        "text": "Sorry, but here is the pricing you asked for for the newest line of your models.",
    },
    {
        "from": "John McGill ",
        "to": ["Steve "],
        "date": "2024-02-15",
        "subject": "Committed Pricing Strategy",
        "text": "I know we went back and forth on this during the call but the pricing for now should follow the agreement at hand.",
    },
    {
        "from": "Generic Airline Company",
        "to": ["Steve "],
        "date": "2023-07-25",
        "subject": "Your latest flight travel plans",
        "text": "Thank you for choosing to fly Generic Airline Company. Your booking status is confirmed.",
    },
    {
        "from": "Generic SaaS Company",
        "to": ["Steve "],
        "date": "2024-01-26",
        "subject": "How to build generative AI applications using Generic Company Name",
        "text": "Hey Steve! Generative AI is growing so quickly and we know you want to build fast!",
    },
    {
        "from": "Paul Doe ",
        "to": ["Steve ", "lisa@example.com"],
        "date": "2024-04-09",
        "subject": "Price Adjustment",
        "text": "Re: our previous correspondence on 3/27 we'd like to make an amendment on our pricing proposal. We'll have to decrease the expected base price by 5%.",
    },
]

response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query=query,
    documents=documents,
    return_documents=True,
    rank_fields=["from", "to", "date", "subject", "text"],
)

print(response)

Searching over tabular data

Organizations rely heavily on structured data like databases and spreadsheets, but traditional retrieval models struggle to search this information effectively. This limits businesses from fully utilizing their data in RAG systems. Together Rerank addresses this challenge through JSON support, enabling reranking of tabular data. Developers can easily convert tables to JSON using frameworks like pandas, enhancing data-driven insights and information retrieval in RAG systems.
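As a rough illustration of the table-to-JSON step, the sketch below turns a small table into one JSON object per row. It uses Python's standard csv module so it runs without extra dependencies; the table contents and column names are made up for the example.

```python
import csv
import io
import json

# A made-up table, as it might be exported from a spreadsheet.
table_csv = """product,region,price_usd
Widget A,EMEA,120
Widget B,APAC,95
"""

# Each row becomes a dict keyed by column name: one JSON document per row.
rows = list(csv.DictReader(io.StringIO(table_csv)))

# The column names can double as the fields to rank on.
rank_fields = list(rows[0].keys())

print(json.dumps(rows, indent=2))
print(rank_fields)
```

With pandas, `df.to_dict(orient="records")` yields the same list-of-dicts shape. The resulting `rows` could then be passed as `documents`, with `rank_fields` selecting the columns, in the same way as the email example above.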

How to get started

To get started, create an API key with Together AI, and follow the steps in our quickstart docs to try Salesforce’s LlamaRank model.

To discuss a production-scale deployment of Together Rerank with LlamaRank for your company, contact our sales team.

Try LlamaRank on Together's Rerank API
