Together AI and Snorkel AI empower enterprises to build proprietary LLMs
We are proud to announce a new strategic partnership with Snorkel AI to enable organizations to build custom LLMs on their data in their secure environments. This end-to-end AI development solution spans data development, model training, fine-tuning, and deployment.
“Enterprises today are more interested in GPT-You than GPT-4. They want capable, specialist LLMs that are trained on their data and private to them,” said Snorkel AI Co-founder and CEO Alex Ratner. “Accomplishing that requires both data-centric operations—which the Snorkel team has been researching and developing for the better part of a decade—and model-centric ones. Together’s training, fine-tuning, and inference cloud is the perfect complement for Snorkel’s data development platform, and we’ve been impressed by their unique technology.”
While publicly available LLMs yield impressive results on a wide variety of GenAI tasks, their limitations make them an untenable solution for many businesses. Their lack of domain and use-case specialization puts them out of line with enterprise objectives and leaves accuracy on the table. Using the same model as its competitors also prevents a business from taking advantage of the moat that its proprietary data should provide.
Businesses can overcome this challenge by selecting an open-source large language model and fine-tuning their own proprietary version of it. Previously, that meant the slow process of building substantial internal tooling and workflows. Now, firms can work with Snorkel and Together to achieve tangible business results faster.
“With Together API, our training, fine-tuning, and inference cloud, every part of the model development process is fully within your business’s control, and you own the weights for your model in the end,” said Together AI Founder and CEO Vipul Ved Prakash. “But training is only half the equation—selecting, labeling, and curating the right data to train on is critical yet one of the biggest stumbling blocks. Snorkel's track record of innovation and proven enterprise results make them a clear leader, and we’re excited to partner to provide a full-stack solution for AI development.”
Researchers at Snorkel and Together recently collaborated to use the Snorkel Flow data development platform with Together API to create higher-quality instruction-tuning datasets for the RedPajama family of open-source large language models. More details on that project will follow in the coming days. For more information, visit snorkel.ai to request a demo.
Together AI
Together AI is a research-driven artificial intelligence company. Together AI contributes leading open-source research, models, and datasets to advance the frontier of AI. Its decentralized cloud services empower developers and researchers at organizations of all sizes to train, fine-tune, and deploy generative AI models. Together AI believes open and transparent AI systems will drive innovation and create the best outcomes for society. The company’s seed round was led by Lux Capital. To start fine-tuning and running the world’s best open-source models, go to api.together.ai.
Snorkel AI
Founded by a team spun out of the Stanford AI Lab, Snorkel AI makes AI development fast and practical by transforming manual AI development processes into programmatic and systematic solutions. Snorkel AI enables enterprises to develop AI that works for their unique workloads using their proprietary data and knowledge, 10-100x faster. Backed by Addition, Greylock, GV, In-Q-Tel, Lightspeed Venture Partners, and funds and accounts managed by BlackRock, the company is based in Palo Alto. For more information on Snorkel AI, visit snorkel.ai.
Q: Should I use the RedPajama-V2 Dataset out of the box?
RedPajama-V2 is conceptualized as a pool of data that serves as a foundation for creating high-quality datasets. It is therefore not intended to be used out of the box: depending on the application, the data should be filtered using the quality signals that accompany it. With this dataset, we take the view that the optimal filtering of data depends on the intended use. Our goal is to provide all the signals and tooling that enable this.
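To make the idea of use-dependent filtering concrete, here is a minimal sketch in Python. It is illustrative only: the record layout, the signal names (`word_count`, `duplicate_line_fraction`), and the thresholds are assumptions made for this example, not the dataset's actual schema or recommended values. The point is that each application supplies its own set of rules over the quality signals.

```python
from typing import Any, Callable, Dict, Iterable, List

# A rule inspects one document's quality signals and returns True to keep it.
Rule = Callable[[Dict[str, float]], bool]


def filter_documents(
    docs: Iterable[Dict[str, Any]], rules: List[Rule]
) -> List[Dict[str, Any]]:
    """Keep only the documents whose quality signals satisfy every rule."""
    kept = []
    for doc in docs:
        signals = doc["quality_signals"]
        if all(rule(signals) for rule in rules):
            kept.append(doc)
    return kept


# Hypothetical rules for one application; another use case would pick
# different signals or thresholds.
rules: List[Rule] = [
    lambda s: s["word_count"] >= 50,               # drop very short pages
    lambda s: s["duplicate_line_fraction"] <= 0.3,  # drop repetitive pages
]

# Toy records standing in for dataset entries (hypothetical layout).
sample_docs = [
    {"id": "a", "quality_signals": {"word_count": 120, "duplicate_line_fraction": 0.05}},
    {"id": "b", "quality_signals": {"word_count": 12, "duplicate_line_fraction": 0.0}},
    {"id": "c", "quality_signals": {"word_count": 300, "duplicate_line_fraction": 0.6}},
]

kept = filter_documents(sample_docs, rules)
kept_ids = [doc["id"] for doc in kept]
```

Keeping the rules as a plain list of predicates means the filtering policy can be changed per application without touching the filtering loop itself.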