Use embedding models to create text embeddings that capture the meaning of a sentence or passage to help with retrieval-augmented generation tasks.
Retrieval-augmented generation (RAG) is a technique in which a foundation model prompt is augmented with knowledge from external sources. You can use text embeddings to find more relevant, higher-quality information to include in the prompt, which helps the foundation model answer factually.
The following diagram illustrates the retrieval-augmented generation pattern with embedding support.
The retrieval-augmented generation pattern with embedding support involves the following steps:
- Convert your content into text embeddings and store them in a vector data store.
- Use the same embedding model to convert the user input into text embeddings.
- Run a similarity or semantic search in your knowledge base for content that is related to a user's question.
- Pull the most relevant search results into your prompt as context and add an instruction, such as “Answer the following question by using only information from the following passages.”
- Send the combined prompt text (instruction + search results + question) to the foundation model.
- The foundation model uses contextual information from the prompt to generate a factual answer.
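The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector data store, and all function names (`embed`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. The final step of sending the prompt to a foundation model is left out.

```python
import math
import re
from collections import Counter

# Instruction text taken from the steps above.
INSTRUCTION = (
    "Answer the following question by using only information "
    "from the following passages."
)


def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(passages):
    """Step 1: embed the content and keep it in an in-memory 'vector store'."""
    return [(passage, embed(passage)) for passage in passages]


def retrieve(index, question, top_k=2):
    """Steps 2-3: embed the question with the same model, then rank by similarity."""
    query_vector = embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [passage for passage, _ in ranked[:top_k]]


def build_prompt(question, passages):
    """Steps 4-5: combine instruction + search results + question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"{INSTRUCTION}\n\nPassages:\n{context}\n\nQuestion: {question}\nAnswer:"


passages = [
    "Text embeddings are numeric vectors that capture the meaning of a passage.",
    "A vector data store keeps embeddings so that they can be searched by similarity.",
    "The cafeteria serves lunch from noon until two in the afternoon.",
]
index = build_index(passages)
question = "Where are embeddings stored so that they can be searched?"
top = retrieve(index, question, top_k=2)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment, a trained embedding model would replace `embed` and a vector database would replace the list, but the flow stays the same: the content and the question must be embedded by the same model so that their vectors are comparable.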
Augmenting foundation model input from Prompt Lab
The Prompt Lab has a built-in function in chat mode that helps you to implement a RAG use case. To start, you associate relevant documents with a prompt. The documents that you add are vectorized and stored in a vector database. When a query is submitted to the chat, the database is searched and related results are included in the input that is submitted to the foundation model. For more information, see Grounding foundation model prompts in contextual information.
Sample notebook
The Use watsonx Granite Model Series, Chroma, and LangChain to answer questions (RAG) sample notebook walks you through the steps for enhancing a RAG use case with embeddings.
Learn more
- Supported embedding models
- Retrieval-augmented generation
- Vectorizing text by using the API
- Techniques for overcoming context length limitations
- Text embeddings API reference