Retrieval-augmented generation (RAG) pattern
You can build a retrieval-augmented generation (RAG) pattern to generate factual output that is grounded in information from a knowledge base.
A RAG pattern processes content from a knowledge base into a vector format that is easy to search. When a user asks a question, the RAG pattern retrieves a set of relevant passages and provides them to the LLM. The LLM generates a factual answer to the user's question.
RAG pattern capabilities and resources
You can create a RAG pattern with the following capabilities and resources:
- Vector embeddings
- Create vector embeddings to encode the meaning of a sentence or passage as a numerical representation. Vectors provide an efficient way to find passages of text in your knowledge base that are most similar to the question that the user asks. Vectors are stored in vector databases and retrieved with a vector index. A minimal similarity sketch follows this list.
- See Creating a vector index.
- Text extraction
- Convert content from a PDF or image to text for vectorization.
- See Extracting text from documents.
- Retrieved passage reranking
- Rerank the top retrieved results by relevance to the question instead of by vector similarity alone.
- See Reranking document passages.
- Vector stores
- Choose from the in-memory Chroma vector store, the watsonx.data Milvus vector store that is automatically set up, or other vector stores that you create connections to.
- See Types of vector stores.
- Encoder models for embedding and reranking
- Choose from IBM and third-party encoder models for creating embeddings and reranking passages.
- See Supported encoder foundation models.
- Foundation models for inferencing
- Choose from a range of foundation models, select a deploy-on-demand model, or import and deploy a custom model.
- See Foundation models that support your use case.
- Samples to adapt
- Start with a sample RAG pattern and adapt it for your use case.
- See Sample RAG patterns.
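To make the embedding capability concrete, the following minimal sketch ranks a few passages by cosine similarity to a question. The sentence-transformers library and the all-MiniLM-L6-v2 model are illustrative assumptions, not specific to watsonx.ai; in a real pattern, a supported encoder model creates the vectors and a vector index performs the search at scale.

```python
# A minimal sketch of vector similarity search, assuming the open source
# sentence-transformers library and the all-MiniLM-L6-v2 model as stand-ins
# for whichever encoder model your RAG pattern uses.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be submitted within 30 days.",
    "The data center is located in Dallas.",
]
question = "How much vacation time do I earn?"

# Encode the passages and the question into vectors.
passage_vectors = encoder.encode(passages)
question_vector = encoder.encode(question)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank passages by similarity to the question; the vacation passage scores highest.
scores = [cosine_similarity(question_vector, v) for v in passage_vectors]
print(passages[int(np.argmax(scores))])
```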
Ways to work
You can write code for your RAG pattern, or you can work in a no-code or low-code experience with the following tools in the UI:
- Prompt Lab: You can chat with an uploaded document or with a vector index.
- AutoAI for RAG: You can automate the search for an optimized, production-quality RAG pattern.
- Vector index: You can create a vector index based on one or more documents.
The retrieval-augmented generation pattern architecture
The RAG pattern scales out the technique of including context in your prompts by drawing that context from a knowledge base.
This video provides a visual method to learn the concepts and tasks in this documentation.
Video chapters
[ 0:08 ] Scenario description
[ 0:27 ] Overview of pattern
[ 1:03 ] Knowledge base
[ 1:22 ] Search component
[ 1:41 ] Prompt augmented with context
[ 2:13 ] Generating output
[ 2:31 ] Full solution
[ 2:55 ] Considerations for search
[ 3:58 ] Considerations for prompt text
[ 5:01 ] Considerations for explainability
The following diagram illustrates the retrieval-augmented generation pattern at run time. Although the diagram shows a question-answering example, the same workflow supports other use cases.
The retrieval-augmented generation pattern involves the following steps. A minimal code sketch follows the list.
- Your knowledge base is preprocessed to convert the content to plain text and vectorize it. Preprocessing can include text extraction to convert information in tables and images into text that the LLM can interpret.
- A user asks a question.
- The question is converted to a text embedding.
- The vector store that contains the vectorized content of the knowledge base is searched for content that is similar to the user's question.
- The most relevant search results are added to the prompt in text format, along with an instruction, such as “Answer the following question by using only information from the following passages.”
- The combined prompt text (instruction + search results + question) is sent to the foundation model.
- The foundation model uses contextual information from the prompt to generate a factual answer.
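The following sketch shows how steps 2 through 7 fit together at run time. The embed, search, and generate callables are hypothetical stand-ins for an encoder model, a vector index lookup, and a foundation model; none of these names come from a specific library.

```python
# A minimal sketch of the RAG runtime flow, with the three stage-specific
# components passed in as callables so the wiring is self-contained.
from typing import Callable, List, Sequence

def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],              # encoder model
    search: Callable[[Sequence[float], int], List[str]],  # vector index lookup
    generate: Callable[[str], str],                       # foundation model inference
    top_k: int = 3,
) -> str:
    """Wire together steps 2-7 of the RAG pattern."""
    # Steps 2-3: convert the user's question to a vector embedding.
    question_vector = embed(question)
    # Step 4: search the vector store for similar passages.
    passages = search(question_vector, top_k)
    # Step 5: add the retrieved passages to the prompt with an instruction.
    prompt = (
        "Answer the following question by using only information "
        "from the following passages.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    # Steps 6-7: send the combined prompt to the foundation model.
    return generate(prompt)
```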
Knowledge base
Your knowledge base can be any collection of information-containing artifacts, such as:
- Process information in internal company wiki pages
- Files in GitHub
- Messages in a collaboration tool
- Topics in product documentation, which can include long text blocks
- Text passages in a database that supports structured query language (SQL) queries, such as Db2
- A document store with a collection of files, such as legal contracts that are stored as PDF files
- Customer support tickets in a content management system
Most of the supported vector stores and the AutoAI for RAG tool accept files of type PDF, HTML, DOCX, MD, or plain text. See Grounding document file types.
Content preprocessing
When you set up your RAG pattern, you preprocess the documents in your knowledge base. Preprocessing first converts the content to plain text. You can configure text extraction to convert information in tables and images into text that the LLM can interpret. An embedding model then vectorizes the text, the vectors are stored in the vector database, and a vector index is created for retrieving the content.
When a user asks a question, that text is vectorized by the embedding model.
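The following sketch shows a preprocessing pass, assuming the in-memory Chroma vector store that is mentioned earlier. The fixed-size chunking function and Chroma's default embedding function are illustrative choices, not requirements of the pattern.

```python
# A minimal preprocessing sketch: chunk extracted text, vectorize it with
# the in-memory Chroma vector store's default embedding function, and
# query the resulting index.
import chromadb

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split extracted plain text into overlapping fixed-size chunks."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]

client = chromadb.Client()  # in-memory store
collection = client.create_collection(name="knowledge_base")

# Placeholder content standing in for your extracted documents.
documents = {"policy.txt": "Employees accrue 1.5 vacation days per month ..."}
for doc_id, text in documents.items():
    chunks = chunk(text)
    collection.add(
        documents=chunks,
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
    )

# When a user asks a question, the same embedding function vectorizes it
# and the index returns the most similar chunks.
results = collection.query(query_texts=["How much vacation time do I earn?"], n_results=3)
print(results["documents"][0])
```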
Content retrieval and ranking
The retriever searches for the content in the vector database that is most similar to the vector embedding of the query text. The retrieved passages are ranked by similarity to the question. You can add a reranking model to your RAG pattern to evaluate those top retrieved passages for relevance to answering the question.
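The following sketch layers reranking on top of retrieval, assuming the sentence-transformers CrossEncoder class and the ms-marco-MiniLM-L-6-v2 model as stand-ins for a supported reranker model.

```python
# A minimal reranking sketch: score each (question, passage) pair for
# relevance with a cross-encoder, then reorder the retrieved passages.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

question = "How much vacation time do I earn?"
retrieved = [
    "Expense reports must be submitted within 30 days.",
    "Employees accrue 1.5 vacation days per month of service.",
    "Vacation requests are approved by your manager.",
]

scores = reranker.predict([(question, p) for p in retrieved])
reranked = [p for _, p in sorted(zip(scores, retrieved), reverse=True)]
print(reranked[0])  # the most relevant passage, not merely the most similar
```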
Answer generation
The retrieved passages, the user's question, and an instruction are combined into a prompt and sent to the foundation model. The foundation model generates an answer and returns it to the user.
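The following sketch shows the generation step, assuming the ibm-watsonx-ai Python SDK. The model ID, endpoint URL, and credentials are placeholders; check the SDK documentation for the exact parameters in your environment.

```python
# A minimal answer-generation sketch with the ibm-watsonx-ai SDK.
# The model ID, URL, API key, and project ID below are placeholder assumptions.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id="ibm/granite-3-8b-instruct",  # any supported foundation model
    credentials=Credentials(
        url="https://us-south.ml.cloud.ibm.com",
        api_key="YOUR_API_KEY",
    ),
    project_id="YOUR_PROJECT_ID",
)

question = "How much vacation time do I earn?"
passages = ["Employees accrue 1.5 vacation days per month of service."]

# Combine instruction, retrieved passages, and the question into one prompt.
prompt = (
    "Answer the following question by using only information "
    "from the following passages.\n\n"
    + "\n\n".join(passages)
    + f"\n\nQuestion: {question}\nAnswer:"
)

print(model.generate_text(prompt=prompt))
```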
Sample RAG patterns
The following samples demonstrate how to apply the retrieval-augmented generation pattern.
Example | Format | Description | Link |
---|---|---|---|
Complete RAG solution | Project | A sample project with notebooks and other assets that implement a question and answer solution by using retrieval-augmented generation. | Q&A with RAG Accelerator sample project |
Simple introduction | Notebook | Uses a small knowledge base and a simple search component to demonstrate the basic pattern. | Introduction to retrieval-augmented generation |
Simple introduction with Discovery | Notebook | This sample notebook uses short articles in IBM Watson Discovery as a knowledge base and the Discovery API to perform search queries. | Simple introduction to retrieval-augmented generation with watsonx.ai and Discovery |
Example with LangChain | Notebook | Contains the steps and code to demonstrate support of retrieval-augmented generation with LangChain in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing. | Use watsonx and LangChain to answer questions by using RAG |
Example with LangChain and an Elasticsearch vector database | Notebook | Demonstrates how to use LangChain to apply an embedding model to documents in an Elasticsearch vector database. The notebook then indexes and uses the data store to generate answers to incoming questions. | Use watsonx, Elasticsearch, and LangChain to answer questions (RAG) |
Example with the Elasticsearch Python library | Notebook | Demonstrates how to use the Elasticsearch Python library to apply an embedding model to documents in an Elasticsearch vector database. The notebook then indexes and uses the data store to generate answers to incoming questions. | Use watsonx and the Elasticsearch Python library to answer questions (RAG) |
Example with LangChain and a SingleStoreDB database | Notebook | Shows you how to apply retrieval-augmented generation to large language models in watsonx by using the SingleStoreDB database. | RAG with SingleStoreDB and watsonx |
Learn more
Try these tutorials:
- Prompt a foundation model by using Prompt Lab
- Prompt a foundation model with the retrieval-augmented generation pattern
Parent topic: Generative AI solutions