
Retrieval-augmented generation (RAG) pattern

Last updated: May 01, 2025

You can build a retrieval-augmented generation (RAG) pattern to generate factual output that is grounded in information from a knowledge base.

A RAG pattern processes content from a knowledge base into a vector format that is easy to search. When a user asks a question, the RAG pattern retrieves a set of relevant passages and provides them to the LLM. The LLM generates a factual answer to the user's question.

RAG pattern capabilities and resources

You can create a RAG pattern with the following capabilities and resources:

Vector embeddings
Create vector embeddings to encode the meaning of a sentence or passage as a numerical representation. Vectors provide an efficient way to find the passages of text in your knowledge base that are most similar to the question that the user asks. Vectors are stored in vector databases and retrieved with a vector index, as shown in the sketch after this list.
See Creating a vector index.
Text extraction
Convert content from a PDF or image to text for vectorization.
See Extracting text from documents.
Retrieved passage reranking
Rerank the top retrieved results by their relevance to answering the question, rather than by vector similarity alone.
See Reranking document passages.
Vector stores
Choose from the in-memory Chroma vector store, the watsonx.data Milvus vector store that is automatically set up, or other vector stores that you create connections to.
See Types of vector stores.
Encoder models for embedding and reranking
Choose from IBM and third-party encoder models for creating embeddings and reranking passages.
See Supported encoder foundation models.
Foundation models for inferencing
Choose from a range of foundation models, select a deploy-on-demand model, or import and deploy a custom model.
See Foundation models that support your use case.
Samples to adapt
Start with a sample RAG pattern and adapt it for your use case.
See Sample RAG patterns.
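
For example, the following minimal sketch creates embeddings with the ibm-watsonx-ai Python SDK and ranks passages by cosine similarity. The API key, endpoint URL, project ID, model ID, and sample texts are placeholder assumptions; substitute your own values.

  import numpy as np
  from ibm_watsonx_ai import Credentials
  from ibm_watsonx_ai.foundation_models import Embeddings

  # Placeholder credentials and model ID -- substitute your own values.
  embeddings = Embeddings(
      model_id="ibm/slate-125m-english-rtrvr",
      credentials=Credentials(api_key="YOUR_API_KEY", url="https://us-south.ml.cloud.ibm.com"),
      project_id="YOUR_PROJECT_ID",
  )

  passages = [
      "Refunds are available within 30 days of purchase.",
      "Support is available through the help portal.",
  ]
  question = "How long do I have to return a product?"

  # Encode the passages and the question as vectors, then rank the
  # passages by cosine similarity to the question.
  doc_vectors = np.array(embeddings.embed_documents(texts=passages))
  query_vector = np.array(embeddings.embed_query(text=question))
  scores = doc_vectors @ query_vector / (
      np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
  )
  print(passages[int(np.argmax(scores))])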

Ways to work

You can write code for your RAG pattern by adapting the sample notebooks, or you can work in a no-code or low-code experience with the following tools in the UI:

  • Prompt Lab: You can chat with an uploaded document or with a vector index.
  • AutoAI for RAG: You can automate the search for an optimized, production-quality RAG pattern.
  • Vector index: You can create a vector index based on one or more documents.

The retrieval-augmented generation pattern architecture

You can scale out the technique of including context in your prompts by using information from a knowledge base.


This video provides a visual method to learn the concepts and tasks in this documentation.

Video chapters

[ 0:08 ] Scenario description
[ 0:27 ] Overview of pattern
[ 1:03 ] Knowledge base
[ 1:22 ] Search component
[ 1:41 ] Prompt augmented with context
[ 2:13 ] Generating output
[ 2:31 ] Full solution
[ 2:55 ] Considerations for search
[ 3:58 ] Considerations for prompt text
[ 5:01 ] Considerations for explainability


The following diagram illustrates the retrieval-augmented generation pattern at run time. Although the diagram shows a question-answering example, the same workflow supports other use cases.

Diagram that shows adding search results that are derived from a vector store to the input for retrieval-augmented generation

The retrieval-augmented generation pattern involves the following steps, illustrated by the sketch after this list:

  1. Your knowledge base is preprocessed to convert the content to plain text and vectorize it. Preprocessing can include text extraction to convert information in tables and images into text that the LLM can interpret.
  2. A user asks a question.
  3. The question is converted to a text embedding.
  4. The vector store that contains the vectorized content of the knowledge base is searched for content that is similar to the user's question.
  5. The most relevant search results are added to the prompt in text format, along with an instruction, such as “Answer the following question by using only information from the following passages.”
  6. The combined prompt text (instruction + search results + question) is sent to the foundation model.
  7. The foundation model uses contextual information from the prompt to generate a factual answer.
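
The following minimal sketch walks through these steps end to end. It uses a toy hashed bag-of-words function in place of a real embedding model and prints the combined prompt instead of calling a foundation model; every name and value in it is illustrative.

  import numpy as np

  # Toy embedding: a hashed bag-of-words vector. A real RAG pattern
  # uses an encoder model instead (this stand-in is an assumption).
  def embed(text):
      vec = np.zeros(64)
      for token in text.lower().split():
          vec[hash(token) % 64] += 1.0
      norm = np.linalg.norm(vec)
      return vec / norm if norm else vec

  # 1. Preprocess the knowledge base: vectorize each passage.
  knowledge_base = [
      "The warranty period for all products is two years.",
      "Shipping within the EU takes three to five business days.",
  ]
  vectors = np.array([embed(p) for p in knowledge_base])

  # 2-3. A user asks a question, which is converted to an embedding.
  question = "How long is the warranty?"
  query = embed(question)

  # 4. Search the vector store for the content most similar to the question.
  scores = vectors @ query
  top_passage = knowledge_base[int(np.argmax(scores))]

  # 5. Add the search results to the prompt along with an instruction.
  prompt = (
      "Answer the following question by using only information from "
      f"the following passages.\n\nPassage: {top_passage}\n\n"
      f"Question: {question}\nAnswer:"
  )

  # 6-7. Send the combined prompt to a foundation model to generate the
  # answer (printed here; see the Answer generation section for a model call).
  print(prompt)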

Knowledge base

Your knowledge base can be any collection of information-containing artifacts, such as:

  • Process information in internal company wiki pages
  • Files in GitHub
  • Messages in a collaboration tool
  • Topics in product documentation, which can include long text blocks
  • Text passages in a database that supports structured query language (SQL) queries, such as Db2
  • A document store with a collection of files, such as legal contracts that are stored as PDF files
  • Customer support tickets in a content management system

Most supported vector stores and AutoAI support files of type PDF, HTML, DOCX, MD, or plain text. See Grounding document file types.

Content preprocessing

When you set up your RAG pattern, you preprocess the documents in your knowledge base. Preprocessing first converts the content to plain text; you can configure text extraction to convert information in tables and images into text that the LLM can interpret. The embedding model then vectorizes the text, and the vectors are stored in the vector database and indexed for retrieval.

When a user asks a question, that text is vectorized by the embedding model.
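
As an illustration, the following sketch splits extracted plain text into overlapping chunks and loads them into the in-memory Chroma vector store. The chunk size, overlap, and collection name are arbitrary assumptions; unless you pass precomputed vectors, Chroma applies its own default embedding function rather than the encoder model that you chose.

  import chromadb

  def chunk(text, size=500, overlap=50):
      """Split extracted plain text into overlapping chunks for embedding."""
      step = size - overlap
      return [text[i:i + size] for i in range(0, len(text), step)]

  # In-memory Chroma vector store; each chunk needs a unique ID.
  client = chromadb.Client()
  collection = client.create_collection(name="knowledge_base")

  document = "..."  # plain text produced by text extraction
  pieces = chunk(document)
  collection.add(
      ids=[f"doc1-{i}" for i in range(len(pieces))],
      documents=pieces,
  )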

Content retrieval and ranking

The retriever searches for the content in the vector database that is most similar to the vector embedding of the query text. The retrieved passages are ranked by similarity to the question. You can add a reranking model to your RAG pattern to evaluate those top retrieved passages for relevance to answering the question.
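
One way to rerank is sketched below with the open-source sentence-transformers cross-encoder, used here purely as an illustration rather than the watsonx.ai reranking capability; the model name and sample passages are assumptions.

  from sentence_transformers import CrossEncoder

  # Illustrative cross-encoder reranker (an assumption, not the
  # watsonx.ai reranking API); it scores each (question, passage) pair.
  reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

  question = "How long is the warranty?"
  retrieved_passages = [  # top results from the vector search, by similarity
      "Shipping within the EU takes three to five business days.",
      "The warranty period for all products is two years.",
  ]

  # Rerank the retrieved passages by relevance to answering the question.
  scores = reranker.predict([(question, p) for p in retrieved_passages])
  reranked = [p for _, p in sorted(zip(scores, retrieved_passages), reverse=True)]
  print(reranked[0])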

Answer generation

The retrieved passages, the user's question, and the instruction are sent to the foundation model in a prompt. The foundation model generates an answer and returns it to the user.
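
For example, with the ibm-watsonx-ai Python SDK, you might send the combined prompt to a foundation model as follows; the credentials, project ID, model ID, and generation parameters are placeholder assumptions.

  from ibm_watsonx_ai import Credentials
  from ibm_watsonx_ai.foundation_models import ModelInference

  # Placeholder credentials and model ID -- substitute your own values.
  model = ModelInference(
      model_id="ibm/granite-3-8b-instruct",
      credentials=Credentials(api_key="YOUR_API_KEY", url="https://us-south.ml.cloud.ibm.com"),
      project_id="YOUR_PROJECT_ID",
      params={"max_new_tokens": 300},
  )

  question = "How long is the warranty?"
  passages = ["The warranty period for all products is two years."]

  # Combine the instruction, retrieved passages, and question into one prompt.
  prompt = (
      "Answer the following question by using only information from "
      "the following passages.\n\n"
      + "\n".join(passages)
      + f"\n\nQuestion: {question}\nAnswer:"
  )
  print(model.generate_text(prompt=prompt))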

Sample RAG patterns

The following samples demonstrate how to apply the retrieval-augmented generation pattern.

  • Complete RAG solution (project): A sample project with notebooks and other assets that implements a question and answer solution by using retrieval-augmented generation. See the Q&A with RAG Accelerator sample project.
  • Simple introduction (notebook): Uses a small knowledge base and a simple search component to demonstrate the basic pattern. See Introduction to retrieval-augmented generation.
  • Simple introduction with Discovery (notebook): Uses short articles in IBM Watson Discovery as a knowledge base and the Discovery API to perform search queries. See Simple introduction to retrieval-augmented generation with watsonx.ai and Discovery.
  • Example with LangChain (notebook): Contains the steps and code to demonstrate retrieval-augmented generation with LangChain in watsonx.ai, including commands for data retrieval, knowledge base building and querying, and model testing. See Use watsonx and LangChain to answer questions by using RAG.
  • Example with LangChain and an Elasticsearch vector database (notebook): Demonstrates how to use LangChain to apply an embedding model to documents in an Elasticsearch vector database, and then index and use the data store to generate answers to incoming questions. See Use watsonx, Elasticsearch, and LangChain to answer questions (RAG).
  • Example with the Elasticsearch Python library (notebook): Demonstrates how to use the Elasticsearch Python library to apply an embedding model to documents in an Elasticsearch vector database, and then index and use the data store to generate answers to incoming questions. See Use watsonx, and Elasticsearch Python library to answer questions (RAG).
  • Example with LangChain and a SingleStoreDB database (notebook): Shows you how to apply retrieval-augmented generation to large language models in watsonx by using the SingleStoreDB database. See RAG with SingleStoreDB and watsonx.

Learn more

Try these tutorials:

Parent topic: Generative AI solutions