Retrieval-augmented generation
Last updated: Oct 09, 2024

You can use foundation models in IBM watsonx.ai to generate factually accurate output grounded in information in a knowledge base by applying the retrieval-augmented generation pattern.

 

This video provides a visual method to learn the concepts and tasks in this documentation.


Video chapters
[ 0:08 ] Scenario description
[ 0:27 ] Overview of pattern
[ 1:03 ] Knowledge base
[ 1:22 ] Search component
[ 1:41 ] Prompt augmented with context
[ 2:13 ] Generating output
[ 2:31 ] Full solution
[ 2:55 ] Considerations for search
[ 3:58 ] Considerations for prompt text
[ 5:01 ] Considerations for explainability

 

Providing context in your prompt improves accuracy

Foundation models can generate output that is factually inaccurate for a variety of reasons. One way to improve the accuracy of generated output is to provide the needed facts as context in your prompt text.

Example

The following prompt includes context to establish some facts:

Aisha recently painted the kitchen yellow, which is her favorite color.

Aisha's favorite color is 

Unless Aisha is a very famous person whose favorite color has been mentioned in many online articles included in common pre-training data sets, no foundation model can reliably generate the correct completion of the sentence at the end of the prompt without the context at the beginning of the prompt.

If you prompt a model with text that includes fact-filled context, the output that the model generates is more likely to be accurate. For more details, see Generating factually accurate output.

 

The retrieval-augmented generation pattern

You can scale up this technique of including context in your prompts by drawing on information in a knowledge base.

The retrieval-augmented generation pattern involves three basic steps:

  1. Search for relevant content in your knowledge base
  2. Pull the most relevant content into your prompt as context
  3. Send the combined prompt text to the model to generate output
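The three steps above can be sketched in Python. The sample knowledge base, the keyword-overlap search function, and the `generate` stub are illustrative assumptions, not watsonx.ai APIs; in a real solution, the search step would be a proper retriever and `generate` would call a foundation model:

```python
# Minimal sketch of the retrieval-augmented generation pattern.
# The knowledge base and keyword-overlap retriever are toy stand-ins;
# generate() is a placeholder for a real foundation-model call.

knowledge_base = [
    "Aisha recently painted the kitchen yellow, which is her favorite color.",
    "The quarterly report is due on the first Friday of each quarter.",
]

def search(query, documents, top_k=1):
    """Step 1: rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, context_docs):
    """Step 2: pull the most relevant content into the prompt as context."""
    context = "\n".join(context_docs)
    return f"{context}\n\n{query}"

def generate(prompt):
    """Step 3: placeholder for sending the combined prompt to a model."""
    return f"<model output for: {prompt!r}>"

query = "What is Aisha's favorite color?"
prompt = build_prompt(query, search(query, knowledge_base))
answer = generate(prompt)
```

Because the retrieved passage about Aisha is prepended as context, the final prompt gives the model the facts it needs to complete the answer.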

The origin of retrieval-augmented generation

The term retrieval-augmented generation (RAG) was introduced in this paper: Retrieval-augmented generation for knowledge-intensive NLP tasks

"We build RAG models where the parametric memory is a pre-trained seq2seq transformer, and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever."

In that paper, the term "RAG models" refers to a specific implementation of a retriever (a specific query encoder and vector-based document search index) and a generator (a specific pre-trained, generative language model). However, the basic search-and-generate approach can be generalized to use different retriever components and foundation models.

Knowledge base

The knowledge base could be any collection of information-containing artifacts, such as:

  • Process information in internal company wiki pages
  • Files in GitHub (in any format: Markdown, plain text, JSON, code)
  • Messages in a collaboration tool
  • Topics in product documentation
  • Text passages in a database like Db2
  • A collection of legal contracts in PDF files
  • Customer support tickets in a content management system

Retriever

The retriever could be any combination of search and content tools that reliably returns relevant content from the knowledge base:

  • Search tools like IBM Watson Discovery
  • Search and content APIs (GitHub has APIs like this, for example)
  • Vector databases (such as chromadb)
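A vector-based retriever ranks documents by how similar their embeddings are to the query embedding. The following sketch uses toy bag-of-words vectors and cosine similarity to show the idea; a real retriever would use a trained neural embedding model and a vector database such as chromadb:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector.
    A real retriever uses a trained neural embedding model instead."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]

docs = [
    "How to reset your password in the admin console",
    "Quarterly sales figures for the retail division",
]
best = retrieve("I forgot my password", docs)
```

A vector database performs the same similarity ranking, but over precomputed embeddings with indexes that scale to millions of documents.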

Generator

The generator component can use any model in watsonx.ai; choose the model that suits your use case, prompt format, and the content that you pull in as context.

 

Examples

The following examples demonstrate applying the retrieval-augmented generation pattern.

Table 1. Retrieval-augmented generation examples

  • Simple introduction: This sample notebook uses a small knowledge base and a simple search component to demonstrate the basic pattern. Link: Introduction to retrieval-augmented generation
  • Real world example: The watsonx.ai documentation has a search-and-answer feature that can answer basic what-is questions by using the topics in the documentation as a knowledge base. Link: Answering watsonx.ai questions using a foundation model

 

Parent topic: Foundation models
