You can use foundation models in IBM watsonx.ai to generate factually accurate output that is grounded in information in a knowledge base by applying the retrieval-augmented generation (RAG) pattern.
This video provides a visual method to learn the concepts and tasks in this documentation.
Video chapters
Copy link to section
[ 0:08 ] Scenario description
[ 0:27 ] Overview of pattern
[ 1:03 ] Knowledge base
[ 1:22 ] Search component
[ 1:41 ] Prompt augmented with context
[ 2:13 ] Generating output
[ 2:31 ] Full solution
[ 2:55 ] Considerations for search
[ 3:58 ] Considerations for prompt text
[ 5:01 ] Considerations for explainability
Providing context in your prompt improves accuracy
Copy link to section
Foundation models can generate output that is factually inaccurate for various reasons. One way to improve the accuracy of generated output is to provide the necessary facts as context in your prompt text.
Example
Copy link to section
The following prompt includes context to establish some facts:
Aisha recently painted the kitchen yellow, which is her favorite color.
Aisha's favorite color is
Without the context at the beginning of the prompt, no foundation model can reliably generate the correct completion of the sentence at the end of the prompt, unless Aisha happens to be a famous person whose favorite color was mentioned in many online articles that are included in common pretraining data sets.
If you prompt a model with text that includes fact-filled context, then the output the model generates is more likely to be accurate. For more details, see Generating factually accurate output.
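The idea of prepending fact-filled context can be sketched in a few lines of Python. The `build_prompt` helper here is hypothetical, not part of any watsonx.ai API; it only illustrates how the context and the text to complete are combined into one prompt string.

```python
# Minimal sketch (hypothetical helper): prepend fact-filled context to the
# text that the model should complete.

def build_prompt(completion_text: str, context: str = "") -> str:
    """Combine optional context with the text the model should complete."""
    if context:
        return f"{context}\n\n{completion_text}"
    return completion_text

context = "Aisha recently painted the kitchen yellow, which is her favorite color."
prompt = build_prompt("Aisha's favorite color is", context=context)
print(prompt)
```

The resulting prompt contains the fact the model needs, so the completion is grounded in the supplied context rather than in whatever the model memorized during pretraining.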
The retrieval-augmented generation pattern
Copy link to section
You can scale out the technique of including context in your prompts by using information in a knowledge base.
The following diagram illustrates the retrieval-augmented generation pattern. Although the diagram shows a question-answering example, the same workflow supports other use cases.
The retrieval-augmented generation pattern involves the following steps:
Search in your knowledge base for content that is related to a user's question.
Pull the most relevant search results into your prompt as context and add an instruction, such as “Answer the following question by using only information from the following passages.”
Only if the foundation model that you're using is not instruction-tuned: Add a few examples that demonstrate the expected input and output format.
Send the combined prompt text (instruction + search results + question) to the foundation model.
The foundation model uses contextual information from the prompt to generate a factual answer.
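The steps above can be sketched end to end in Python. Everything here is a toy stand-in: `search_knowledge_base` is a trivial keyword retriever over an in-memory list, and `generate` is a stub where a real solution would call a watsonx.ai foundation model.

```python
# Hedged sketch of the RAG steps above, with toy stand-ins for the
# retriever and the foundation model.

KNOWLEDGE_BASE = [
    "To reset your password, open Settings > Security and choose Reset.",
    "The service status page lists current outages and maintenance windows.",
]

def search_knowledge_base(question: str, top_k: int = 1) -> list[str]:
    """Step 1: toy keyword retriever that ranks passages by word overlap."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Steps 2-3: combine instruction, retrieved context, and question."""
    instruction = ("Answer the following question by using only information "
                   "from the following passages.")
    return f"{instruction}\n\n" + "\n".join(passages) + f"\n\nQuestion: {question}\nAnswer:"

def generate(prompt: str) -> str:
    """Step 4 stub: a real implementation would inference a foundation model."""
    return "<model output>"

question = "How do I reset my password?"
passages = search_knowledge_base(question)
answer = generate(build_rag_prompt(question, passages))
```

Swapping the stubs for a real retriever and a watsonx.ai inference call is all that changes in a production implementation; the shape of the pipeline stays the same.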
The term retrieval-augmented generation was introduced in the 2020 paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, which describes the approach as follows:

“We build RAG models where the parametric memory is a pre-trained seq2seq transformer, and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.”
In that paper, the term RAG models refers to a specific implementation of a retriever (a specific query encoder and vector-based document search index) and a generator (a specific pre-trained, generative language
model). However, the basic search-and-generate approach can be generalized to use different retriever components and foundation models.
Knowledge base
Copy link to section
The knowledge base can be any collection of information-containing artifacts, such as:
Process information in internal company wiki pages
Files in GitHub (in any format: Markdown, plain text, JSON, code)
Messages in a collaboration tool
Topics in product documentation, which can include long text blocks
Text passages in a database that supports structured query language (SQL) queries, such as Db2
A document store with a collection of files, such as legal contracts that are stored as PDF files
Customer support tickets in a content management system
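Whatever the source format, knowledge-base content is typically split into passages small enough to fit in a prompt alongside the instruction and question. A minimal sketch of such chunking, using a simple word-count bound (real pipelines often split on sentences or tokens instead):

```python
def split_into_passages(text: str, max_words: int = 100) -> list[str]:
    """Split a long document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each passage can then be indexed by the retriever and, when relevant, pulled into the prompt as context.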
Retriever
Copy link to section
The retriever can be any combination of search and content tools that reliably returns relevant content from the knowledge base, including search tools like IBM Watson Discovery or search and content APIs like those provided by GitHub.
Vector databases are also effective retrievers. A vector database stores not only the data, but also a vector embedding of the data, which is a numerical representation of the data that captures its semantic meaning. At query time, a vector
embedding of the query text is used to find relevant matches.
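The vector-matching idea can be illustrated with a deliberately simplified embedding. Here a bag-of-words count vector stands in for a trained embedding model, and cosine similarity scores how close the query is to each passage; a real vector database performs the same comparison over dense learned embeddings at scale.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector. A real system would
    use a trained embedding model that captures semantic meaning."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

passages = ["the cat sat on the mat", "stock prices fell sharply today"]
query = embed("where did the cat sit")
best = max(passages, key=lambda p: cosine_similarity(query, embed(p)))
```

Because the toy embedding matches only exact words, it misses paraphrases ("sit" versus "sat"); learned embeddings place such related terms near each other, which is what makes semantic search work.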
IBM watsonx.ai does not include a vector database, but you can use the foundation models in watsonx.ai with any vector database on the market. The sample notebooks illustrate the steps for connecting to popular vector databases, such as Elasticsearch.
To help you implement a RAG pattern in which the retriever uses vectorized text, watsonx.ai offers an embedding API and embedding models that you can use to convert sentences and passages into vectors. For more information about this type
of RAG implementation, see Using vectorized text with retrieval-augmented generation tasks.
Generator
Copy link to section
The generator component can use any foundation model in watsonx.ai; choose the model that best suits your use case, prompt format, and the content that you pull in for context.
Sample project
Copy link to section
Import a sample project with notebooks and other assets that implement a question and answer solution by using retrieval-augmented generation. The project shows you how to do the following things:
Use HTML, PDF, DOC, or PPT files as the knowledge base and an Elasticsearch vector index as the retriever. (You must create the Elasticsearch service instance separately.)
Write a Python function that queries the vector index to search for information related to a question, and then inferences a foundation model and checks the generated answer for hallucinated content.
Use prompt templates that help you format effective prompts for foundation models.
Follow the pattern efficiently with RAG utilities from the watsonx.ai Python library.
Implement the next phase of a RAG implementation by including functions for collecting and analyzing user feedback about generated answers.
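The hallucination check mentioned above can be approximated by verifying how much of the generated answer is actually supported by the retrieved passages. This word-overlap sketch is a hypothetical illustration, not the sample project's actual implementation:

```python
def grounding_score(answer: str, passages: list[str]) -> float:
    """Fraction of the answer's words that also appear in the source
    passages. A low score suggests possible hallucinated content."""
    answer_words = set(answer.lower().split())
    source_words = set(" ".join(passages).lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & source_words) / len(answer_words)
```

Production checks are usually more sophisticated (for example, sentence-level entailment against the retrieved context), but the goal is the same: flag answers that stray from the knowledge base.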
The watsonx.ai documentation has a search-and-answer feature that can answer basic what-is questions by using the topics in the documentation as a knowledge base.
Example with LangChain
Contains the steps and code to demonstrate support of retrieval-augmented generation with LangChain in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.
Example with LangChain and an Elasticsearch vector database
Demonstrates how to use LangChain to apply an embedding model to documents in an Elasticsearch vector database. The notebook then indexes and uses the data store to generate answers to incoming questions.
Example with the Elasticsearch Python library
Demonstrates how to use the Elasticsearch Python library to apply an embedding model to documents in an Elasticsearch vector database. The notebook then indexes and uses the data store to generate answers to incoming questions.