You can create a generative AI solution that answers user questions based on information in a knowledge base by applying the retrieval-augmented generation (RAG) pattern.
Use case and requirements
The RAG pattern is useful when you need a foundation model to augment answers to questions with the information that you provide. For example, suppose that you want to implement a chatbot for your new product that answers questions based on your product's documentation.
A RAG solution that answers questions based on product documentation might include these tasks:
- Find the answer to the user's question in the product documentation.
- Generate answers based on the product documentation.
- Link to the documentation topics that provided the answer.
- Save all questions and answers.
- Collect user feedback about answers.
- Deliver the negative feedback to the writers of the documentation.
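The logging and feedback tasks in this list can be sketched in a few lines. This is a minimal sketch that assumes an in-memory list stands in for a real log store. The `Interaction` record, its fields, and `negative_feedback_by_topic` are hypothetical names for illustration, not part of any watsonx API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One logged question/answer pair (hypothetical schema)."""
    question: str
    answer: str
    source_topics: list[str]          # documentation topics linked in the answer
    thumbs_up: Optional[bool] = None  # None until the user gives feedback


def negative_feedback_by_topic(log: list[Interaction]) -> dict[str, list[str]]:
    """Group questions that received negative feedback by the topic that
    grounded the answer, so each topic's writers see only relevant questions."""
    routed: dict[str, list[str]] = defaultdict(list)
    for item in log:
        if item.thumbs_up is False:
            for topic in item.source_topics:
                routed[topic].append(item.question)
    return dict(routed)


log = [
    Interaction("How do I install the CLI?", "Run the installer.", ["install"], thumbs_up=True),
    Interaction("How do I rotate API keys?", "See the security guide.", ["security", "api-keys"], thumbs_up=False),
    Interaction("What ports are required?", "Port 443.", ["install"]),
]
print(negative_feedback_by_topic(log))
# {'security': ['How do I rotate API keys?'], 'api-keys': ['How do I rotate API keys?']}
```

Answers without feedback (`thumbs_up=None`) are kept in the log but are not routed to writers.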
Solution architecture
A RAG pattern typically includes the following components:
- A knowledge base that contains the source documents.
- An embedding model that converts the text in the documents into vector embeddings, which are numerical representations that capture the meaning of the text.
- A vector database that stores the vector embeddings of your documents, together with a vector index that is used to search for and retrieve relevant content.
- A prompt template that combines the user question, retrieved content, and instructions for generating an answer.
- A foundation model that generates an answer to a user's question based on the retrieved content.
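To make the data flow between these components concrete, here is a minimal, self-contained sketch. It assumes a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database, and it stops at the prompt that would be sent to the foundation model. The names `embed`, `VectorIndex`, and `PROMPT_TEMPLATE` are hypothetical and for illustration only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Toy stand-in for a vector database: stores (embedding, passage) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, passage: str) -> None:
        self.entries.append((embed(passage), passage))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [passage for _, passage in ranked[:k]]


# Prompt template: instructions + retrieved content + user question.
PROMPT_TEMPLATE = (
    "Answer the question using only the documents below. "
    "If the answer is not in the documents, say you don't know.\n\n"
    "Documents:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(index: VectorIndex, question: str, k: int = 2) -> str:
    passages = index.search(question, k)
    context = "\n".join(f"- {p}" for p in passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


index = VectorIndex()
for doc in [
    "To install the product, download the installer and run setup.",
    "API keys can be rotated from the security settings page.",
    "The product requires port 443 for outbound traffic.",
]:
    index.add(doc)

prompt = build_prompt(index, "How do I rotate my API keys?", k=1)
print(prompt)  # the passage about API keys appears under "Documents:"
# A real solution would now send `prompt` to a foundation model.
```

A production embedding model captures meaning rather than exact word overlap, so it would also match "rotate" with "rotated"; the toy version only illustrates where each component sits in the flow.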
The completed solution might also include the following components:
- An app with a user interface where users can ask questions, receive answers, and provide feedback on the answers.
- A reranking model that reorders the retrieved results based on how well they answer the question, instead of how similar the results are to the question.
- A log of the questions, answers, and user feedback.
- A method of delivering the feedback to the documentation writers, who can update the documentation so that the solution generates better answers.
- A method to identify the best experts to respond to unsatisfactory answers.
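The difference between first-stage retrieval and reranking can be illustrated with a second scoring pass over the retrieved candidates. This is a minimal sketch under stated assumptions: real reranking models typically score each (question, passage) pair jointly with a trained model, whereas here a hypothetical `coverage_score` heuristic stands in for that model by measuring how many of the question's content words each passage contains.

```python
from typing import Callable

STOPWORDS = {"how", "do", "i", "the", "a", "an", "to", "my", "what", "is"}


def content_words(text: str) -> set[str]:
    """Lowercased words with punctuation stripped and stopwords removed."""
    return {w.strip("?.,!").lower() for w in text.split()} - STOPWORDS


def coverage_score(question: str, passage: str) -> float:
    """Hypothetical stand-in for a reranking model: the fraction of the
    question's content words that the passage contains."""
    q = content_words(question)
    return len(q & content_words(passage)) / len(q) if q else 0.0


def rerank(question: str, candidates: list[str],
           score: Callable[[str, str], float] = coverage_score) -> list[str]:
    """Reorder first-stage retrieval results by a (question, passage) score."""
    return sorted(candidates, key=lambda p: score(question, p), reverse=True)


question = "How do I reset my password?"
# Order as returned by a similarity search (assumed for this example).
retrieved = [
    "Passwords must contain at least 12 characters.",
    "To reset your password, open Settings and click Reset password.",
]
print(rerank(question, retrieved)[0])
# To reset your password, open Settings and click Reset password.
```

Passing the scorer as a parameter keeps the reranking step pluggable, so the heuristic can be swapped for a call to a real reranking model without changing the surrounding code.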
For example, the following graphic illustrates how you can optimize your RAG solution by adapting your content to improve AI answers based on user feedback.
Implementing the solution
To implement a RAG pattern solution, follow these main steps:
- Set up a vector database.
- Create a vector index.
- Specify the vector index in the prompt.
- Evaluate the prompt.
- Add optional components.
- Deploy the pattern as an AI service.
- Call the AI service endpoint in your application.
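Evaluation usually combines several metrics. As one simple example, assuming you have a small set of test questions, each labeled with the documentation topic that should answer it, you can measure how often retrieval returns that topic in the top k results (hit rate at k). The `fake_retrieve` function is a hypothetical placeholder for your own vector index search, and this sketch does not reproduce any watsonx evaluation tooling.

```python
from typing import Callable


def hit_rate_at_k(
    test_set: list[tuple[str, str]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Fraction of test questions whose expected document ID appears
    in the top-k IDs returned by `retrieve`."""
    if not test_set:
        return 0.0
    hits = sum(1 for question, expected_id in test_set
               if expected_id in retrieve(question, k))
    return hits / len(test_set)


# Hypothetical retriever for illustration: returns fixed document IDs.
def fake_retrieve(question: str, k: int) -> list[str]:
    canned = {
        "How do I install the CLI?": ["install", "faq", "security"],
        "How do I rotate API keys?": ["faq", "billing", "install"],
    }
    return canned.get(question, [])[:k]


tests = [
    ("How do I install the CLI?", "install"),
    ("How do I rotate API keys?", "security"),
]
print(hit_rate_at_k(tests, fake_retrieve, k=3))  # 0.5
```

A low hit rate points to a retrieval problem (chunking, embedding model, or index settings) rather than a prompt wording problem, which helps decide what to change first.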
Tools
You can create a quick proof of concept in the Prompt Lab. You can upload a document file and create a vector index for it in the in-memory vector store. See Chatting with documents. When you set up a vector database, you can create a vector index that you can reference in a prompt. See Adding vectorized documents for grounding foundation model prompts.
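Before a document can be added to a vector index, it is typically split into smaller passages that are embedded separately. The sketch below shows one common approach, fixed-size word windows with overlap, as an assumption for illustration; it is not necessarily how the Prompt Lab splits uploaded files, and `chunk_size` and `overlap` are hypothetical parameter names.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into passages of up to `chunk_size` words. Consecutive
    passages share `overlap` words, so text near a boundary appears in
    both neighbouring passages."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Smaller chunks make retrieval more precise but give the foundation model less context per passage; the right size depends on your documents and is worth checking during evaluation.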
You can jump-start your RAG solution with the Q&A with RAG accelerator. The accelerator is a sample project that implements a RAG pattern with a set of Python notebooks that you can customize for your solution. See Q&A with RAG accelerator.
To automate the search for the best RAG pattern, run the AutoAI tool to build a RAG solution. AutoAI automates the end-to-end flow from experimentation to deployment. See Automating a RAG pattern with AutoAI.
You can skip the user interface and write code with REST APIs, Python libraries, or Node.js SDKs. See the watsonx Developer Hub.
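As a rough illustration of calling a deployed RAG service over REST from application code, the following sketch builds (but does not send) an HTTP POST request with Python's standard library. The endpoint URL, the bearer-token header, and the `{"question": ...}` payload shape are all assumptions; use the URL, authentication, and request schema that your own deployment documents.

```python
import json
import urllib.request


def build_ai_service_request(endpoint_url: str, token: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to a deployed RAG service.
    The payload shape {"question": ...} is an assumption; use the schema
    that your own deployed service expects."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_ai_service_request(
    "https://example.com/my-rag-service",  # hypothetical endpoint URL
    token="YOUR_TOKEN",
    question="How do I rotate API keys?",
)
print(req.get_method(), req.full_url)  # POST https://example.com/my-rag-service
# To send it: urllib.request.urlopen(req) returns the service's HTTP response.
```

Separating request construction from sending keeps the request easy to inspect and test before it touches a live endpoint.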
Learn more
- Retrieval-augmented generation
- Quick start: Prompt a foundation model with the retrieval-augmented generation pattern