Try the Q&A with RAG Accelerator to set up retrieval-augmented generation (RAG) and generate factually accurate output that is grounded in information from provided documents.
Log in to the Resource hub and then create the Q&A with RAG Accelerator sample project.
The Q&A with RAG Accelerator provides an advanced RAG pattern and implementation that includes the following processes:
- Document processing: The conversion, processing, and indexing of documents to generate a vector index.
- Answer generation: Question answering by retrieval augmented generation based on vector search results.
- Input/output logging: The logging of questions, retrieved chunks and metadata, and answers in a secondary log index.
- User feedback collection: User feedback is appended to the matching input/output log entries.
- Content analysis: A report of the specific content that needs to be enhanced to improve answers that received negative user feedback.
- Human intervention: The identification of the best experts to respond to unsatisfactory answers by using a vector index with expert profiles.
The following graphic shows how the processes of the Q&A with RAG Accelerator are cyclical so that the document content is updated based on user feedback.
Document processing
A notebook for document processing and indexing automates document conversion, splitting, and indexing in a vector index in one of these vector databases:
- watsonx.data Milvus
- watsonx Discovery with Elasticsearch Enterprise or IBM Cloud Databases for Elasticsearch Platinum
The example document collection to vectorize is a version of this watsonx as a Service documentation set in a ZIP file. You can customize the notebook to use existing vector indexes that you created outside of watsonx with other tools, for example, Elastic connectors and pipelines, Spark pipelines, or your own processes.
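The following is a minimal sketch of the chunking and indexing step, assuming a Milvus instance, the pymilvus client, and a sentence-transformers embedding model. The collection name, chunking parameters, and endpoint are illustrative; the accelerator notebook also automates document conversion and is configured through parameter sets.

```python
# Sketch: split a document into overlapping chunks, embed them, and insert
# them into a Milvus collection. Assumes the collection "docs_index" already
# exists with a matching schema and vector dimension (hypothetical name).
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # example embedding model
client = MilvusClient(uri="http://localhost:19530")  # example Milvus endpoint

def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def index_document(doc_id: str, text: str, source_url: str) -> None:
    """Embed each chunk and insert it with its metadata into the vector index."""
    chunks = split_into_chunks(text)
    vectors = embedder.encode(chunks)
    rows = [
        {"doc_id": doc_id, "chunk": chunk, "source_url": source_url, "vector": vec.tolist()}
        for chunk, vec in zip(chunks, vectors)
    ]
    client.insert(collection_name="docs_index", data=rows)
```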
Answer generation
The notebook for the Q&A Python function defines the Python function code and automates its deployment with a well-defined URL suffix in a deployment space. The Q&A with RAG Python function code is configured by parameter sets. The function takes a question as input and queries the vector index to retrieve the most relevant chunks for answering the question, together with the chunk metadata, including links to source documents. The function adds the most relevant chunks to the configured prompt template and returns the generated answer, the calculated faithfulness score, and the retrieved chunks with metadata.
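The following sketch shows the retrieve-then-generate pattern described above, reusing the embedder and Milvus client from the indexing sketch. The prompt template is illustrative, `llm_generate` is a placeholder for the watsonx.ai text-generation call that the deployed function makes, and faithfulness scoring is omitted.

```python
# Sketch: embed the question, retrieve the most relevant chunks, fill the
# prompt template, and generate an answer. Names are illustrative.
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def answer_question(question: str, top_k: int = 5) -> dict:
    # Embed the question and retrieve the most relevant chunks with metadata.
    q_vec = embedder.encode([question])[0].tolist()
    hits = client.search(
        collection_name="docs_index",
        data=[q_vec],
        limit=top_k,
        output_fields=["chunk", "source_url"],
    )[0]
    chunks = [hit["entity"] for hit in hits]

    # Add the retrieved chunks to the prompt template and generate an answer.
    context = "\n\n".join(c["chunk"] for c in chunks)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    answer = llm_generate(prompt)  # placeholder for the model inference call

    return {"answer": answer, "retrieved_chunks": chunks}
```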
Input/output logging
For each call of the Q&A with RAG Python function, you can enable logging of the prompt input and output text. Any PII is stripped from the strings before logging. The log index is separate from the vector index for the documents in the vector database.
If input/output logging is active, you can enable type-ahead question completion suggestions when users type their questions. If the user accepts a completion based on a recently answered question, the answer is retrieved from the log index, which saves time and GPU inference and retrieval costs.
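The following is a sketch of writing one Q&A exchange to a separate log index, assuming an Elasticsearch 8.x client. The index name, endpoint, and the simple e-mail redaction are illustrative and do not represent the accelerator's actual PII handling.

```python
# Sketch: redact PII and write the question, answer, and retrieved chunks
# to a log index that is separate from the document vector index.
import re
import uuid
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # example endpoint

def redact_pii(text: str) -> str:
    """Simplified redaction: mask e-mail addresses before logging."""
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

def log_interaction(question: str, answer: str, chunks: list[dict]) -> str:
    """Write one Q&A record to the log index and return its record ID."""
    record_id = str(uuid.uuid4())
    es.index(
        index="qa_log",
        id=record_id,
        document={
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "question": redact_pii(question),
            "answer": redact_pii(answer),
            "retrieved_chunks": chunks,
        },
    )
    return record_id
```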
User feedback collection
User feedback helps stakeholders understand how well the solution works for their users, which document topics users are interested in, and how well the solution answers questions based on the content. You can configure the application to call the Q&A with RAG Python function again to collect any user feedback on the answer. The user feedback of a satisfaction score and an optional comment is then appended to the Q&A log record for subsequent analysis.
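A minimal sketch of appending feedback to the matching log record follows, assuming the same Elasticsearch log index and the record ID returned by the logging sketch above. The field names are illustrative.

```python
# Sketch: append a satisfaction score and optional comment to the Q&A log
# record so that later analysis can join feedback with questions and answers.
def record_feedback(record_id: str, satisfaction: int, comment: str | None = None) -> None:
    """Append a satisfaction score (for example, 1-5) and an optional comment."""
    es.update(
        index="qa_log",
        id=record_id,
        doc={"feedback": {"satisfaction": satisfaction, "comment": comment}},
    )
```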
Content analysis
You can configure the user feedback analytics notebook and run it directly or as a job. The notebook queries the log data for the specified time interval or start and end dates. The notebook loads the log data into a dataframe and uses unsupervised topic detection from BERTopic or Watson Natural Language Processing to determine which document topics were retrieved most frequently to generate answers. The notebook analyzes and visualizes user satisfaction by topic, and includes the questions, answers, and user feedback comments for low-rated answers. Based on these insights, stakeholders and knowledge content owners can drive content improvements that result in better answers.
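The following sketch illustrates the analysis step with pandas and BERTopic: detect topics over the logged questions and summarize satisfaction per topic. It assumes the log records were already queried and flattened into dictionaries with `question` and `satisfaction` fields; the accelerator notebook adds visualization and the low-rated answer details.

```python
# Sketch: unsupervised topic detection over logged questions, then average
# satisfaction per topic to highlight content that needs improvement.
import pandas as pd
from bertopic import BERTopic

def analyze_feedback(records: list[dict]) -> pd.DataFrame:
    # Expects flattened records with "question" and "satisfaction" fields.
    df = pd.DataFrame(records)

    # Unsupervised topic detection over the logged questions.
    topic_model = BERTopic()
    topics, _ = topic_model.fit_transform(df["question"].tolist())
    df["topic"] = topics

    # Low average satisfaction per topic points to content to enhance.
    return (
        df.groupby("topic")["satisfaction"]
        .agg(["mean", "count"])
        .sort_values("mean")
    )
```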
Human intervention
When a user is not satisfied with an answer, you can configure the application to retrieve an expert contact who can provide a better answer. You can configure the expert profiling notebook to process expert profile documents and build a vector index from that information. For example, the application can route the question to the expert, send the expert's answer to the user, and alert the knowledge base owner to a possible content enhancement.
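A minimal sketch of the expert lookup follows, reusing the embedder and Milvus client from the indexing sketch. The expert-profile collection name and its fields are illustrative stand-ins for the index that the expert profiling notebook builds.

```python
# Sketch: find the expert whose profile best matches an unsatisfactorily
# answered question by searching the expert-profile vector index.
def find_expert(question: str) -> dict:
    """Return the profile of the best-matching expert for the question."""
    q_vec = embedder.encode([question])[0].tolist()
    hits = client.search(
        collection_name="experts_index",   # hypothetical expert-profile collection
        data=[q_vec],
        limit=1,
        output_fields=["name", "email", "expertise"],
    )[0]
    return hits[0]["entity"]
```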
Parent topic: AI solution accelerators