When you build a retrieval-augmented generation solution in AutoAI, you can customize experiment settings to tailor your results.
If you run a RAG experiment based on default settings, the AutoAI process selects:
- The optimization metric for ranking the RAG pipelines
- An embeddings model for encoding input data
- The foundation models to try, based on the available list
To exercise more control over the RAG experiment, you can customize the experiment settings. After you enter the required experiment definition information, click Experiment settings to customize options before running the experiment. The settings that you can review or edit fall into three categories:
- Retrieval & generation: choose which metric to use for optimizing the RAG pattern, how to retrieve the data, and the models AutoAI can use for the experiment.
- Indexing: choose how the data is broken down, the metric used to measure data relevancy, and which embedding model AutoAI can use for the experiment.
- Additional information: review the watsonx.ai Runtime instance and the environment to use for the experiment.
Retrieval and generation settings
View or edit the settings that are used to generate the RAG pipelines.
Optimization metric
Choose a metric to use for optimizing and ranking the RAG pipelines.
- Answer faithfulness measures how closely the generated response aligns with the text retrieved from the vector store. A high score indicates that the response represents the retrieved text well, both semantically and syntactically, but does not guarantee that the response is correct.
- Answer correctness measures how correct the generated answer is based on the provided benchmark files. This includes the relevance of the retrieved context and the quality of the generated response.
- Context correctness measures how relevant the retrieved content is to the original question.
For more information about optimization metrics, see RAG metrics.
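As a simplified illustration of what these metrics compare, the following sketch scores a response against the retrieved context (the basis for faithfulness) and against a benchmark answer (the basis for correctness). The token-overlap scoring is a stand-in for illustration only; it is not the scoring that AutoAI uses.

```python
# Toy illustration only: the actual RAG metrics use more sophisticated scoring.
def token_overlap(candidate: str, reference: str) -> float:
    """Fraction of the candidate's tokens that also appear in the reference."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    return len(cand & ref) / len(cand) if cand else 0.0

retrieved_context = "The warranty covers parts and labor for two years."
benchmark_answer = "Parts and labor are covered for a period of two years."
response = "The warranty covers parts and labor for two years."

# Faithfulness compares the response with the retrieved context ...
print(token_overlap(response, retrieved_context))  # 1.0: fully grounded
# ... while correctness compares it with the benchmark answer.
print(token_overlap(response, benchmark_answer))   # lower: wording differs
```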
Retrieval methods
Choose the method for retrieving relevant data. Retrieval methods differ in the ways that they filter and rank documents.
- Window retrieval method divides the indexed documents into windows and surrounds each retrieved chunk with the chunks that precede and follow it in the original document. This method is useful for recovering context that might be missing from the retrieved chunk alone.
- Simple retrieval method retrieves all relevant passages from the indexed documents and ranks them by relevance to the question. The highest-ranked chunks are passed to the foundation model as context for generating the answer.
Window retrieval can be a more efficient choice for queries against a relatively small collection of documents. Simple retrieval can produce more accurate results for queries against a larger collection.
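The following sketch illustrates the window idea: after chunks are ranked, each retrieved chunk is expanded with its neighbors from the original document. The function and the window size are illustrative and do not reflect the AutoAI implementation:

```python
def expand_with_window(chunks: list[str], hit_index: int, window: int = 1) -> str:
    """Surround a retrieved chunk with up to `window` chunks on each side,
    in the order they appeared in the original document."""
    start = max(0, hit_index - window)
    end = min(len(chunks), hit_index + window + 1)
    return " ".join(chunks[start:end])

chunks = [
    "Section 1 introduces the warranty policy.",
    "The warranty lasts two years from purchase.",
    "Claims are processed within 30 days.",
]
# Simple retrieval passes only the ranked chunk (index 1) to the model;
# window retrieval also includes its neighbors for added context.
print(expand_with_window(chunks, hit_index=1, window=1))
```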
Foundation models to include
By default, all available foundation models that support AutoAI for RAG are selected for training. You can manually edit the list of foundation models that AutoAI can consider for generating the RAG pipelines. For each model, you can click Model details to view or export information about the model, including a description of its intended use. When you run the experiment, the models are sampled and ranked in the pre-processing stage, and the top three models are used to develop the patterns.
For the list of available foundation models along with descriptions, see Foundation models by task.
Max RAG patterns to complete
You can specify the number of RAG patterns to complete, up to a maximum of 20. A higher number provides more patterns to compare, but consumes more compute resources.
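If you define the experiment in code with the ibm-watsonx-ai Python SDK rather than in the UI, settings such as the optimization metric, the candidate foundation models, and the maximum number of patterns are passed when you create the optimizer. The parameter names and model ID below are assumptions; verify them against the current SDK documentation:

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.experiment import AutoAI

# Placeholder credentials and project ID.
experiment = AutoAI(
    Credentials(api_key="***", url="https://us-south.ml.cloud.ibm.com"),
    project_id="YOUR_PROJECT_ID",
)

# Parameter names are assumptions; check the SDK docs for the exact signature.
rag_optimizer = experiment.rag_optimizer(
    name="My AutoAI RAG experiment",
    optimization_metrics=["answer_correctness"],    # metric for ranking pipelines
    foundation_models=["ibm/granite-13b-chat-v2"],  # example model ID
    max_number_of_rag_patterns=5,                   # up to a maximum of 20
)
```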
Indexing settings
View or edit the settings for creating the text vector database from the document collection.
Chunking
Chunking settings determine how indexed documents are broken down into smaller pieces for processing by a foundation model. Chunking allows a foundation model to process multiple pieces of data in parallel, which improves efficiency. Overlapping the chunks ensures that context is not lost at chunk boundaries.
AutoAI RAG uses the recursive chunking method to break down the documents. The recursive method splits text by using an ordered list of separators, such as paragraph breaks and then sentence breaks, and recursively splits any piece that is still larger than the specified chunk size. For more information, see Recursively split by character in the LangChain documentation.
How you chunk data depends on your use case. Smaller chunks provide a more granular interaction with the text, which is useful for identifying keywords, for example, whereas larger chunks provide more context rather than focusing on specific words or phrases. For your chunking use case, specify:
- The number of characters to include in each chunk of data.
- The number of characters to overlap between consecutive chunks. The overlap must be smaller than the chunk size.
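Because AutoAI RAG uses the recursive strategy that LangChain implements, you can preview how your chunk settings break down a document with LangChain's RecursiveCharacterTextSplitter. The chunk size, overlap, and file name below are example values:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,    # characters per chunk
    chunk_overlap=64,  # must be smaller than chunk_size
)

with open("policy_document.txt", encoding="utf-8") as f:  # example file
    text = f.read()

chunks = splitter.split_text(text)
print(f"{len(chunks)} chunks; first chunk starts with: {chunks[0][:80]!r}")
```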
Embedding models
Embedding models are used in retrieval-augmented generation solutions for encoding text data as vectors to capture the semantic meaning of natural language strings. The vectorized input data can be used to retrieve similar data from the indexed document collection to generate output text. Edit the list of embedding models that AutoAI can consider when the experiment is running.
For a list of embedding models available for use with AutoAI RAG experiments, see Supported encoder models available with watsonx.ai.
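To see the encoding step in isolation, the following sketch uses an open source sentence-embedding model as a stand-in; in an AutoAI RAG experiment, the embedding model comes from the supported watsonx.ai list instead:

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in model for illustration; AutoAI selects from the watsonx.ai
# embedding models instead.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "The warranty covers parts and labor for two years.",
    "Claims are processed within 30 days.",
]
chunk_vectors = model.encode(chunks)

query_vector = model.encode("How long does the warranty last?")
scores = util.cos_sim(query_vector, chunk_vectors)
print(scores)  # the highest-scoring chunk is retrieved as context
```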
Additional information
Review the watsonx.ai Runtime instance used for this experiment and the environment definition.
Learn more
Retrieval-Augmented Generation (RAG)
Parent topic: Creating a RAG experiment