When you build a retrieval-augmented generation solution in AutoAI, you can customize experiment settings to tailor your results.
If you run a RAG experiment based on default settings, the AutoAI process selects:
- The optimization metric for ranking the RAG pipelines
- An embeddings model for encoding input data
- The foundation models to try, based on the available list
To exercise more control over the RAG experiment, you can customize the experiment settings. After entering the required experiment definition information, click Experiment settings to customize options before running the experiment. Settings you can review or edit fall into three categories:
- Retrieval & generation: choose which metric to use for optimizing the RAG pattern, how to retrieve the data, and the models AutoAI can use for the experiment.
- Indexing: choose how the data is broken down, the metric used to measure data relevancy, and which embedding model AutoAI can use for the experiment.
- Additional information: review the watsonx.ai Runtime instance and the environment to use for the experiment.
Retrieval and generation settings
View or edit the settings that are used to generate the RAG pipelines.
Optimization metric
Choose a metric to use for optimizing and ranking the RAG pipelines.
- Answer faithfulness measures how accurately the generated response reflects the retrieved text, including how closely it aligns semantically and syntactically.
- Answer correctness measures the correctness of the generated answer, including both the relevance of the retrieved context and the quality of the generated response.
- Context correctness measures the relevancy of the retrieved content to the original question.
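As a rough illustration of what ranking by an optimization metric means, the following Python sketch sorts a few patterns by whichever metric is selected. The pattern names, metric keys, and scores are all made up for this example, and `rank_patterns` is a hypothetical helper, not part of AutoAI.

```python
# Hypothetical scores for three RAG patterns; the numbers are invented
# for illustration and do not come from a real experiment.
scores = {
    "Pattern 1": {"answer_faithfulness": 0.81, "answer_correctness": 0.64, "context_correctness": 0.90},
    "Pattern 2": {"answer_faithfulness": 0.77, "answer_correctness": 0.71, "context_correctness": 0.85},
    "Pattern 3": {"answer_faithfulness": 0.69, "answer_correctness": 0.58, "context_correctness": 0.95},
}


def rank_patterns(scores, metric):
    """Return pattern names ordered from best to worst on the chosen metric."""
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)


# The same patterns rank differently depending on the metric that is chosen.
print(rank_patterns(scores, "answer_faithfulness"))
print(rank_patterns(scores, "answer_correctness"))
```

The point of the sketch is only that the choice of metric can change which pattern comes out on top.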
Retrieval methods
Choose the method for retrieving relevant data. Retrieval methods differ in the ways that they filter and rank documents.
- Window retrieval method divides the indexed documents into windows, or chunks, and adds content before and after the retrieved chunk, based on what was in the original document.
- Simple retrieval method retrieves all relevant passages from the indexed documents and ranks them by their relevance to the question. The highest-ranked document is presented as the answer.
Window retrieval can be a more efficient choice for queries against a relatively small collection of documents. Simple retrieval can produce more accurate results for queries against a larger collection.
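The difference between the two methods can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how AutoAI implements retrieval: relevance is approximated by counting shared words, and the function names, the `top_k` and `window` parameters, and the sample chunks are hypothetical.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def score(question, chunk):
    """Toy relevance score: the number of words shared by question and chunk."""
    return len(tokenize(question) & tokenize(chunk))


def simple_retrieval(question, chunks, top_k=2):
    """Rank all chunks by relevance and return the top_k best matches."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(question, chunks[i]),
                    reverse=True)
    return [chunks[i] for i in ranked[:top_k]]


def window_retrieval(question, chunks, window=1):
    """Find the best chunk, then add `window` neighboring chunks on each side."""
    best = max(range(len(chunks)), key=lambda i: score(question, chunks[i]))
    start = max(0, best - window)
    end = min(len(chunks), best + window + 1)
    return " ".join(chunks[start:end])


# Chunks are kept in their original document order.
chunks = [
    "AutoAI builds RAG patterns from your documents.",
    "Each document is split into chunks before indexing.",
    "The chunk size controls how many characters each chunk holds.",
    "Embedding models encode each chunk as a vector.",
]
question = "what controls the chunk size"

print(simple_retrieval(question, chunks))
print(window_retrieval(question, chunks))
```

In this sketch, simple retrieval returns the ranked matches on their own, while window retrieval returns the best match together with the text that surrounded it in the original document.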
Foundation models to include
Edit the list of foundation models that AutoAI can consider for generating the RAG pipelines. For each model, you can click Model details to view or export details about the model, including a description of the intended use.
For the list of available foundation models along with descriptions, see Foundation models by task.
Max RAG patterns to complete
You can specify the number of RAG patterns to complete, up to a maximum of 20. A higher number provides more patterns to compare, but consumes more compute resources.
Indexing settings
View or edit the settings for creating the text vector database from the document collection.
Chunking
Chunking settings determine how indexed documents are broken down into smaller pieces for processing by a foundation model. Chunking data allows a foundation model to process multiple pieces of data in parallel, improving efficiency. Overlapping chunks ensures that context is not lost between chunks.
How you chunk data depends on your use case. Smaller chunks provide a more granular interaction with text, which is useful for identifying keywords, for example, whereas larger chunks provide more context rather than focusing on specific words or phrases. For your chunking use case, specify:
- The number of characters to include in each chunk of data.
- The number of characters to overlap for chunking data. The number must be smaller than the chunk size.
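To make the two settings concrete, here is a minimal Python sketch of character-based chunking with overlap. The function `chunk_text` and its parameter names are hypothetical and only illustrate the idea; AutoAI's own chunking logic may differ.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into chunks of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        # Mirrors the rule that the overlap must be smaller than the chunk size.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is how the overlap carries context across chunk boundaries. If the overlap equaled the chunk size, the step would be zero and the split would never advance, which is one way to see why the overlap has to be smaller.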
Embedding models
Embedding models are used in retrieval-augmented generation solutions for encoding text data as vectors to capture the semantic meaning of natural language strings. The vectorized input data can be used to retrieve similar data from the indexed document collection to generate output text. Edit the list of embedding models that AutoAI can consider when the experiment is running.
For a list of embedding models available for use with AutoAI RAG experiments, see Supported embedding models available with watsonx.ai.
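The following Python sketch shows, in simplified form, how vectorized text can be used to find similar content. The three-dimensional vectors are hand-written stand-ins for real embeddings, cosine similarity is assumed as the relevance measure, and `most_similar` is a hypothetical helper; an actual experiment uses the selected embedding model and a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written toy vectors standing in for embeddings of indexed chunks.
index = {
    "chunk about chunking": [0.9, 0.1, 0.0],
    "chunk about embedding models": [0.1, 0.9, 0.2],
    "chunk about runtime environments": [0.0, 0.2, 0.9],
}


def most_similar(query_vector, index):
    """Return the key of the indexed chunk whose vector is closest to the query."""
    return max(index, key=lambda key: cosine_similarity(query_vector, index[key]))


# Toy vector standing in for the embedding of a question about embedding models.
query_vector = [0.2, 0.8, 0.1]
print(most_similar(query_vector, index))
```

Because the question and the chunks are encoded in the same vector space, the closest vector points to the most semantically similar chunk, even when the wording differs.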
Additional information
Review the watsonx.ai Runtime instance used for this experiment and the environment definition.
Learn more
Retrieval-Augmented Generation (RAG)