When you build a retrieval-augmented generation solution in AutoAI, you can customize experiment settings to tailor your results.
If you run a RAG experiment based on default settings, the AutoAI process selects:
The optimization metric for ranking the RAG pipelines
An embeddings model for encoding input data
The foundation models to try, based on the available list
To exercise more control over the RAG experiment, you can customize the experiment settings. After entering the required experiment definition information, click Experiment settings to customize options before running the experiment.
Settings you can review or edit fall into three categories:
Retrieval & generation: choose which metric to use for optimizing the RAG pattern, how to retrieve the data, and the models AutoAI can use for the experiment.
Indexing: choose how the data is broken down, the metric used to measure data relevancy, and which embedding model AutoAI can use for the experiment.
Additional information: review the watsonx.ai Runtime instance and the environment to use for the experiment.
Retrieval and generation settings
View or edit the settings that are used to generate the RAG pipelines.
Optimization metric
Choose a metric to use for optimizing and ranking the RAG pipelines.
Answer faithfulness measures how closely the generated response aligns with the text retrieved from the vector store. A high score indicates that the response represents the retrieved text well semantically and syntactically, but it does not indicate that the response is correct.
Answer correctness measures how correct the generated answer is based on the provided benchmark files. This includes the relevance of the retrieved context and the quality of the generated response.
Context correctness measures how relevant the retrieved content is to the original question.
For more information about optimization metrics, see RAG metrics.
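To make the distinction between the three metrics concrete, the following sketch approximates each one with simple token overlap. This is purely illustrative: the helper functions and scoring are assumptions for demonstration only, not the evaluators that AutoAI uses (see RAG metrics for those).

```python
# Toy approximations of the three optimization metrics, using token overlap.
# AutoAI uses its own evaluators; these functions only show what each metric compares.

def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def answer_faithfulness(answer: str, retrieved_context: str) -> float:
    """How much of the generated answer is grounded in the retrieved text."""
    answer_toks = _tokens(answer)
    return len(answer_toks & _tokens(retrieved_context)) / max(len(answer_toks), 1)

def answer_correctness(answer: str, reference_answer: str) -> float:
    """How well the generated answer matches the benchmark (ground-truth) answer."""
    answer_toks = _tokens(answer)
    return len(answer_toks & _tokens(reference_answer)) / max(len(answer_toks), 1)

def context_correctness(retrieved_context: str, question: str) -> float:
    """How relevant the retrieved content is to the original question."""
    question_toks = _tokens(question)
    return len(question_toks & _tokens(retrieved_context)) / max(len(question_toks), 1)

print(answer_faithfulness("Paris is the capital of France",
                          "France's capital city is Paris"))
```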
Retrieval methods
Choose the method for retrieving relevant data. Retrieval methods differ in the ways that they filter and rank documents.
Window retrieval method divides the indexed documents into windows and surrounds each retrieved chunk with additional chunks before and after it, based on what was in the original document. This method is useful for getting context that might be missing from the originally retrieved chunk.
Simple retrieval method retrieves all relevant passages from the indexed documents and ranks them according to relevancy against the question. The highest-ranked passages are passed to the foundation model to generate the answer.
Simple retrieval can be a more efficient choice for queries against a relatively small collection of documents. Window retrieval can produce more accurate results for queries against a larger collection.
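The sketch below contrasts the two strategies over an in-memory list of chunks. The function names and the relevance scoring are illustrative assumptions, not the AutoAI implementation; they only show how window retrieval adds neighboring chunks that simple retrieval leaves out.

```python
# Conceptual comparison of simple retrieval and window retrieval.

def score(chunk: str, question: str) -> float:
    """Toy relevance score: fraction of question words found in the chunk."""
    q = set(question.lower().split())
    return len(q & set(chunk.lower().split())) / max(len(q), 1)

def simple_retrieval(chunks: list[str], question: str, top_k: int = 3) -> list[str]:
    """Rank chunks by relevance to the question and return the best ones."""
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:top_k]

def window_retrieval(chunks: list[str], question: str, window: int = 1) -> str:
    """Find the best chunk, then surround it with its neighbors from the
    original document order to restore context lost at chunk boundaries."""
    best = max(range(len(chunks)), key=lambda i: score(chunks[i], question))
    lo, hi = max(0, best - window), min(len(chunks), best + window + 1)
    return " ".join(chunks[lo:hi])
```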
Foundation models to include
You can manually edit the list of foundation models that AutoAI can consider for generating the RAG pipelines. For each model, you can click Model details to view or export details about the model, including a description
of the intended use.
You can specify the number of RAG patterns to complete, up to a maximum of 20. A higher number provides more patterns to compare, but consumes more compute resources.
Indexing settings
View or edit the settings for creating the text vector database from the document collection.
Chunking
Chunking settings determine how indexed documents are broken down into smaller pieces for processing by a foundation model. Chunking data allows a foundation model to process multiple pieces of data in parallel, improving efficiency. Overlapping chunks ensure that context is not lost between chunks.
AutoAI RAG uses the recursive chunking method to break down the documents. In the recursive method, the text is split by applying an ordered list of separators (such as paragraph breaks, then sentence breaks, then individual words) recursively until each chunk fits within the specified chunk size. For more information about the recursive chunking method, see Recursively split by character in the LangChain documentation.
How you chunk data depends on your use case. Smaller chunks provide more granular interaction with the text, which is useful for identifying keywords, for example, whereas larger chunks provide more context rather than focusing on specific words or phrases. For your chunking use case, specify:
The number of characters to include in each chunk of data.
The number of characters to overlap between chunks. The number must be smaller than the chunk size.
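As a rough sketch of how these two settings interact, the snippet below applies LangChain's recursive character splitter with an assumed chunk size of 512 characters and an overlap of 64 characters; the file name and the values are examples, and the splitter configuration that AutoAI applies internally may differ.

```python
# Requires: pip install langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example values: 512-character chunks with a 64-character overlap
# (the overlap must be smaller than the chunk size).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # number of characters to include in each chunk
    chunk_overlap=64,   # number of characters shared between adjacent chunks
)

# "my_document.txt" is a hypothetical file used only for illustration.
with open("my_document.txt", encoding="utf-8") as f:
    chunks = splitter.split_text(f.read())

print(f"Produced {len(chunks)} chunks")
```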
Embedding models
Embedding models are used in retrieval-augmented generation solutions for encoding text data as vectors to capture the semantic meaning of natural language strings. The vectorized input data can be used to retrieve similar data from the indexed document collection to generate output text. Edit the list of embedding models that AutoAI can consider when the experiment is running.
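The following sketch shows the role an embedding model plays in retrieval, using the open sentence-transformers library and an arbitrary public model as a stand-in. The model name and documents are assumptions for illustration; the embedding models available to AutoAI come from its own list.

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Stand-in embedding model for illustration; AutoAI selects from its own model list.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "AutoAI ranks RAG pipelines by an optimization metric.",
    "Chunk overlap preserves context across chunk boundaries.",
    "Embedding models encode text as vectors that capture semantic meaning.",
]
doc_vectors = model.encode(documents)             # vectorize the indexed documents

query_vector = model.encode("What do embedding models do?")
scores = util.cos_sim(query_vector, doc_vectors)  # semantic similarity to each document
best = scores.argmax().item()
print(documents[best])
```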