Automating a RAG pattern with the AutoAI SDK (Beta)

Use the AutoAI Python SDK to automate and accelerate the design and deployment of an optimized, production-quality retrieval-augmented generation (RAG) pattern based on your data and use case.

Important: This feature is a beta release. While this feature is in beta, there is no charge for running the experiment, and no tokens are consumed. However, calls to RAG patterns and their derivatives done after the experiment completes consume resources at the standard rate.

Providing accurate answers with Retrieval-augmented generation

Retrieval-augmented generation (RAG) combines the generative power of a large language model with the accuracy of a collection of grounding documents. Interaction with a RAG application follows this pattern:

  1. A user submits a question to the app.
  2. The search first retrieves relevant context from a set of grounding documents.
  3. The accompanying large language model generates an answer that includes the relevant information.

For example, the sample notebooks that are provided for this feature use the documentation for the watsonx.ai Python client library as the grounding documents for a Q&A app about coding watsonx.ai solutions. Users get the benefit of specific, relevant information from the documentation, and the generative AI model adds context and presents the answer in natural language.

For a complete description and examples, see Retrieval-augmented generation (RAG).
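
The retrieve-then-generate flow can be illustrated with a minimal, self-contained sketch. The helper functions below are hypothetical stand-ins (keyword overlap in place of a vector search, and a formatted string in place of a foundation-model call); they are not part of the watsonx.ai client library or the sample notebooks:

    def retrieve_context(question, documents, top_k=2):
        """Step 2: retrieve the most relevant grounding passages.

        Naive keyword overlap stands in for a real vector-store search.
        """
        terms = set(question.lower().split())
        ranked = sorted(
            documents,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


    def generate_answer(question, context):
        """Step 3: generate an answer from the retrieved context.

        A real RAG pattern would prompt a large language model here.
        """
        joined = " ".join(context)
        return f"Based on the documentation: {joined}"


    # Step 1: the user submits a question to the app.
    docs = [
        "The Python client authenticates with an API key and a service URL.",
        "Deployments are promoted to a deployment space before they are served.",
    ]
    question = "How does the Python client authenticate?"
    print(generate_answer(question, retrieve_context(question, docs)))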

Automating the search for the best RAG configuration

RAG comes with many configuration parameters, including which large language model to use, how to chunk the grounding documents, and how many documents to retrieve. Configuration choices that work well for another use case might not be the best choice for your data. To create the best possible RAG pattern for your dataset, you would need to explore all the possible combinations of RAG configuration options to find, evaluate, and deploy the best solution. This part of the process can require a significant investment of time and resources. Just as you can use AutoAI to rapidly train and optimize machine learning models, you can use AutoAI capabilities to automate the search for the optimal RAG solution based on your data and use case, dramatically reducing the time to production.

Key features of the AutoAI approach include:

  • Full exploration and evaluation of a constrained set of configuration options.
  • Rapid reevaluation and modification of the configuration when something changes. For example, you can easily rerun the training process when a new model is available or when evaluation results signal a change in the quality of responses.

Using AutoAI automates the end-to-end flow from experimentation to deployment. The following diagram illustrates the AutoAI approach to finding an optimized RAG pattern for your data and use case in three layers:

  • At the base level are parameterized RAG pipelines used to populate a vector store (index) and to retrieve data from the vector store to use when generating responses.
  • Next, RAG evaluation metrics and benchmarking tools evaluate response quality.
  • Finally, a hyper-parameter optimization algorithm searches for the best possible RAG configuration for your data.

Figure: Automating a RAG pattern with the AutoAI SDK

Exploring the sample notebooks

Use the sample notebooks to learn how to use the watsonx.ai Python client library (version 1.1.11 or later) to code an automated RAG solution for your use case.

  • Automating RAG pattern with Chroma database: Shows the fast path approach to creating a RAG pattern.
    - Uses the watsonx.ai Python SDK documentation files as the grounding documents for a RAG pattern.
    - Stores the vectorized content in the default, in-memory Chroma database.
  • Automating RAG pattern with Milvus database:
    - Uses the watsonx.ai Python SDK documentation files as the grounding documents for a RAG pattern.
    - Stores the vectorized content in an external Milvus database.

Notes on storing vectorized content:

  • The Chroma notebook provides a fast path approach to automating a RAG solution. If you don't specify a connection to a vector store, the vectorized content is saved to the default, in-memory Chroma database. The content does not persist beyond the experiment, so the Chroma option is not a viable production method for deploying a RAG pattern.
  • For a more durable solution, set up a vector database with Milvus so that the vectorized content persists for future patterns. For details, see Working with Milvus. The sketch after this list illustrates the difference between in-memory and persistent storage.
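
The following standalone sketch uses the chromadb client directly to show that distinction; it is not the store that the AutoAI experiment manages for you, and the collection name, path, and documents are placeholders:

    import chromadb

    # In-memory client: collections live only for the lifetime of the process,
    # which is why the default Chroma option does not persist beyond the experiment.
    ephemeral = chromadb.Client()

    # Persistent client: writes the collection to disk so the content survives
    # restarts (the path is a placeholder).
    durable = chromadb.PersistentClient(path="./chroma_store")

    collection = durable.create_collection(name="grounding_docs")
    collection.add(
        ids=["doc-1", "doc-2"],
        documents=[
            "The Python client authenticates with an API key.",
            "Deployments are promoted to a deployment space.",
        ],
    )
    results = collection.query(query_texts=["How do I authenticate?"], n_results=1)
    print(results["documents"])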

If you are using Milvus, familiarize yourself with the naming conventions and database schema:

  • Collection names use this format: autoai_rag_a0b1c2d3_ymdHMS, where y=year, m=month, d=day, H=hour, M=minute, S=second.
  • A Milvus database uses this schema (a creation sketch follows the table):

    Field            Type
    document_id      VarChar
    start_index      Int64
    sequence_number  Int64
    text             VarChar
    pk               Int64
    vector           FloatVector
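
As an illustration only, a collection with this schema could be declared with the pymilvus client roughly as follows. The connection details, collection name, field lengths, and vector dimension (768, matching ibm/slate-125m-english-rtrvr) are assumptions; the AutoAI experiment creates its collections for you.

    from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

    # Connect to a Milvus instance (host and port are placeholders).
    connections.connect(alias="default", host="localhost", port="19530")

    # Fields mirroring the schema above. The max_length values and the embedding
    # dimension are assumptions for illustration.
    fields = [
        FieldSchema(name="document_id", dtype=DataType.VARCHAR, max_length=512),
        FieldSchema(name="start_index", dtype=DataType.INT64),
        FieldSchema(name="sequence_number", dtype=DataType.INT64),
        FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
        FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=768),
    ]
    schema = CollectionSchema(fields, description="Chunks indexed for a RAG pattern")
    collection = Collection(name="autoai_rag_example", schema=schema)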

Follow the steps in Coding an AutoAI RAG experiment for Chroma or Coding an AutoAI RAG experiment for Milvus to code an AutoAI RAG experiment for your use case.
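
At a high level, the experiment code in those notebooks follows a pattern similar to the sketch below. The class and parameter names shown here (rag_optimizer, max_number_of_rag_patterns, optimization_metrics, and the data-connection setup) are assumptions based on the beta sample notebooks and may differ in your SDK version, so treat this as an orientation sketch rather than a reference:

    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.experiment import AutoAI
    from ibm_watsonx_ai.helpers import DataConnection, S3Location

    # Placeholders: supply your own API key, region URL, project ID, and bucket.
    credentials = Credentials(api_key="***", url="https://us-south.ml.cloud.ibm.com")
    experiment = AutoAI(credentials, project_id="<project-id>")

    # Grounding documents and benchmark file in IBM Cloud Object Storage.
    grounding_docs = [DataConnection(connection_asset_id="<connection-id>",
                                     location=S3Location(bucket="<bucket>", path="docs/"))]
    benchmark = [DataConnection(connection_asset_id="<connection-id>",
                                location=S3Location(bucket="<bucket>", path="benchmark.json"))]

    # Configure and run the optimizer; parameter names are assumptions based on
    # the beta sample notebooks.
    rag_optimizer = experiment.rag_optimizer(
        name="AutoAI RAG sample",
        max_number_of_rag_patterns=5,
        optimization_metrics=["answer_correctness"],
    )
    rag_optimizer.run(
        input_data_references=grounding_docs,
        test_data_references=benchmark,
        background_mode=False,
    )

    print(rag_optimizer.summary())  # ranked RAG patterns with their metric scores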

AutoAI RAG optimization process

Running experiments by using AutoAI RAG avoids testing every RAG configuration option (for example, it avoids a grid search) by using a hyper-parameter optimization algorithm called Tree Parzen Estimator (TPE). The following diagram shows a subset of the RAG configuration search space with 16 RAG patterns to choose from. If the experiment evaluated them all, they would be ranked 1 to 16, with the three highest-ranking configurations tagged as best performing. The TPE algorithm determines which subset of the RAG patterns to evaluate and stops processing the others, which are shown in gray. This process avoids exploring an exponential search space while still selecting better-performing RAG patterns in practice.

Figure: Automating the optimization process for RAG patterns
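
To see TPE in action outside of AutoAI, the hyperopt library can be applied to a toy version of the RAG configuration search. The search space and scoring function below are illustrative only; a real objective would build each candidate pattern and measure answer correctness against the benchmark questions.

    import random
    from hyperopt import fmin, tpe, hp, Trials

    # Toy stand-in for evaluating one RAG configuration. A real objective would
    # chunk and index the documents, answer the benchmark questions, and score them.
    def evaluate_rag_pattern(config):
        return random.random()

    # Illustrative search space loosely mirroring the RAG configuration choices.
    space = {
        "chunk_size": hp.choice("chunk_size", [256, 512, 1024]),
        "embedding_model": hp.choice(
            "embedding_model",
            ["ibm/slate-125m-english-rtrvr", "intfloat/multilingual-e5-large"],
        ),
        "retrieved_chunks": hp.choice("retrieved_chunks", [3, 5, 10]),
    }

    # hyperopt minimizes, so return 1 - score to maximize the quality metric.
    def objective(config):
        return 1.0 - evaluate_rag_pattern(config)

    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=8, trials=trials)
    print("Best configuration (as indices into the hp.choice lists):", best)

Because TPE proposes each new configuration based on the scores observed so far, it typically reaches a strong pattern after evaluating only a fraction of the full grid.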

Supported features

Review these details for features provided with the beta release of the AutoAI RAG process.

  • Supported interface: API
  • File formats for grounding document collection: PDF, HTML, DOCX, plain text
  • Data connections for document collection: IBM Cloud Object Storage (a bucket, a folder in the bucket, or up to 20 individual files)
  • Test data format: JSON with a fixed schema (fields: question, answer, document ID). See the example after this list.
  • Data connections for test data: IBM Cloud Object Storage (single JSON file)
  • Chunking: Multiple presets of 64-1024 characters. See Supported embedding models available with watsonx.ai for the default maximum number of tokens for each model.
  • Embedding models: ibm/slate-125m-english-rtrvr, intfloat/multilingual-e5-large
  • Vector stores: Milvus and ChromaDB
  • Chunk augmentation: Enabled (adds surrounding chunks from the document)
  • Search type: Standard (in a single index)
  • Generative models: See Foundation models by task
  • Sampling: Benchmark-driven (the benchmark questions are selected first, then their documents, and the remainder is filled with randomly chosen documents up to the limit)
  • Search algorithm: Tree Parzen Estimator (TPE) from the hyperopt library, used for hyper-parameter optimization
  • Metrics: Answer correctness, faithfulness, and context correctness. For more information, see Unitxt lexical RAG metrics.
  • Optimization metric: The metric that is used as the optimization target. Answer correctness and faithfulness are supported.
  • Customizable user constraints: Chunking method, embedding model, generative model, and configuration count limit (maximum number of output patterns, 4 to 16)
  • Deployment: For Milvus, AutoAI notebooks for indexing and inference that use the external Milvus vector database. For Chroma, a single AutoAI notebook for indexing and inference that uses the in-memory Chroma vector database.
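
For orientation, a benchmark (test data) record might look like the following. The field names shown (question, correct_answer, correct_answer_document_ids) follow the beta sample notebooks and are an assumption here; confirm the exact schema against the notebook you are using.

    import json

    # Hypothetical benchmark file: each record pairs a question with its expected
    # answer and the identifier of the grounding document that contains it.
    benchmark = [
        {
            "question": "How does the Python client authenticate?",
            "correct_answer": "It authenticates with an API key and a service URL.",
            "correct_answer_document_ids": ["authentication.html"],
        }
    ]

    with open("benchmark.json", "w", encoding="utf-8") as f:
        json.dump(benchmark, f, indent=2)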

Parent topic: Coding generative AI solutions
