Saving a RAG pattern

Last updated: Feb 26, 2025

Save a retrieval-augmented generation (RAG) experiment pipeline as automatically generated notebooks that you can use for indexing, retrieval, and generation with the RAG pattern.

Saving a RAG pattern as auto-generated notebooks

After you run an experiment, you can review the generated patterns, which are ranked in the leaderboard according to performance against the optimized metric. When you are satisfied with a pattern, you can save it, which generates one or two notebooks that are saved as project assets. If you create a Milvus experiment, you also have the option of deploying immediately as an AI asset.

The notebooks that are generated for a saved RAG pattern depend on the vector store used for the experiment, as follows:

The index notebook populates, updates, and maintains the vector index for the document collection. All AutoAI RAG patterns can generate an indexing notebook.
The inference notebook provides an endpoint for inferencing against a large language model with the augmented retrieval capabilities. Only experiments that use a Milvus database as a vector store generate an inferencing notebook.

For a Milvus experiment, an AI service packages a pipeline for immediate deployment to a deployment space, where you can inference against the endpoint.

Generating the indexing and inferencing notebooks

After you review your pipelines, follow these steps to save a pipeline and generate the associated notebooks.

From the experiment leaderboard, click the name of a pipeline to view the details.
Click Save. The panel lists the notebook or notebooks that are auto-generated. For example, for a pattern created by using the in-memory Chroma database as a vector store, the save panel lists only the indexing notebook.
Click Create.
Open the notebooks from the associated project to review or run the code.

You can review the notebooks as generated. To run them, first add your authentication credentials.

Reviewing the index notebook

The index notebook contains Python code for building the vector database index for your document collection.

The notebook is annotated so that you can review the steps and code for:

Retrieving the data to vectorize
Chunking the data
Creating the embeddings
Reading the benchmark data
Using the benchmark data to evaluate the quality of the retrieval
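The steps in this list can be pictured with a minimal, self-contained sketch. It is not the code from the generated notebook: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for the vector store, and the record key `expected_document_ids` and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def embed(text):
    """Toy embedding: word counts (assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, chunk_size=40, overlap=10):
    """Return a list of (document_id, chunk, vector) entries."""
    index = []
    for doc_id, text in documents.items():
        for chunk in chunk_text(text, chunk_size, overlap):
            index.append((doc_id, chunk, embed(chunk)))
    return index


def retrieve(index, question, top_k=1):
    """Return the top_k index entries most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda e: cosine(query, e[2]), reverse=True)
    return ranked[:top_k]


def hit_rate(index, benchmark, top_k=1):
    """Fraction of benchmark questions whose expected document is retrieved."""
    hits = 0
    for item in benchmark:
        found = {doc_id for doc_id, _, _ in retrieve(index, item["question"], top_k)}
        if found & set(item["expected_document_ids"]):
            hits += 1
    return hits / len(benchmark) if benchmark else 0.0


# Made-up sample data for illustration only.
documents = {
    "milvus.txt": "Milvus is a vector database that stores embeddings for similarity search.",
    "chroma.txt": "Chroma is an in-memory vector store that is convenient for quick experiments.",
}
benchmark = [
    {"question": "Which store is in-memory?", "expected_document_ids": ["chroma.txt"]},
    {"question": "What does Milvus store?", "expected_document_ids": ["milvus.txt"]},
]

index = build_index(documents)
score = hit_rate(index, benchmark)
print(f"Indexed {len(index)} chunks, hit rate {score:.2f}")
```

The hit rate used here is only one simple way to score retrieval; the generated notebook might evaluate retrieval quality with different metrics.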

Reviewing the inference notebook

The inference notebook contains Python code to:

Retrieve relevant passages from the indexed documents for each user query
Generate a response to each user query by passing the retrieved passages to a large language model as context for the answer
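As a rough illustration of these two tasks, the following sketch ranks passages against a question and assembles a grounded prompt. The word-overlap ranking is a toy substitute for vector search, `echo_model` is a hypothetical placeholder for the foundation model call, and the prompt wording is an assumption rather than the prompt that the generated notebook uses.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_passages(passages, question, top_k=2):
    """Rank passages by word overlap with the question (toy stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Place the retrieved passages ahead of the question as numbered context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved, start=1))
    return (
        "Answer the question by using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, passages, generate, top_k=2):
    """Retrieve passages, build the prompt, and call the supplied model function."""
    retrieved = retrieve_passages(passages, question, top_k)
    return generate(build_prompt(question, retrieved)), retrieved


def echo_model(prompt):
    """Hypothetical placeholder for a foundation model call."""
    return f"(model output for a prompt of {len(prompt)} characters)"


passages = [
    "The index notebook builds the vector index for the document collection.",
    "The inference notebook retrieves passages and generates answers.",
    "Leaderboards rank patterns by the optimized metric.",
]
reply, used = answer("Which notebook builds the vector index?", passages, echo_model, top_k=1)
print(used[0])
print(reply)
```

Passing the model as a function argument keeps the retrieval and prompt logic testable without credentials; a real model client could be substituted for `echo_model`.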

The notebook is annotated so that you can review the steps and code for:

Building the inference Python function by using the RAG pattern that was identified in the experiment
Deploying the function as the inference endpoint
Testing the retrieval of relevant passages as input for the generated response

Run the inferencing notebook to use the RAG pattern for retrieving and generating answers to questions.

Saving a RAG pattern as a deployable AI service

If you used a Milvus vector store to run your experiment, you also have the option to save your RAG pattern as a deployable AI service. An AI service:

Is a deployable Python function that captures the logic for the RAG pattern.
Creates a project asset.
Optionally promotes a copy of the asset to a deployment space and creates the deployment, so that you can access the endpoint and inference against the pattern.
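One way to picture a Python function that captures the logic of a RAG pattern is a closure: an outer function fixes the pattern's settings and returns a handler for incoming requests. The sketch below is a generic illustration under that assumption. The names `make_rag_service`, `handle`, `fake_retrieve`, and `fake_generate`, and the request and response keys, are hypothetical, and the generated AI service might use a different signature and payload.

```python
def make_rag_service(retrieve, generate_text, top_k=3):
    """Capture the pattern's settings in a closure and return the request handler."""

    def handle(request):
        # Look up passages for the question, then ground the prompt in them.
        question = request["question"]
        passages = retrieve(question)[:top_k]
        prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}\nAnswer:"
        return {"answer": generate_text(prompt), "reference_passages": passages}

    return handle


# Hypothetical stand-ins for the retriever and the model.
def fake_retrieve(question):
    return ["Milvus experiments can be saved as an AI service.", "Chroma is in-memory."]


def fake_generate(prompt):
    return "stub answer"


service = make_rag_service(fake_retrieve, fake_generate, top_k=1)
response = service({"question": "Which experiments support AI services?"})
print(response)
```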

To create and deploy the AI service:

Choose Save as for the RAG pattern.
Choose Retrieval and generation as the objective.
Choose AI service as the asset type.
Select Promote and deploy AI service to deployment space.
Click Create and deploy.
Choose an existing deployment space or create a new one.

When the deployment process completes, click the deployment name to open the AI service for testing. From the deployment, you can:

Get the endpoint and code snippets from the API reference tab to use the RAG pattern in an application.
Switch to the Test tab to enter or upload new questions in JSON format to use with the RAG pattern. Use the same JSON format you used for the evaluation questions, but do not supply the answers.
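If the evaluation questions are already stored as JSON, a test file without answers can be derived from them. The following sketch assumes that each benchmark record is an object with a `question` key plus answer fields; the field names `correct_answer` and `correct_answer_document_ids` and the sample record are assumptions for illustration, so adjust them to match your own benchmark file.

```python
import json


def questions_only(benchmark_items, question_key="question"):
    """Keep only the question field from each benchmark record."""
    return [{question_key: item[question_key]} for item in benchmark_items]


# Made-up benchmark record; the field names are assumed, not confirmed.
benchmark_json = """
[
  {"question": "What is a RAG pattern?",
   "correct_answer": "A configuration of retrieval and generation settings.",
   "correct_answer_document_ids": ["overview.txt"]}
]
"""

test_payload = questions_only(json.loads(benchmark_json))
print(json.dumps(test_payload, indent=2))
```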

Learn more

Use the indexed documents from this experiment in the Prompt Lab to ground prompts for a foundation model. See Using an AutoAI RAG index to chat with documents.

Parent topic: Creating a RAG experiment
