When you save a retrieval-augmented generation (RAG) experiment pipeline, notebooks are automatically generated and saved to the project. In the case of a Milvus experiment, you can also save and deploy a RAG pattern as an AI service. Learn about the options for saving a RAG pattern and using the patterns in your applications.
Saving a RAG pattern as auto-generated notebooks
After you run an experiment, review the generated patterns, which are ranked in the leaderboard by their performance on the optimized metric. When you are satisfied with a pattern, save it to generate one or two notebooks as project assets. For a Milvus experiment, you can also deploy the pattern immediately as an AI service.
The notebooks that are generated for a saved RAG pattern depend on the vector store used for the experiment, as follows:
- The index notebook populates, updates, and maintains the vector index for the document collection. All AutoAI RAG patterns can generate an index notebook.
- The inference notebook provides an endpoint for running inference against a large language model with retrieval augmentation. Only experiments that use a Milvus database as a vector store generate an inference notebook.
For a Milvus experiment, an AI service packages a pipeline for immediate deployment to a deployment space, where you can run inference against the endpoint.
Generating the indexing and inferencing notebooks
After you review your pipelines, follow these steps to save a pipeline and generate the associated notebooks.
- From the experiment leaderboard, click the name of a pipeline to view the details.
- Click Save. The panel lists the notebook or notebooks that are auto-generated. For example, the following image shows the save panel for a pattern created by using the in-memory Chroma database as a vector store.
- Click Create.
- Open the notebooks from the associated project to review or run the code. For example, the indexing notebook looks as follows:
You can review the notebooks as they are. To run them, first add your authentication credentials.
Reviewing the index notebook
The index notebook contains Python code for building the vector database index for your document collection.
The notebook is annotated so that you can review the steps and code for:
- Retrieving the data to vectorize
- Chunking the data
- Creating the embeddings
- Reading the benchmark data
- Using the benchmark data to evaluate the quality of the retrieval
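The steps in the list above can be illustrated with a small, self-contained sketch. This is not the code from the generated notebook: the word-window chunker, the bag-of-words stand-in for an embedding model, the in-memory index, the hit-rate metric, the sample documents, and the benchmark field names are all assumptions chosen to keep the example runnable without credentials or a vector store.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping windows of words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text):
    """Toy bag-of-words vector; a real notebook would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical document collection (retrieving the data to vectorize).
documents = {
    "doc1": "Milvus is a vector database that stores embeddings for similarity search",
    "doc2": "Chroma is an in-memory vector store that is convenient for quick experiments",
}

# Chunk each document and embed each chunk to build the index.
index = [
    (doc_id, chunk, embed(chunk))
    for doc_id, text in documents.items()
    for chunk in chunk_text(text)
]

def retrieve(question, k=1):
    """Return the k index entries most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry[2]), reverse=True)
    return ranked[:k]

# Hypothetical benchmark data; the field names are assumptions.
benchmark = [
    {"question": "Which database stores embeddings for similarity search",
     "correct_answer_document_ids": ["doc1"]},
    {"question": "Which in-memory vector store is convenient",
     "correct_answer_document_ids": ["doc2"]},
]

def hit_rate(benchmark, k=1):
    """Fraction of questions for which a correct document is among the top k."""
    hits = 0
    for item in benchmark:
        retrieved_ids = {doc_id for doc_id, _chunk, _vector in retrieve(item["question"], k)}
        if retrieved_ids & set(item["correct_answer_document_ids"]):
            hits += 1
    return hits / len(benchmark)

score = hit_rate(benchmark)
print(f"Indexed {len(index)} chunks; hit rate at k=1: {score:.2f}")
```

In the generated notebook, the chunking settings, embedding model, and vector store are expected to come from the saved pattern rather than from hand-picked values such as these.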
Reviewing the inference notebook
The inference notebook contains Python code to:
- Retrieve relevant passages from the indexed documents for each user query
- Generate a response to each user query by passing the retrieved passages to a large language model as context
The notebook is annotated so that you can review the steps and code for:
- Building the inference Python function by using the RAG pattern that was identified in the experiment
- Deploying the function as the inference endpoint
- Testing the retrieval of relevant passages as input for the generated response
Run the inferencing notebook to use the RAG pattern for retrieving and generating answers to questions.
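The retrieve-then-generate flow that the inference notebook implements can be sketched as follows. This is a minimal illustration under stated assumptions, not the generated code: the word-overlap retriever, the prompt template, and the `generate` placeholder (which stands in for a call to a large language model) are all hypothetical.

```python
import re

# Hypothetical indexed passages.
PASSAGES = [
    "Milvus experiments generate both an indexing notebook and an inference notebook.",
    "Chroma experiments generate only an indexing notebook.",
    "Saved notebooks are stored as project assets.",
]

def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, passages, k=2):
    """Rank passages by word overlap with the query (stand-in for vector search)."""
    query_tokens = tokens(query)
    ranked = sorted(passages, key=lambda p: len(query_tokens & tokens(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved):
    """Place the retrieved passages in the prompt as grounding context."""
    context = "\n".join(f"- {passage}" for passage in retrieved)
    return (
        "Answer the question by using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def generate(prompt):
    """Placeholder for a large language model call (assumption, not a real API)."""
    return f"[model response to a prompt of {len(prompt)} characters]"

def answer(query):
    """Retrieve passages, build the prompt, and generate a response."""
    retrieved = retrieve(query, PASSAGES)
    prompt = build_prompt(query, retrieved)
    return {"answer": generate(prompt), "passages": retrieved, "prompt": prompt}

result = answer("Which notebooks does a Milvus experiment generate?")
print(result["prompt"])
print(result["answer"])
```

Returning the retrieved passages together with the answer makes it easier to check whether a response is grounded in the indexed documents.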
Saving a RAG pattern as a deployable AI service
If you used a Milvus vector store to run your experiment, you also have the option to save your RAG pattern as a deployable AI service. An AI service:
- Is a deployable Python function that captures the logic for the RAG pattern.
- Is saved as a project asset.
- Can optionally be promoted as a copy to a deployment space and deployed, so that you can access the endpoint and run inference with the pattern.
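To make the idea of a function that captures the logic for the RAG pattern more concrete, the following sketch wraps a retriever and a generator in a single callable. The function names, the payload shape, and the response fields are hypothetical, and the signature that the platform requires for a deployed AI service might differ.

```python
def make_rag_service(retrieve, generate, top_k=3):
    """Return one function that captures the whole RAG pattern.

    `retrieve` and `generate` are supplied by the caller; in a real service
    they would wrap the vector store and the foundation model.
    """
    def service(payload):
        question = payload["question"]
        passages = retrieve(question, top_k)
        prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
        return {"answer": generate(prompt), "reference_passages": passages}
    return service

# Stand-ins so that the sketch runs without credentials or a vector store.
def stub_retrieve(question, k):
    return ["passage A", "passage B", "passage C"][:k]

def stub_generate(prompt):
    return "stub answer"

service = make_rag_service(stub_retrieve, stub_generate, top_k=2)
response = service({"question": "What is saved with the pattern?"})
print(response)
```

Because the retrieval settings and model choice are bound inside the returned function, the caller of the deployed endpoint only needs to send a question.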
To create and deploy the AI service:
- Choose Save as for the RAG pattern.
- Choose Retrieval and generation as the objective.
- Choose AI service as the asset type.
- Select Promote and deploy AI service to deployment space.
- Click Create and deploy.
- Choose an existing deployment space or create a new one.
When the deployment process completes, click the deployment name to open the AI service for testing. From the deployment, you can:
- Get the endpoint and code snippets from the API reference tab to use the RAG pattern in an application.
- Switch to the Test tab to enter or upload new questions in JSON format to use with the RAG pattern. Use the same JSON format you used for the evaluation questions, but do not supply the answers.
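One way to prepare questions for the Test tab is to reuse the evaluation file and remove the answer fields. The following sketch assumes that each evaluation record has a `question` field plus answer fields named `correct_answer` and `correct_answer_document_ids`. These field names and the sample records are assumptions, so adjust them to match the file that you used for the experiment.

```python
import json

# Hypothetical evaluation records; the field names are assumptions.
evaluation_records = [
    {
        "question": "Which vector stores generate an inference notebook?",
        "correct_answer": "Only Milvus.",
        "correct_answer_document_ids": ["saving-patterns.html"],
    },
    {
        "question": "Where are the generated notebooks saved?",
        "correct_answer": "In the project, as assets.",
        "correct_answer_document_ids": ["saving-patterns.html"],
    },
]

# Fields that hold answers and must not be sent with new questions.
ANSWER_FIELDS = {"correct_answer", "correct_answer_document_ids"}

def strip_answers(records, answer_fields=ANSWER_FIELDS):
    """Copy each record without its answer fields."""
    return [
        {key: value for key, value in record.items() if key not in answer_fields}
        for record in records
    ]

test_questions = strip_answers(evaluation_records)
payload = json.dumps(test_questions, indent=2)
print(payload)
```

The resulting JSON keeps the structure of the evaluation file, so it can be pasted or uploaded without further reshaping if the assumed field names match yours.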
Learn more
Use the indexed documents from this experiment in the Prompt Lab to ground prompts for a foundation model. See Using an AutoAI RAG index to chat with documents.