Coding an AutoAI RAG experiment with a Milvus vector store
Last updated: Feb 21, 2025
Review the guidelines and code samples to learn how to code an AutoAI RAG experiment with a Milvus database as a vector store.
For an enterprise or production RAG solution, set up a vector database with Milvus. The vectorized content persists for future patterns and integrations. For details, see Working with Milvus.
input_data_references supports up to 20 DataConnection instances.
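For example, each document source can be wrapped in a DataConnection. The following sketch assumes the documents live in a Cloud Object Storage bucket; the connection asset ID and bucket name are placeholders:

```python
from ibm_watsonx_ai.helpers import DataConnection, S3Location

# Sketch: one DataConnection per document source (up to 20 total)
input_data_references = [
    DataConnection(
        connection_asset_id="<connection_asset_id>",  # placeholder connection asset
        location=S3Location(
            bucket="<bucket_name>",   # placeholder bucket
            path="core_api.html",     # document to index
        ),
    )
]
```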
Evaluation data
Evaluation data must be in JSON format with a fixed schema that contains these fields: question, correct_answer, correct_answer_document_ids
For example:
[
  {
    "question": "What is the purpose of get_token()?",
    "correct_answer": "get_token() is used to retrieve an authentication token for secure API access.",
    "correct_answer_document_ids": [
      "core_api.html"
    ]
  },
  {
    "question": "How does the delete_model() function operate?",
    "correct_answer": "delete_model() method allows users to delete models they've created or managed.",
    "correct_answer_document_ids": [
      "core_api.html"
    ]
  }
]
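As a quick sanity check before running the experiment, you can verify that every record in the evaluation file carries the three required fields. This is a minimal sketch; only the field names come from the schema above:

```python
import json

# The three fields that the AutoAI RAG evaluation schema requires
REQUIRED_FIELDS = {"question", "correct_answer", "correct_answer_document_ids"}

def validate_benchmark(records):
    """Return a list of (record_index, missing_fields) for invalid records."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems

benchmark = json.loads("""
[
  {"question": "What is the purpose of get_token()?",
   "correct_answer": "get_token() is used to retrieve an authentication token.",
   "correct_answer_document_ids": ["core_api.html"]}
]
""")

print(validate_benchmark(benchmark))  # [] means every record is valid
```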
The rag_optimizer object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the configuration details that define the experiment.
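A typical definition has the following shape. This is a sketch: the experiment name, description, pattern count, and metric are placeholder assumptions, and credentials and project_id are assumed to come from your earlier client setup; check the ibm_watsonx_ai SDK reference for the full list of options.

```python
from ibm_watsonx_ai.experiment import AutoAI

# Assumes `credentials` and `project_id` were created earlier in the setup steps
experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - Milvus experiment",        # placeholder experiment name
    description="RAG experiment with a Milvus vector store",
    max_number_of_rag_patterns=5,                 # how many patterns to build
    optimization_metrics=["answer_correctness"],  # metric used to rank patterns
)
```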
Run the optimizer to create the RAG patterns by using the specified configuration options. In this code sample, the task runs in interactive mode. To run the task in the background, set background_mode to True.
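A run call typically looks like the following sketch, where the data reference variables are assumed to hold the DataConnection objects described earlier:

```python
# Sketch: start the experiment with the document and evaluation connections
run_details = rag_optimizer.run(
    input_data_references=input_data_references,  # connections to the document collection
    test_data_references=test_data_references,    # connection to the evaluation JSON file
    background_mode=False,  # interactive mode; set to True to run in the background
)
```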
Step 4: Review the patterns and select the best one
After the AutoAI RAG experiment completes successfully, you can review the patterns. Use the summary method to list the completed patterns and their evaluation metrics as a Pandas DataFrame, ranked by performance against the optimized metric.
summary = rag_optimizer.summary()
summary
For example, pattern results display like this:
| Pattern  | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|----------|-------------------------|-------------------|--------------------------|---------------------|---------------------|------------------------------|------------------|----------------------------|---------------------|
| Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512  | ibm/slate-125m-english-rtrvr   | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine    | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine    | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
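Because the summary is a plain Pandas DataFrame, you can filter and rank it programmatically. The following sketch uses a hand-built frame with two of the columns above to stand in for the real rag_optimizer.summary() output:

```python
import pandas as pd

# Hand-built stand-in for rag_optimizer.summary(), using values from the table above
summary = pd.DataFrame(
    {
        "mean_answer_correctness": [0.6802, 0.7172, 0.6543, 0.6216, 0.7369],
        "generation.model_id": [
            "meta-llama/llama-3-70b-instruct",
            "ibm/granite-13b-chat-v2",
            "ibm/granite-13b-chat-v2",
            "meta-llama/llama-3-70b-instruct",
            "mistralai/mixtral-8x7b-instruct-v01",
        ],
    },
    index=["Pattern1", "Pattern2", "Pattern3", "Pattern4", "Pattern5"],
)

# The pattern with the highest mean answer correctness
best_name = summary["mean_answer_correctness"].idxmax()
print(best_name)  # Pattern5
```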
Select a pattern to test locally
The next step is to select a pattern and test it locally.
best_pattern = rag_optimizer.get_pattern()
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": ["How to use new approach of providing credentials to APIClient?"],
        }
    ]
}
resp = best_pattern.query(payload)
print(resp["predictions"][0]["values"][0][0])
Model's response:
According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
credentials = Credentials(
    url = "https://us-south.ml.cloud.ibm.com",
    token = "***********",
)
client = APIClient(credentials)
This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
Tip:
To retrieve a specific pattern, pass the pattern name to rag_optimizer.get_pattern().
Step 5: Deploy a pattern
After you test a pattern locally, you can deploy the pattern to get the endpoint and include it in apps. Deployment is done by storing the defined RAG function, then creating a deployed asset. For more information on deployments, see Deploying and managing AI assets and Online deployments.
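One possible shape for this step, based on the pattern object selected in Step 4, is sketched below. The deployment name is a placeholder and space_id is assumed to reference an existing deployment space; check the ibm_watsonx_ai SDK reference for the exact signature.

```python
# Sketch: promote the selected pattern to a deployment space
deployment_details = best_pattern.deploy(
    name="AutoAI RAG deployment",  # placeholder deployment name
    space_id=space_id,             # existing deployment space to deploy into
)

# Keep the deployment ID; it is needed to score the endpoint later
deployment_id = client.deployments.get_id(deployment_details)
```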
The RAG service is now deployed in a space and available to test.
Testing the deployed pattern
This code sample demonstrates how to test the deployed solution. Enter test questions in the payload by using the following format:
questions = ["How to use new approach of providing credentials to APIClient?"]
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}
resp = client.deployments.score(deployment_id, payload)
print(resp["predictions"][0]["values"][0][0])
Model's response:
According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
credentials = Credentials(
    url = "https://us-south.ml.cloud.ibm.com",
    token = "***********",
)
client = APIClient(credentials)
This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
Reviewing experiment results in Cloud Object Storage
If the final status of the experiment is failed or error, use rag_optimizer.get_logs() or refer to experiment results to understand what went wrong. Experiment results and logs are stored in the default Cloud Object Storage instance
that is linked to your account. By default, results are saved by experiment training ID in the default_autoai_rag_out directory.
The evaluation_results.json file contains evaluation results for each benchmark question.
The indexing_notebook.ipynb notebook contains the Python code for building a vector database index. It includes commands for retrieving data, chunking the documents, and creating embeddings.
The inference_notebook.ipynb notebook focuses on retrieving relevant passages from a knowledge base for user queries and generating responses by feeding the retrieved passages into a large language model.
You can review the notebooks or run them by adding authentication credentials.
Note:
The results notebook indexing_notebook.ipynb contains the code for embedding and indexing the documents. You can accelerate the document indexing task by changing vector_store.add_documents() to vector_store.add_documents_async().
Get inference and indexing notebooks
To download the specified inference notebook from the service, use get_inference_notebook(). If you leave pattern_name empty, the method downloads the notebook for the highest-ranked pattern.
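For example, as a sketch (Pattern5 stands in for any pattern name from the summary table):

```python
# Download the inference notebook for the top-ranked pattern
rag_optimizer.get_inference_notebook()

# Download the inference notebook for a specific pattern
rag_optimizer.get_inference_notebook(pattern_name="Pattern5")
```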