Review the guidelines and code samples to learn how to code an AutoAI RAG experiment by using watsonx text extraction to process input documents.
You can use text extraction to process input documents for an AutoAI RAG experiment. Text extraction transforms high-quality business documents with tables, images, and diagrams into markdown format. The resulting markdown files can then be used in an AutoAI RAG experiment to enhance the quality of generated patterns.
The text extraction service uses the watsonx.ai Python client library (version 1.1.11 or later). For more information about using text extraction from the watsonx.ai Python SDK, see Text Extractions.
Follow these steps to use text extraction in your AutoAI RAG experiment.
- Prepare the prerequisites for preparing data and set up the experiment
- Process input documents with text extraction
- Configure the RAG optimizer
- Run the experiment
- Review the patterns and select the best one
Step 1: Prepare the prerequisites for preparing data and set up the experiment
Prepare the prerequisites for the experiment.
- Install and import the required modules and dependencies. For example:
pip install 'ibm-watsonx-ai[rag]>=1.1.11'
pip install langchain-community==0.2.4
- Add task credentials. See Adding task credentials.
- Add the watsonx.ai Runtime service. See Creating services.
- Enter your API key. See Managing the user API key.
- Use your credentials to initialize the client. For example:
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.mydomain.com",
    api_key="***********",
)
client = APIClient(credentials)
- Create a project or space for your work. See Creating a project or Creating a space.
- Get the ID for the project or space. See Finding the project ID.
- Set a default project or a default space. Call only the one that matches where you work:
client.set.default_project("<Project ID>")
client.set.default_space("<Space GUID>")
- Prepare the grounding documents
- Prepare the evaluation data
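The client initialization above places the API key directly in the code. As a minimal sketch, the URL and key could instead be read from environment variables. The variable names `WATSONX_URL` and `WATSONX_APIKEY` and the helper `load_watsonx_settings` are hypothetical and chosen only for this illustration; they are not part of the SDK.

```python
import os


def load_watsonx_settings(env=None):
    """Read the service URL and API key from environment variables.

    WATSONX_URL and WATSONX_APIKEY are hypothetical names used for this
    sketch; substitute the names that your environment defines.
    """
    env = os.environ if env is None else env
    required = ("WATSONX_URL", "WATSONX_APIKEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # The keys match the url and api_key arguments used in the sample above.
    return {"url": env["WATSONX_URL"], "api_key": env["WATSONX_APIKEY"]}
```

The returned dictionary can then be unpacked into the constructor from the earlier sample, for example `Credentials(**load_watsonx_settings())`.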
Grounding documents
Prepare and connect to the grounding documents that you will use to run the AutoAI RAG experiment with the text extraction service.
- Create a connection to Cloud Object Storage and fetch the ID.
# datasource_name must already hold the data source type name of your Cloud Object Storage service
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name}",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': <BUCKET_NAME>,
        'access_key': <ACCESS_KEY>,
        'secret_key': <SECRET_ACCESS_KEY>,
        'iam_url': 'https://iam.cloud.ibm.com/identity/token',
        'url': <ENDPOINT_URL>
    }
}
conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)
- Prepare two connection assets, one for input and one for the text extraction service output.
from ibm_watsonx_ai.helpers import DataConnection, S3Location

# Location of the document that the text extraction service reads
input_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<TEXT EXTRACTION INPUT FILENAME>
    ),
)
input_data_reference.set_client(client)

# Location where the text extraction service writes its output
result_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<TEXT EXTRACTION OUTPUT FILENAME>
    )
)
result_data_reference.set_client(client)
Evaluation data
For evaluation data input:
- Data must be in JSON format with a fixed schema that contains these fields: question, correct_answer, correct_answer_document_ids
- correct_answer_document_ids must refer to the text extraction service output file
benchmarking_data = [
    {
        "question": "What are the two main variants of Granite Code models?",
        "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
        "correct_answer_document_ids": <TEXT EXTRACTION OUTPUT FILENAME>
    },
    {
        "question": "What is the purpose of Granite Code Instruct models?",
        "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
        "correct_answer_document_ids": <TEXT EXTRACTION OUTPUT FILENAME>
    },
    {
        "question": "What is the licensing model for Granite Code models?",
        "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
        "correct_answer_document_ids": <TEXT EXTRACTION OUTPUT FILENAME>
    },
]
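Before uploading the evaluation data, it can help to check that every record follows the fixed schema. The following is a minimal sketch under stated assumptions: the helper `validate_benchmark` is hypothetical (not part of the SDK), and it accepts `correct_answer_document_ids` as either a single string or a list of strings because the sample above shows only a placeholder.

```python
REQUIRED_FIELDS = ("question", "correct_answer", "correct_answer_document_ids")


def validate_benchmark(records, output_filename):
    """Return a list of human-readable problems found in the records.

    An empty list means every record has the three required fields and
    references only the text extraction output file.
    """
    problems = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"record {index}: missing field '{field}'")
        doc_ids = record.get("correct_answer_document_ids", [])
        if isinstance(doc_ids, str):
            # Assumption: a single file name may be given as a plain string.
            doc_ids = [doc_ids]
        for doc_id in doc_ids:
            if doc_id != output_filename:
                problems.append(
                    f"record {index}: document id '{doc_id}' "
                    f"does not match '{output_filename}'"
                )
    return problems
```

For example, `validate_benchmark(benchmarking_data, "<TEXT EXTRACTION OUTPUT FILENAME>")` returns an empty list when all records are well formed.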
To prepare the evaluation data:
import os, wget
from ibm_watsonx_ai.helpers import DataConnection

test_data_filename = "benchmarking_data_core_api.json"
# Raw-content URL of the sample benchmark file in the watsonx-ai-samples repository
test_data_path = f"https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cloud/data/autoai_rag/{test_data_filename}"

# Download the file only if it is not already present locally
if not os.path.isfile(test_data_filename):
    wget.download(test_data_path, out=test_data_filename)

# Store the file as a data asset and reference it as test data
test_asset_details = client.data_assets.create(name=test_data_filename, file_path=test_data_filename)
test_asset_id = client.data_assets.get_id(test_asset_details)
test_data_references = [DataConnection(data_asset_id=test_asset_id)]
Step 2: Process input documents with text extraction
- Initialize the text extraction service.
from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=credentials,
    space_id=<Space GUID>,
)
- Run the text extraction job.
from ibm_watsonx_ai.metanames import TextExtractionsMetaNames

response = extraction.run_job(
    document_reference=input_data_reference,
    results_reference=result_data_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
    results_format="markdown",
)
# Keep the job ID so that you can check the job status later
job_id = response['metadata']['id']
- Get job details.
extraction.get_job_details(job_id)
- When the status is completed, move to the next step.
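Checking the job details repeatedly by hand can be replaced with a small polling loop. The sketch below is illustrative only: `wait_for_job` is a hypothetical helper, the location of the status field (`entity` > `results` > `status`) and the `failed` status value are assumptions about the job details response, and the polling interval and timeout are arbitrary defaults.

```python
import time


def wait_for_job(get_details, job_id, status_path=("entity", "results", "status"),
                 poll_seconds=10, timeout_seconds=1800, sleep=time.sleep):
    """Poll get_details(job_id) until the job status is 'completed'.

    status_path is an assumed location of the status field inside the
    details dictionary; adjust it to match the response you observe.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        details = get_details(job_id)
        status = details
        for key in status_path:
            status = status[key]
        if status == "completed":
            return details
        if status == "failed":  # assumed name of the failure state
            raise RuntimeError(f"Text extraction job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not complete in {timeout_seconds} seconds")
        sleep(poll_seconds)
```

With the objects from this step, a call might look like `wait_for_job(extraction.get_job_details, job_id)`.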
Step 3: Configure the RAG optimizer
The rag_optimizer object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the details to define the experiment. These are the available configuration options:
Parameter | Description | Values |
---|---|---|
name | Experiment name | Enter a valid name |
description | Experiment description | Optionally describe the experiment |
embedding_models | Embedding models to try | ibm/slate-125m-english-rtrvr, intfloat/multilingual-e5-large |
retrieval_methods | Retrieval methods to use | simple: retrieves and ranks all relevant documents; window: retrieves and ranks a fixed number of relevant documents |
foundation_models | Foundation models to try | See Foundation models by task |
max_number_of_rag_patterns | Maximum number of RAG patterns to create | 4-20 |
optimization_metrics | Metric name(s) to use for optimization | faithfulness, answer_correctness |
This sample code shows the configuration options for running the experiment with the ibm-watsonx-ai SDK documentation:
from ibm_watsonx_ai.experiment import AutoAI

# project_id is the ID of the project that you set up in Step 1
experiment = AutoAI(credentials, project_id=project_id)

# Minimal configuration: only the pattern limit and the optimization metric are set
rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)

# Alternative configuration that also restricts the embedding and foundation models to try
rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    foundation_models=["ibm/granite-13b-chat-v2", "mistralai/mixtral-8x7b-instruct-v01"],
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
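The value ranges in the configuration table can be checked locally before the optimizer is created. This is a minimal sketch, not SDK behavior: `check_rag_config` is a hypothetical helper, it treats metric and retrieval method names as the plain strings listed in the table, and it covers only the constraints that the table states.

```python
ALLOWED_METRICS = {"faithfulness", "answer_correctness"}
ALLOWED_RETRIEVAL_METHODS = {"simple", "window"}


def check_rag_config(config):
    """Return a list of problems in a dictionary of optimizer options."""
    errors = []
    if not config.get("name"):
        errors.append("name must be a non-empty string")
    max_patterns = config.get("max_number_of_rag_patterns")
    if max_patterns is not None and not 4 <= max_patterns <= 20:
        errors.append("max_number_of_rag_patterns must be between 4 and 20")
    for metric in config.get("optimization_metrics", []):
        if metric not in ALLOWED_METRICS:
            errors.append(f"unsupported optimization metric: {metric}")
    for method in config.get("retrieval_methods", []):
        if method not in ALLOWED_RETRIEVAL_METHODS:
            errors.append(f"unsupported retrieval method: {method}")
    return errors


# Example: the pattern limit is below the documented minimum of 4.
print(check_rag_config({"name": "demo", "max_number_of_rag_patterns": 2}))
```

An empty list means that the options fall within the documented ranges; the service itself remains the authority on what is accepted.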
Step 4: Run the experiment
Run the optimizer to create the RAG patterns by using the specified configuration options. Use the output from text extraction as input in your AutoAI RAG experiment.
In this code sample for running a Chroma experiment, the task is run in interactive mode. You can run the task in the background by setting background_mode to True.
input_data_references = [result_data_reference]
rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)
Step 5: Review the patterns and select the best one
After the AutoAI RAG experiment completes successfully, you can review the patterns. Use the summary method to list completed patterns and evaluation metrics information in the form of a Pandas DataFrame so you can review the patterns, ranked according to performance against the optimized metric.
summary = rag_optimizer.summary()
summary
For example, pattern results display like this:
Pattern | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
---|---|---|---|---|---|---|---|---|---|
Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
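To compare patterns on a metric other than the optimized one, the rows can be re-ranked. The sketch below uses plain Python dictionaries that copy two metric columns from the example table above; `rank_patterns` is a hypothetical helper, and working on dictionaries rather than on the DataFrame itself is a simplification made for this illustration.

```python
# Metric values copied from the example summary table above.
rows = [
    {"pattern": "Pattern1", "mean_answer_correctness": 0.6802, "mean_faithfulness": 0.5407},
    {"pattern": "Pattern2", "mean_answer_correctness": 0.7172, "mean_faithfulness": 0.5950},
    {"pattern": "Pattern3", "mean_answer_correctness": 0.6543, "mean_faithfulness": 0.5144},
    {"pattern": "Pattern4", "mean_answer_correctness": 0.6216, "mean_faithfulness": 0.5030},
    {"pattern": "Pattern5", "mean_answer_correctness": 0.7369, "mean_faithfulness": 0.5630},
]


def rank_patterns(rows, metric):
    """Return the pattern names ordered from best to worst on the given metric."""
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [row["pattern"] for row in ordered]


print(rank_patterns(rows, "mean_answer_correctness"))
# ['Pattern5', 'Pattern2', 'Pattern1', 'Pattern3', 'Pattern4']
print(rank_patterns(rows, "mean_faithfulness"))
# ['Pattern2', 'Pattern5', 'Pattern1', 'Pattern3', 'Pattern4']
```

In this example, Pattern5 scores highest on answer correctness while Pattern2 scores highest on faithfulness, so the choice of pattern depends on which metric matters more for your use case.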
Select a pattern to test locally
The next step is to select a pattern and test it locally. Because Chroma is in-memory, you must re-create the document index.
In the following code sample, the index is built with the documents core_api.html and fm_embeddings.html.
from langchain_community.document_loaders import WebBaseLoader
best_pattern = rag_optimizer.get_pattern()
urls = [
    "https://ibm.github.io/watsonx-ai-python-sdk/core_api.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)
best_pattern.indexing_function(doc_splits)
Query the RAG pattern locally.
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": ["How to use new approach of providing credentials to APIClient?"],
        }
    ]
}
best_pattern.query(payload)
The model's response looks like this:
According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
credentials = Credentials(
    url = "https://us-south.ml.cloud.ibm.com",
    token = "***********",
)
client = APIClient(credentials)
This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
To retrieve a specific pattern, pass the pattern number to rag_optimizer.get_pattern().
Reviewing experiment results in Cloud Object Storage
If the final status of the experiment is failed or error, use rag_optimizer.get_logs() or refer to the experiment results to understand what went wrong. Experiment results and logs are stored in the default Cloud Object Storage instance that is linked to your account. By default, results are saved in the default_autoai_rag_out directory.
Results are organized by pattern. For example:
|-- Pattern1
| | -- evaluation_results.json
| | -- indexing_inference_notebook.ipynb (Chroma)
|-- Pattern2
| ...
|-- training_status.json
Each pattern contains these results:
- The evaluation_results.json file contains evaluation results for each benchmark question.
- The indexing_inference_notebook.ipynb notebook contains the Python code for building a vector database index and for building the retrieval and generation functions. The notebook introduces commands for retrieving data, chunking, and creating embeddings, as well as for retrieving chunks, building prompts, and generating answers.
The results notebook indexing_notebook.ipynb contains the code for embedding and indexing the documents. You can accelerate the document indexing task by changing vector_store.add_documents() to vector_store.add_documents_async().
Get an inference and indexing notebook
To download a specified inference notebook, use the get_inference_notebook() method. If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.
rag_optimizer.get_inference_notebook(pattern_name='Pattern3')
For more information and code samples, refer to the AutoAI RAG with watsonx Text Extraction service notebook.