
Coding an AutoAI RAG experiment with a custom foundation model

Last updated: May 06, 2025

Review the guidelines and code samples to learn how to code an AutoAI RAG experiment and use custom foundation models.

The custom model deployment uses the watsonx.ai Python client library (version 1.3.12 or later).

Follow these steps to use a custom foundation model in your AutoAI RAG experiment.

  1. Prepare the prerequisites for custom foundation model deployment
  2. Deploy the model
  3. Prepare grounding data
  4. Prepare evaluation data
  5. Run the experiment
  6. Review the patterns and select the best one

Step 1: Prepare the prerequisites for custom foundation model deployment

  1. Download the model snapshot.

    from pathlib import Path
    from huggingface_hub import snapshot_download
    
    byom_cache_dir = Path("your", "model", "cache", "dir")
    
    if not byom_cache_dir.exists():
        raise FileNotFoundError("Please provide a path that exists.")
    
    if byom_cache_dir.is_file():
        raise NotADirectoryError("Please provide a path that points to a directory.")
    
    snapshot_download(HUGGING_FACE_MODEL_REPOSITORY, cache_dir=byom_cache_dir)
    
  2. Create a connection to Cloud Object Storage.

    from ibm_watsonx_ai import APIClient, Credentials
    
    credentials = Credentials(
                    api_key=<API_KEY>,
                    url=<WML_ENDPOINT>
                )
    
    client = APIClient(credentials=credentials,  project_id=<PROJECT_ID>)
    
  3. Connect to your S3Bucket.

    from ibm_watsonx_ai.helpers.connections import DataConnection, S3Location
    
    location = S3Location(bucket=BUCKET_NAME, path=BUCKET_MODEL_DIR_NAME)
    data_connection = DataConnection(location=location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
    data_connection.set_client(api_client=client)
    
  4. Upload model files to your S3Bucket.

    model_files = byom_cache_dir / "model_dir_name" / "snapshots" / "snapshot_id"
    
    if not model_files.exists():
        raise FileNotFoundError("Please provide a snapshot path that exists.")
    
    if model_files.is_file():
        raise NotADirectoryError("Please provide a snapshot path that points to a directory.")
    
    for model_file in model_files.iterdir():
        
        # avoid uploading unnecessary files
        if model_file.name.startswith("."):
            continue
    
        data_connection.write(data=str(model_file), remote_name=model_file.name)
    

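Note that snapshot_download stores files under a models--&lt;org&gt;--&lt;name&gt;/snapshots/&lt;revision&gt; directory inside cache_dir, following the Hugging Face cache convention. The following standard-library sketch shows how to locate that directory without hard-coding the revision hash, and previews which files the upload loop above would send; the directory and file names here are made up for illustration:

```python
from pathlib import Path
import tempfile

# Build a miniature cache laid out like a Hugging Face snapshot cache:
# <cache>/models--<org>--<name>/snapshots/<revision>/<files>
cache = Path(tempfile.mkdtemp())
snapshot = cache / "models--example-org--example-model" / "snapshots" / "abc123"
snapshot.mkdir(parents=True)
(snapshot / "config.json").write_text("{}")
(snapshot / "model.safetensors").write_text("weights")
(snapshot / ".gitattributes").write_text("")  # hidden file, skipped by the upload loop

# Locate the snapshot directory without hard-coding the revision hash
snapshot_dirs = list(cache.glob("models--*/snapshots/*"))

# Preview which files the upload loop would send (dotfiles excluded)
to_upload = sorted(
    f.name for f in snapshot_dirs[0].iterdir() if not f.name.startswith(".")
)
print(to_upload)  # ['config.json', 'model.safetensors']
```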
Step 2: Deploy the model

To deploy your custom foundation model, follow the steps in the Custom models documentation.
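The Custom models documentation covers the full flow. As a rough sketch, once the model files are in Cloud Object Storage and registered as a model asset, the deployment itself comes down to creating an online deployment with a hardware request. The model asset ID placeholder, the "gpu_s" size, and the use of HARDWARE_REQUEST are assumptions that vary by model and client version, so follow the linked documentation for the authoritative metadata:

```python
from ibm_watsonx_ai import APIClient, Credentials

client = APIClient(
    Credentials(api_key="<API_KEY>", url="<WML_ENDPOINT>"),
    project_id="<PROJECT_ID>",
)

# Create an online deployment for an already-stored custom model asset.
# The hardware size and metadata below are illustrative -- check the
# Custom models documentation for the values your client version supports.
deployment_details = client.deployments.create(
    "<MODEL_ASSET_ID>",
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "Custom FM deployment",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
        client.deployments.ConfigurationMetaNames.HARDWARE_REQUEST: {
            "size": "gpu_s",
            "num_nodes": 1,
        },
    },
)

deployment_id = client.deployments.get_id(deployment_details)
```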

Step 3: Prepare grounding data

Prepare and connect to the grounding documents you will use to run the RAG experiment. For details, see Getting and preparing data in a project.

  • Supported formats: PDF, HTML, DOCX, Markdown, plain text
  • Connect to data in a Cloud Object Storage bucket or a folder in a bucket, or specify up to 20 files.
  • AutoAI uses a sample of the documents to run the experiment.

For example, to create a data connection when documents are stored in a Cloud Object Storage bucket:

from ibm_watsonx_ai.helpers import DataConnection, S3Location

conn_meta_props= {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name} ",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': <BUCKET_NAME>,
        'access_key': <ACCESS_KEY>,
        'secret_key': <SECRET_ACCESS_KEY>,
        'iam_url': 'https://iam.cloud.ibm.com/identity/token',
        'url': <ENDPOINT_URL>
    }
}

conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)

input_data_references = [DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<BUCKET_PATH>
    )
)]

The following example shows how to use the data asset created in the project (or promoted to the space).

Note:

core_api.html is an example of a grounding document file used in the sample notebooks.

import os, wget

input_data_filename = "core_api.html"
input_data_path = f"https://ibm.github.io/watsonx-ai-python-sdk/{input_data_filename}"

if not os.path.isfile(input_data_filename): 
    wget.download(input_data_path, out=input_data_filename)
    
asset_details = client.data_assets.create(input_data_filename, input_data_filename)
asset_id = client.data_assets.get_id(asset_details)

input_data_references = [DataConnection(data_asset_id=asset_id)]
Tip:

input_data_references supports up to 20 DataConnection instances.

Step 4: Prepare evaluation data

  1. Download the granite_code_models.pdf document.

    import wget
    
    data_url = "https://arxiv.org/pdf/2405.04324"
    byom_input_filename = "granite_code_models.pdf"
    wget.download(data_url, byom_input_filename)
    
  2. Prepare the evaluation data.

    For correct_answer_document_ids, provide the downloaded file name.

    import json 
    
    local_benchmark_json_filename = "benchmark.json"
    
    benchmarking_data = [
        {
            "question": "What are the two main variants of Granite Code models?",
            "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
            "correct_answer_document_ids": [byom_input_filename]
        },
        {
            "question": "What is the purpose of Granite Code Instruct models?",
            "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
            "correct_answer_document_ids": [byom_input_filename]
        },
        {
            "question": "What is the licensing model for Granite Code models?",
            "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
            "correct_answer_document_ids": [byom_input_filename]
        },
    ]
    
    with open(local_benchmark_json_filename, mode="w", encoding="utf-8") as fp:
        json.dump(benchmarking_data, fp, indent=4)
    
  3. Upload the evaluation files to your Cloud Object Storage bucket.

    documents_dir_location = S3Location(bucket=BUCKET_NAME, path=byom_input_filename)
    documents_dir_data_connection = DataConnection(location=documents_dir_location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
    documents_dir_data_connection.set_client(api_client=client)
    documents_dir_data_connection.write(data=byom_input_filename, remote_name=byom_input_filename)
    
    benchmark_file_location = S3Location(bucket=BUCKET_NAME, path=BUCKET_BENCHMARK_JSON_FILE_PATH)
    benchmark_file_data_connection = DataConnection(location=benchmark_file_location, connection_asset_id=DATASOURCE_CONNECTION_ASSET_ID)
    benchmark_file_data_connection.set_client(api_client=client)
    benchmark_file_data_connection.write(data=local_benchmark_json_filename)
    

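Before uploading, it can help to sanity-check the benchmark file: each record needs the question, correct_answer, and correct_answer_document_ids fields shown in the example above. The validate_benchmark helper below is a hypothetical standard-library sketch, not part of the SDK:

```python
import json

REQUIRED_FIELDS = {"question", "correct_answer", "correct_answer_document_ids"}

def validate_benchmark(records):
    """Return a list of (record_index, problems) for malformed records."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            problems.append((i, sorted(missing)))
        elif not isinstance(record["correct_answer_document_ids"], list):
            problems.append((i, ["correct_answer_document_ids must be a list"]))
    return problems

# The second record is missing its document IDs and should be flagged
records = json.loads("""[
    {"question": "Q1", "correct_answer": "A1",
     "correct_answer_document_ids": ["granite_code_models.pdf"]},
    {"question": "Q2", "correct_answer": "A2"}
]""")
print(validate_benchmark(records))  # [(1, ['correct_answer_document_ids'])]
```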
Step 5: Run the AutoAI RAG experiment with the custom foundation model

Run the experiment with the Python SDK. For deployment_id, provide the ID of your deployed custom foundation model.

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.helpers.connections import ContainerLocation
from ibm_watsonx_ai.foundation_models.schema import (
        AutoAIRAGCustomModelConfig,
        AutoAIRAGModelParams
)

credentials = Credentials(
                api_key=<API_KEY>,
                url=<WML_ENDPOINT>
)

experiment = AutoAI(credentials, project_id=<PROJECT_ID>)

deployment_id = <DEPLOYMENT_ID> # custom foundation model deployment id 
deployment_project_id = <DEPLOYMENT_PROJECT_ID> # project ID where your custom foundation model has been deployed
custom_prompt_template_text = "Answer my question {question} related to these documents {reference_documents}."
custom_context_template_text = "My document {document}"

parameters = AutoAIRAGModelParams(max_sequence_length=32_000)
custom_foundation_model_config = AutoAIRAGCustomModelConfig(
    deployment_id=deployment_id,
    project_id=deployment_project_id, 
    prompt_template_text=custom_prompt_template_text, 
    context_template_text=custom_context_template_text, 
    parameters=parameters
)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG - Custom foundation model experiment',
    description = "AutoAI RAG experiment with custom foundation model.",
    max_number_of_rag_patterns=4,
    optimization_metrics=['faithfulness'],
    foundation_models=[custom_foundation_model_config],
) 

container_data_location = DataConnection(
        type="container",
        location=ContainerLocation(
           path="autorag/results"
        ),
)

container_data_location.set_client(api_client=client)

rag_optimizer.run(
    test_data_references=[benchmark_file_data_connection],
    input_data_references=[documents_dir_data_connection],
    results_reference=container_data_location,
)

To get the job details, use:

rag_optimizer.get_details()

Once the status is complete, you can move to the next step.

Step 6: Review the patterns and select the best one

After the AutoAI RAG experiment completes successfully, you can review the patterns. Use the summary method to list the completed patterns and their evaluation metrics as a Pandas DataFrame, ranked by performance against the optimized metric.

summary = rag_optimizer.summary()
summary

For example, pattern results display like this:

| Pattern | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.deployment_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | 38aeef16-c69c-4858-ba69-42f97d965abc |
| Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | 38aeef16-c69c-4858-ba69-42f97d965abc |
| Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | 38aeef16-c69c-4858-ba69-42f97d965abc |
| Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | 38aeef16-c69c-4858-ba69-42f97d965abc |
| Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | 38aeef16-c69c-4858-ba69-42f97d965abc |
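Rather than reading the table by eye, you can select the top pattern programmatically from the summary DataFrame. The sketch below uses mock values; the column name mirrors the example results, and the scores are illustrative:

```python
import pandas as pd

# Mock of the shape returned by rag_optimizer.summary() (illustrative values)
summary = pd.DataFrame(
    {"mean_faithfulness": [0.5407, 0.5950, 0.5144]},
    index=pd.Index(["Pattern1", "Pattern2", "Pattern3"], name="Pattern"),
)

# Pick the pattern with the highest value of the optimized metric
best_pattern_name = summary["mean_faithfulness"].idxmax()
print(best_pattern_name)  # Pattern2
```

The resulting name can then be passed to rag_optimizer.get_pattern().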

Select a pattern to test locally

  1. Recreate the document index so that you can test the selected pattern locally.

    Tip:

    In the following code sample, the index is built with the documents core_api.html and fm_embeddings.html.

    from langchain_community.document_loaders import WebBaseLoader
    
    best_pattern = rag_optimizer.get_pattern()
    
    urls = [
        "https://ibm.github.io/watsonx-ai-python-sdk/core_api.html",
        "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
    ]
    docs_list = WebBaseLoader(urls).load()
    doc_splits = best_pattern.chunker.split_documents(docs_list)
    best_pattern.indexing_function(doc_splits)
    
  2. Query the RAG pattern locally.

    payload = {
        client.deployments.ScoringMetaNames.INPUT_DATA: [
            {
                "values": ["How to use new approach of providing credentials to APIClient?"],
            }
        ]
    }
    
    best_pattern.query(payload)
    

The model's response looks like this:

According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:


from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)


This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
Tip:

To retrieve a specific pattern, pass the pattern number to rag_optimizer.get_pattern().

Reviewing experiment results in Cloud Object Storage

If the final status of the experiment is failed or error, use rag_optimizer.get_logs() or refer to experiment results to understand what went wrong. Experiment results and logs are stored in the default Cloud Object Storage instance linked to your account. By default, results are saved in the default_autoai_rag_out directory.

Results are organized by pattern. For example:

|-- Pattern1
|      | -- evaluation_results.json
|      | -- indexing_inference_notebook.ipynb (Chroma)
|-- Pattern2
|    ...
|-- training_status.json

Each pattern contains these results:

  • The evaluation_results.json file contains evaluation results for each benchmark question.
  • The indexing_inference_notebook.ipynb file contains the Python code for building the vector database index and for building the retrieval and generation functions. The notebook includes commands for retrieving data, chunking, and creating embeddings, as well as for retrieving chunks, building prompts, and generating answers.
Note:

The results notebook indexing_notebook.ipynb contains the code for embedding and indexing the documents. You can accelerate the document indexing task by changing vector_store.add_documents() to vector_store.add_documents_async().
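After you download the results, the per-pattern layout can be walked with the standard library to gather every pattern's metrics in one place. The sketch below runs against a miniature copy of the directory listing shown above; the JSON contents are illustrative, not the real evaluation_results.json schema:

```python
import json
import tempfile
from pathlib import Path

# Build a miniature copy of the results layout (illustrative contents)
results_dir = Path(tempfile.mkdtemp())
for name in ["Pattern1", "Pattern2"]:
    pattern_dir = results_dir / name
    pattern_dir.mkdir()
    (pattern_dir / "evaluation_results.json").write_text(json.dumps({"pattern": name}))
(results_dir / "training_status.json").write_text("{}")

# Collect the evaluation results for every pattern directory
evaluations = {
    p.parent.name: json.loads(p.read_text())
    for p in sorted(results_dir.glob("Pattern*/evaluation_results.json"))
}
print(sorted(evaluations))  # ['Pattern1', 'Pattern2']
```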

Get inference and indexing notebook

To download a specified inference notebook, use get_inference_notebook(). If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.

rag_optimizer.get_inference_notebook(pattern_name='Pattern3')

For more information and code samples, refer to the Using AutoAI RAG with custom foundation model notebook.

Parent topic: Automating a RAG pattern with the AutoAI SDK