Model insights

Last updated: Mar 05, 2025

The model insights module helps you improve the performance of your RAG applications by analyzing evaluation results so that you can identify the best solutions for your use case.

Model insights is a module in the ibm-watsonx-gov Python SDK. You can use it to build the model insights dashboard, which provides an interactive way to visualize and analyze LLM evaluation metrics. The dashboard lets you view and organize the records that violate configured metric thresholds.

To use the model insights module, you need a dataset that contains evaluated LLM records with precomputed metrics, and a configuration file that specifies the threshold values for each metric. For RAG applications, each record contains the metric scores for one of the user questions that are used to test the application.
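
For example, an evaluated dataset might look like the following minimal sketch. The metric columns (faithfulness and context_relevance) and all values here are illustrative assumptions, not names that the SDK requires:

import pandas as pd

# Hypothetical evaluated dataset: each row holds one test question, its
# retrieved contexts, the generated answer, and precomputed metric scores.
# All column names are illustrative; use the names from your own data.
df = pd.DataFrame([
    {
        "question": "What is the capital of France?",
        "context1": "Paris is the capital and largest city of France.",
        "context2": "France is a country in Western Europe.",
        "context3": "The Seine flows through Paris.",
        "context4": "French is the official language of France.",
        "answer": "The capital of France is Paris.",
        "faithfulness": 0.92,
        "context_relevance": 0.88,
    },
])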

The module also supports in-depth analysis of violations, which facilitates root cause investigation and helps you understand the factors that contribute to metric scores.
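
For example, you can drill into violations manually with pandas. Continuing the sketch above, the faithfulness column and the 0.8 threshold are assumptions for illustration:

# Hypothetical drill-down: list the records whose faithfulness score falls
# below an assumed threshold of 0.8 (substitute your own column and value).
violations = df[df["faithfulness"] < 0.8]
print(violations[["question", "answer", "faithfulness"]])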

Examples

The following examples show how to configure the model insights module and visualize metrics for a dataset:

Step 1: configuration

Create a model insights module configuration with details of the dataset columns and the metric thresholds:

from ibm_watsonx_gov.config import GenAIConfiguration
from ibm_watsonx_gov.metrics import (
    AveragePrecisionMetric,
    ContextRelevanceMetric,
    FaithfulnessMetric,
    HitRateMetric,
    NDCGMetric,
    ReciprocalRankMetric,
    RetrievalPrecisionMetric,
    UnsuccessfulRequestsMetric
)
from ibm_watsonx_gov.entities.enums import TaskType
from ibm_watsonx_gov.visualizations import ModelInsights

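# Names of the dataset columns that hold the question and the retrieved contexts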
question_field = "question"
context_fields = ["context1", "context2", "context3", "context4"]

configuration = GenAIConfiguration(
    input_fields=[question_field]+context_fields,
    question_field=question_field,
    context_fields=context_fields,
    output_fields=["answer"],
    task_type=TaskType.RAG,
)

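# Metrics to visualize; the dashboard flags records that violate each metric's threshold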
metrics = [
    AveragePrecisionMetric(),
    ContextRelevanceMetric(),
    FaithfulnessMetric(),
    HitRateMetric(),
    NDCGMetric(),
    ReciprocalRankMetric(),
    RetrievalPrecisionMetric(),
    UnsuccessfulRequestsMetric(),
]

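# Build the model insights module from the column configuration and the metric list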
model_insights = ModelInsights(configuration=configuration, metrics=metrics)

Step 2: display the violated records

Provide the dataset with metric values to the model insights module to create interactive visualizations that are based on the threshold configurations:

%matplotlib ipympl
import pandas as pd

# Load the results dataframe from the sample file
df = pd.read_csv("../data/rag/sample_metrics.csv")
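
# Optional sanity check (a suggestion, not part of the SDK): confirm that the
# columns named in the Step 1 configuration are present in the results dataframe.
expected_columns = [question_field, *context_fields, "answer"]
missing = [col for col in expected_columns if col not in df.columns]
if missing:
    raise ValueError(f"Results dataframe is missing columns: {missing}")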

# Find the violated records and display them
model_insights.display_metrics(metrics_result=df)

For more information, see the Model Insights notebook.

Parent topic: Metrics computation using Python SDK