Model risk evaluation engine
Last updated: Mar 05, 2025

The model risk evaluation engine measures risks for foundation models by computing metrics for a set of risk dimensions, which helps you identify the models that best meet your risk tolerance goals.

The model risk evaluation engine is a module in the ibm-watsonx-gov Python SDK that helps you understand generative AI risks and establish effective methods for measuring and mitigating them. The module provides quantitative risk assessments of foundation models and supports evaluations of both watsonx.ai large language models and external models from other providers.

Evaluations calculate metrics for the following risk dimensions:

  • Toxic output
  • Harmful output
  • Prompt leaking
  • Hallucination
  • Prompt injection
  • Jailbreaking
  • Output bias
  • Harmful code generation

The risk dimensions are a collection of risks that can occur when you work with generative AI assets and machine learning models. The available risk dimensions are a subset of the risks that are described in the AI Risk Atlas; for more information, see the AI Risk Atlas.
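
You can restrict an evaluation to a subset of these dimensions with the risk_dimensions input parameter that is described later in this topic. The following minimal sketch assumes that risk dimensions are identified by plain string names; the exact identifiers that the SDK expects are an assumption, so check the SDK reference for your version:

# Evaluate only a subset of the available risk dimensions.
# NOTE: the string identifiers below are illustrative assumptions about the SDK's naming.
risk_dimensions = [
    "hallucination",
    "prompt_injection",
    "jailbreaking",
]

# Limit the number of data instances per risk to speed up the evaluation.
max_sample_size = 50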

You must provide watsonx.ai credentials to calculate metrics for the jailbreaking, prompt leaking, harmful code generation, and prompt injection risks. For each risk dimension, one or more standardized data sets are used to evaluate the risk level. These data sets are stored in Unitxt cards.
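
For example, you can construct the credentials from environment variables. In the following sketch, the Credentials class, its import path, and the field names are assumptions about the SDK; adapt them to the credentials object that your version of ibm-watsonx-gov expects:

import os

# Hypothetical sketch: build watsonx.ai credentials from environment variables.
# The import path and field names are assumptions, not the SDK's confirmed API.
from ibm_watsonx_gov.entities.credentials import Credentials

credentials = Credentials(
    api_key=os.environ["WATSONX_APIKEY"],  # IBM Cloud API key (illustrative variable name)
    url=os.environ.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
)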

After risk assessments are complete, you can save the results in the Governance console or export the results as a PDF report that summarizes the metrics that are calculated.

You can use the model risk evaluation engine to complete the following tasks:

  • Compute metrics with watsonx.ai as the inference engine.
  • Compute risk metrics for foundation models in watsonx.ai.
  • Compute metrics for foundation models that are not in watsonx.ai by implementing your own scoring function for any model and evaluating it (see the sketch after this list).
  • Store computed metrics in the Governance console (OpenPages).
  • Retrieve computed metrics from the Governance console (OpenPages).
  • Generate a PDF report of computed metrics.
  • Display the metrics in a notebook cell in a table or chart format.
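
For foundation models outside watsonx.ai, the general pattern is to wrap your own scoring function in a CustomFoundationModel object. The following is a minimal sketch; the import path, the scoring_fn argument name, and the exact signature that the SDK expects from the scoring function are assumptions:

# Hypothetical sketch: evaluate an external LLM by supplying your own scoring logic.
# The import path and the scoring_fn argument name are assumptions about the SDK.
from ibm_watsonx_gov.entities.foundation_model import CustomFoundationModel

def score_external_model(prompts):
    """Given a batch of prompts, return the generated text for each prompt."""
    responses = []
    for prompt in prompts:
        # Replace this placeholder with a call to your model provider's API.
        responses.append(call_my_provider(prompt))  # call_my_provider is a hypothetical helper
    return responses

model_details = CustomFoundationModel(scoring_fn=score_external_model)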

Input

You can specify the following input parameters when you use the model risk evaluation engine:

Table 1. Input parameters for model risk evaluation engine
wx_gc_configuration: The Governance console configuration that is used to store the computed metrics results. Storing the evaluation results in the Governance console prevents recomputing the metrics during the next evaluation; the evaluation engine retrieves the saved metrics instead.
foundation_model_name: The name of the foundation model under evaluation.
risk_dimensions: A list of risks to evaluate. If not provided, all available risks are evaluated.
max_sample_size: The maximum number of data instances to use for evaluation. Specify a smaller value (for example, 50) to speed up the evaluation, or set it to None to use all data, which takes longer but ensures meaningful results.
model_details: The foundation model details. The value can be a WxAIFoundationModel or a CustomFoundationModel object, where WxAIFoundationModel represents the logic to invoke inferencing for a watsonx.ai model and CustomFoundationModel contains the logic to invoke an external LLM (see the sketch after this table).
pdf_report_output_path: The file path where the generated PDF report is saved.
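
For a model that is hosted in watsonx.ai, model_details is a WxAIFoundationModel object. The following sketch is illustrative only; the import path and the constructor arguments (model_id and project_id) are assumptions, so check the SDK reference for the exact fields:

# Hypothetical sketch: describe the watsonx.ai foundation model to evaluate.
# The import path and constructor arguments are assumptions about the SDK.
from ibm_watsonx_gov.entities.foundation_model import WxAIFoundationModel

model_details = WxAIFoundationModel(
    model_id="ibm/granite-13b-instruct-v2",      # illustrative watsonx.ai model identifier
    project_id="<your watsonx.ai project ID>",   # project that hosts the model
)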

Output

The model risk evaluation engine computes metrics for each risk dimension as output. This output can be displayed in a notebook cell, stored in OpenPages, or exported as a PDF report. Metrics are computed for the following risk dimensions:

Table 2. Risk dimensions for the model risk evaluation engine
Toxic output: The model produces hateful, abusive, and profane (HAP) or obscene content.
Harmful output: The model might generate language that leads to physical harm or language that includes overtly violent, covertly dangerous, or otherwise indirectly unsafe statements.
Hallucination: Factually inaccurate or untruthful content with respect to the model's training data or input. This risk is also sometimes referred to as lack of faithfulness or lack of groundedness.
Prompt injection: An attack that forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt.
Jailbreaking: An attack that attempts to break through the guardrails that are established in the model to perform restricted actions.
Output bias: Generated content might unfairly represent certain groups or individuals.
Prompt leaking: An attempt to extract a model's system prompt.
Harmful code generation: The model might generate code that causes harm or unintentionally affects other systems.

Examples

You can run evaluations and generate results with the model risk evaluation engine as shown in the following examples:

Step 1: configuration

Create a model risk evaluation engine configuration:

from ibm_watsonx_gov.config.model_risk_configuration import ModelRiskConfiguration, WxGovConsoleConfiguration

configuration = ModelRiskConfiguration(
    model_details=model_details,
    risk_dimensions=risk_dimensions,
    max_sample_size=max_sample_size,
    pdf_report_output_path=pdf_report_output_path,
    # wx_gc_configuration=wx_gc_configuration,  # uncomment to push the results to the Governance console (OpenPages)
)
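
To push the results to the Governance console, also build a WxGovConsoleConfiguration object and pass it as wx_gc_configuration. The field names in the following sketch are assumptions about the connection details that the Governance console (OpenPages) requires; substitute the fields that your deployment uses:

import os

from ibm_watsonx_gov.config.model_risk_configuration import WxGovConsoleConfiguration

# Hypothetical sketch: connection details for the Governance console (OpenPages).
# The field names below are assumptions, not the SDK's confirmed API.
wx_gc_configuration = WxGovConsoleConfiguration(
    url="https://<governance-console-host>",
    username="<openpages-user>",
    password=os.environ["OPENPAGES_PASSWORD"],
)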

Step 2: run evaluation

Run an evaluation to measure risks:

from ibm_watsonx_gov.evaluate import evaluate_model_risk

evaluation_results = evaluate_model_risk(
    configuration=configuration,
    credentials=credentials,
)

print(evaluation_results.risks)
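
You can also display the computed metrics in the notebook as a table, as described earlier. The following sketch assumes that each entry in evaluation_results.risks can be converted to a dictionary, for example through a Pydantic-style model_dump() method; this is an assumption about the result objects, so adapt it to the structure that the SDK actually returns:

import pandas as pd

# Hypothetical sketch: flatten the per-risk metrics into a DataFrame for display.
# The model_dump() call is an assumption about the result objects.
rows = [risk.model_dump() for risk in evaluation_results.risks]
pd.DataFrame(rows)  # the DataFrame renders as a table in the notebook cell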

Step 3: generate PDF report

Export the evaluated data and metrics as a PDF report:

from ibm_wos_utils.joblib.utils.notebook_utils import create_download_link_for_file
from IPython.display import display

# Create a download link for the generated PDF report and render it in the notebook
pdf_file = create_download_link_for_file(evaluation_results.output_file_path)
display(pdf_file)

For more information, see the Model Risk Evaluation Engine notebook.

Parent topic: Metrics computation using Python SDK