The model risk evaluation engine measures foundation model risks by computing metrics for a set of risk dimensions, helping you identify the models that best meet your risk tolerance goals.
The model risk evaluation engine is a module in the ibm-watsonx-gov Python SDK that helps you understand generative AI risks and establish effective methods for measuring and mitigating them. The module provides quantitative risk assessments of foundation models and supports evaluations of watsonx.ai large language models and external models from other providers.
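The following minimal sketch shows how you might install the SDK and import the entry points that are used later in this topic. The pip command and any optional extras are assumptions; check the SDK documentation for the installation options that apply to your environment.
# Install the SDK first. The exact pip options or extras are an assumption here.
#   pip install ibm-watsonx-gov
from ibm_watsonx_gov.config.model_risk_configuration import ModelRiskConfiguration
from ibm_watsonx_gov.evaluate import evaluate_model_risk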
Evaluations calculate metrics for the following risk dimensions:
- Toxic output
- Harmful output
- Prompt leaking
- Hallucination
- Prompt injection
- Jailbreaking
- Output bias
- Harmful code generation
The risk dimensions are a collection of risks that can occur when you work with generative AI assets and machine learning models. For more information, see the AI Risk Atlas. The available risk dimensions are a subset of the risks in the AI Risk Atlas.
You must provide watsonx.ai credentials to calculate metrics for the jailbreaking, prompt leaking, harmful code generation, and prompt injection risks. For each risk dimension, one or more standardized data sets are used to evaluate the risk level. These data sets are stored in Unitxt cards.
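The credentials are passed to the evaluation call that is shown in the Examples section. The following sketch is only an illustration: the credentials class name, import path, and fields are assumptions rather than documented SDK surface, so consult the SDK reference for the exact form that evaluate_model_risk expects.
# Hypothetical sketch of providing watsonx.ai credentials. The class name,
# import path, and fields below are assumptions -- check the SDK reference.
# from ibm_watsonx_gov.entities.credentials import Credentials
# credentials = Credentials(api_key="<IBM_CLOUD_API_KEY>")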
After risk assessments are complete, you can save the results in the Governance console or export the results as a PDF report that summarizes the metrics that are calculated.
You can use the model risk evaluation engine to complete the following tasks:
- Compute metrics with watsonx.ai as the inference engine.
- Compute risk metrics for foundation models in watsonx.ai.
- Compute metrics for foundation models that are not hosted on watsonx.ai by implementing your own scoring function for the model.
- Store computed metrics in the Governance console (OpenPages).
- Retrieve computed metrics from the Governance console (OpenPages).
- Generate a PDF report of computed metrics.
- Display the metrics in a notebook cell in a table or chart format.
Input
You can specify the following input parameters when you use the model risk evaluation engine:
Parameter | Description |
---|---|
wx_gc_configuration | The Governance console configuration for storing the computed metrics. Storing the evaluation results in the Governance console prevents recomputing the metrics during the next evaluation; the evaluation engine retrieves the saved metrics instead. |
foundation_model_name | The name of the foundation model under evaluation. |
risk_dimensions | A list of risks to evaluate. If not provided, all available risks are evaluated. |
max_sample_size | The maximum number of data instances to use for evaluation. Specify a smaller value (for example, 50) to speed up the evaluation, or set it to None to use all data, which takes longer but ensures meaningful results. |
model_details | The foundation model details. The value can be WxAIFoundationModel or CustomFoundationModel, where WxAIFoundationModel is an object that represents the logic to invoke inferencing for watsonx.ai and CustomFoundationModel is an object that contains the logic to invoke an external LLM. |
pdf_report_output_path | The file path where the generated PDF report is saved. |
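The following sketch shows one way that these inputs might be assembled before you build the configuration in the Examples section. The constructor arguments for WxAIFoundationModel and WxGovConsoleConfiguration and the report file name are assumptions; check the SDK reference for the exact fields.
# Sketch of the input parameters that are described in the preceding table.
risk_dimensions = None                             # None evaluates all available risk dimensions
max_sample_size = 50                               # small sample for a faster evaluation; None uses all data
pdf_report_output_path = "model_risk_report.pdf"   # hypothetical report file name

# model_details identifies the model under evaluation:
#   WxAIFoundationModel for a model that is hosted on watsonx.ai, or
#   CustomFoundationModel to wrap your own scoring function for an external LLM.
# model_details = WxAIFoundationModel(...)               # constructor arguments: see the SDK reference

# Optional: configuration for storing results in the Governance console (OpenPages).
# wx_gc_configuration = WxGovConsoleConfiguration(...)   # constructor arguments: see the SDK reference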
Output
The model risk evaluation engine computes metrics for each risk dimension as output. You can display this output in a notebook cell, store it in the Governance console (OpenPages), or export it as a PDF report. Metrics are computed for the following risk dimensions:
Risk | Description |
---|---|
Toxic output | The model produces hateful, abusive, and profane (HAP) or obscene content. |
Harmful output | The model might generate language that leads to physical harm or language that includes overtly violent, covertly dangerous, or otherwise indirectly unsafe statements. |
Hallucination | The model generates content that is factually inaccurate or untruthful with respect to its training data or input. This risk is also sometimes referred to as lack of faithfulness or lack of groundedness.
Prompt injection | An attack that forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt. |
Jailbreaking | An attack that attempts to break through the guardrails that are established in the model to perform restricted actions. |
Output bias | Generated content might unfairly represent certain groups or individuals. |
Prompt leaking | An attempt to extract a model's system prompt.
Harmful code generation | The model might generate code that causes harm or unintentionally affects other systems.
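After you run an evaluation (see the Examples section), you can display the computed metrics in a notebook cell as a table or chart. The conversion below is a sketch that assumes evaluation_results.risks can be turned into tabular rows; adapt it to the actual structure of the result object.
import pandas as pd

print(evaluation_results.risks)                # raw view of the computed risk metrics
# Assumption: the risks object converts cleanly into tabular rows.
# df = pd.DataFrame(evaluation_results.risks)
# df.plot.bar(x="risk", y="value")             # hypothetical column names for a chart view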
Examples
You can run evaluations and generate results with the model risk evaluation engine as shown in the following examples:
Step 1: configuration
Create a model risk evaluation engine configuration:
from ibm_watsonx_gov.config.model_risk_configuration import ModelRiskConfiguration, WxGovConsoleConfiguration
configuration = ModelRiskConfiguration(
model_details=model_details,
risk_dimensions=risk_dimensions,
max_sample_size=max_sample_size,
pdf_report_output_path=pdf_report_output_path,
# wx_gc_configuration=wx_gc_configuration, # uncomment this line if the result should be pushed to Governance Console (OpenPages)
)
Step 2: run evaluation
Run an evaluation to measure risks:
from ibm_watsonx_gov.evaluate import evaluate_model_risk
evaluation_results = evaluate_model_risk(
configuration=configuration,
credentials=credentials,
)
print(evaluation_results.risks)
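In addition to the risks attribute, the returned evaluation_results object exposes the location of the generated PDF report, which Step 3 uses:
print(evaluation_results.output_file_path)   # path of the generated PDF report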
Step 3: generate PDF report
Export the evaluated data and metrics as a PDF report:
from ibm_wos_utils.joblib.utils.notebook_utils import create_download_link_for_file
pdf_file = create_download_link_for_file(evaluation_results.output_file_path)
display(pdf_file)
For more information, see the Model Risk Evaluation Engine notebook.
Parent topic: Metrics computation using Python SDK