Configuring drift v2 evaluations in Watson OpenScale
You can configure drift v2 evaluations with Watson OpenScale to measure changes in your data over time to ensure consistent outcomes for your model. Use drift v2 evaluations to identify changes in your model output, the accuracy of your predictions, and the distribution of your input data.
If you log payload data when you prepare for model evaluations, you can configure drift v2 evaluations to help you understand how changes in your data affect model outcomes.
The following sections describe the steps that you must complete to configure drift v2 evaluations with Watson OpenScale:
Compute the drift archive
You must choose the method that Watson OpenScale uses to analyze your training data to determine the data distributions of your model features. If you connect training data to Watson OpenScale and the size of your data is less than 500 MB, you can choose to compute the drift v2 archive in Watson OpenScale. If you don't connect your training data to Watson OpenScale, or if the size of your data is larger than 500 MB, you must choose to compute the drift v2 archive in a notebook.
You can specify a limit for the size of your training data by setting maximum sample sizes for the amount of training data that Watson OpenScale uses for scoring and computing the drift v2 archive. For non-Watson Machine Learning deployments, computing the drift v2 archive has a cost associated with Watson OpenScale scoring the training data against your model's scoring endpoint.
Set drift thresholds
You must set threshold values for each metric so that Watson OpenScale can identify issues with your evaluation results. The values that you set create alerts on the Insights dashboard that appear when metric scores violate your thresholds. You must set values in the range of 0 to 1. The metric scores must be lower than the threshold values to avoid violations.
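The threshold rule can be illustrated with a short sketch. This is not the Watson OpenScale API: the function name, the metric names, and the dictionary layout are hypothetical, and treating a score that equals its threshold as a violation is an assumption based on the statement that scores must be lower than the threshold.

```python
def find_violations(scores, thresholds):
    """Return the metrics whose score is not lower than its threshold.

    Both arguments are hypothetical dicts that map a metric name to a value.
    """
    for name, limit in thresholds.items():
        # Thresholds must be set in the range 0 to 1.
        if not 0.0 <= limit <= 1.0:
            raise ValueError(f"threshold for {name!r} must be in the range 0 to 1")
    # Assumption: a score equal to the threshold counts as a violation.
    return {
        name: score
        for name, score in scores.items()
        if name in thresholds and score >= thresholds[name]
    }


scores = {"output_drift": 0.12, "feature_drift": 0.03}
thresholds = {"output_drift": 0.05, "feature_drift": 0.05}
print(find_violations(scores, thresholds))  # {'output_drift': 0.12}
```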
Select important features
Watson OpenScale calculates feature importance to determine the impact of feature drift on your model. To enable Watson OpenScale to calculate feature importance, you can select the important and most important features from your model that have the biggest impact on your model outcomes.
When you configure SHAP explanations, Watson OpenScale automatically detects the important features by using global explanations.
You can also upload a list of important features by uploading a JSON file. Watson OpenScale provides sample snippets that you can use to upload a JSON file. For more information, see Feature importance snippets.
Set sample size
Watson OpenScale uses sample sizes to determine how many transactions to process during evaluations. You must set a minimum sample size to indicate the lowest number of transactions that you want Watson OpenScale to evaluate. You can also set a maximum sample size to indicate the maximum number of transactions that you want Watson OpenScale to evaluate.
Supported drift v2 metrics
When you enable drift v2 evaluations, you can view a summary of evaluation results with metrics for the type of model that you're evaluating.
To view results, you can select a model deployment tile and click the arrow in the Drift v2 evaluation section to display a summary of drift v2 metrics from your last evaluation. For more information, see Reviewing evaluation results.
Drift v2 metrics are calculated with the payload data that you provide to Watson OpenScale. For more information, see Managing payload data.
The following drift v2 metrics are supported by Watson OpenScale:

Output drift
Watson OpenScale calculates output drift by measuring the change in the model confidence distribution.

How it works:
Watson OpenScale measures how much your model output changes from the time that you train the model. For regression models, Watson OpenScale calculates output drift by measuring the change in distribution of predictions on the training and payload data. For classification models, Watson OpenScale calculates output drift for each class probability by measuring the change in distribution for class probabilities on the training and payload data. For multiclass classification models, Watson OpenScale also aggregates the output drift of the class probabilities as a weighted average.
Do the math:
Watson OpenScale uses the following formulas to calculate output drift:


Model quality drift
Watson OpenScale calculates model quality drift by comparing the estimated runtime accuracy to the training accuracy to measure the drop in accuracy.
 How it works:
Watson OpenScale builds its own drift detection model that processes your payload data when you configure drift v2 evaluations to predict whether your model generates accurate predictions without the ground truth. The drift detection model uses the input features and class probabilities from your model to create its own input features.
 Do the math:
Watson OpenScale uses the following formula to calculate model quality drift:
Watson OpenScale calculates the accuracy of your model as the base_accuracy by measuring the fraction of correctly predicted transactions in your training data. During evaluations, your transactions are scored against the drift detection model to measure the number of transactions that are likely predicted correctly by your model. These transactions are compared to the total number of transactions that Watson OpenScale processes to calculate the predicted_accuracy. If the predicted_accuracy is less than the base_accuracy, Watson OpenScale generates a model quality drift score.
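The relationship between base_accuracy and predicted_accuracy can be sketched as follows. The helper names are hypothetical, and the score is assumed to be the drop in accuracy (base_accuracy minus predicted_accuracy), which matches the description above but is not confirmed here as the exact formula. The correctness flags stand in for the output of the drift detection model, which is not reproduced.

```python
def accuracy(correct_flags):
    """Fraction of transactions flagged as correctly predicted."""
    flags = list(correct_flags)
    if not flags:
        raise ValueError("at least one transaction is required")
    return sum(1 for flag in flags if flag) / len(flags)


def model_quality_drift(base_accuracy, predicted_accuracy):
    """Assumed score: the drop in accuracy, or 0.0 when there is no drop."""
    return max(0.0, base_accuracy - predicted_accuracy)


# Training data: 9 of 10 transactions were predicted correctly.
base_accuracy = accuracy([True] * 9 + [False])
# Payload data: the drift detection model judges 3 of 4 as likely correct.
predicted_accuracy = accuracy([True] * 3 + [False])

drift = model_quality_drift(base_accuracy, predicted_accuracy)
print(round(drift, 2))  # 0.15
```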

Feature drift
Watson OpenScale calculates feature drift by measuring the change in value distribution for important features.
 How it works:
Watson OpenScale calculates drift for categorical and numeric features by measuring the probability distribution of continuous and discrete values. To identify discrete values for numeric features, Watson OpenScale uses a binary logarithm to compare the number of distinct values of each feature to the total number of values of each feature. Watson OpenScale uses the following binary logarithm formula to identify discrete numeric features:
$$\text{distinct\_values\_count} < \log_2(\text{total\_count})$$
If the distinct_values_count is less than the binary logarithm of the total_count, the feature is identified as discrete.
Do the math:
Watson OpenScale uses the following formulas to calculate feature drift:
watsonx.governance uses the following formulas to calculate drift v2 evaluation metrics:
Total variation distance
Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same transaction as shown in the following formula:
If the two distributions are equal, the total variation distance between them becomes 0.
watsonx.governance uses the following formula to calculate total variation distance:
$$\mathrm{TVD} = \frac{\sum_{x} \lvert P(x) - B(x) \rvert \, d(x)}{\sum_{x} P(x)\, d(x) + \sum_{x} B(x)\, d(x)}$$
𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
d(x) is the difference between two consecutive 𝑥 samples.
P(x) is the value of the density function for production data at an 𝑥 sample.
B(x) is the value of the density function for baseline data at an 𝑥 sample.
The denominator represents the total area under the density function plots for production and baseline data. These summations approximate the integrals over the domain space, so each of the two terms should be approximately 1 and their total should be approximately 2.
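The following sketch approximates total variation distance on equidistant samples. It assumes that the numerator is the summed absolute difference between the production and baseline densities multiplied by the sample spacing, and that the denominator is the total area under both densities, as described above. The function names are hypothetical, and the density functions are passed in as plain callables (normal densities here) rather than estimated from data.

```python
import math


def normal_pdf(mean, std):
    """Return the density function of a normal distribution (illustration only)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def total_variation_distance(baseline_pdf, production_pdf, lower, upper, num_samples=2001):
    """Approximate TVD with sums over equidistant x samples in [lower, upper]."""
    dx = (upper - lower) / (num_samples - 1)  # spacing between consecutive samples
    xs = [lower + i * dx for i in range(num_samples)]
    numerator = sum(abs(production_pdf(x) - baseline_pdf(x)) * dx for x in xs)
    # Total area under both density plots; each term should be close to 1.
    denominator = (sum(production_pdf(x) * dx for x in xs)
                   + sum(baseline_pdf(x) * dx for x in xs))
    return numerator / denominator


baseline = normal_pdf(0.0, 1.0)
same = total_variation_distance(baseline, normal_pdf(0.0, 1.0), -10.0, 10.0)
shifted = total_variation_distance(baseline, normal_pdf(1.0, 1.0), -10.0, 11.0)
print(same)  # 0.0
print(round(shifted, 3))
```

Identical distributions give a distance of 0, and distributions with almost no overlap approach 1.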
Overlap coefficient
watsonx.governance calculates the overlap coefficient by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between distributions, the intersection or the overlap area is subtracted from 1 to calculate the amount of drift. watsonx.governance uses the following formula to calculate the overlap coefficient:
$$\text{Overlap coefficient} = 1 - \sum_{x} \min\big(P(x),\, B(x)\big)\, d(x)$$
𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
d(x) is the difference between two consecutive 𝑥 samples.
P(x) is the value of the density function for production data at an 𝑥 sample.
B(x) is the value of the density function for baseline data at an 𝑥 sample.
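A minimal sketch of the overlap-based drift score follows, assuming that the overlap area is approximated by summing the smaller of the two density values at each equidistant sample, multiplied by the sample spacing, and that the result is subtracted from 1. The function names are hypothetical, and normal densities stand in for densities estimated from baseline and production data.

```python
import math


def normal_pdf(mean, std):
    """Return the density function of a normal distribution (illustration only)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def overlap_drift(baseline_pdf, production_pdf, lower, upper, num_samples=2001):
    """Return 1 minus the approximate area of intersection of two densities."""
    dx = (upper - lower) / (num_samples - 1)  # spacing between consecutive samples
    xs = [lower + i * dx for i in range(num_samples)]
    # The intersection at each sample is the smaller of the two density values.
    overlap_area = sum(min(production_pdf(x), baseline_pdf(x)) * dx for x in xs)
    return 1.0 - overlap_area


baseline = normal_pdf(0.0, 1.0)
no_drift = overlap_drift(baseline, normal_pdf(0.0, 1.0), -10.0, 10.0)
some_drift = overlap_drift(baseline, normal_pdf(1.0, 1.0), -10.0, 11.0)
print(round(no_drift, 6), round(some_drift, 3))
```

When both densities integrate to 1, this score and the total variation distance agree, because the area of intersection and half the area of absolute difference sum to 1.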
Jensen Shannon distance
Jensen Shannon distance is the normalized form of Kullback-Leibler (KL) divergence that measures how much one probability distribution differs from a second probability distribution. Jensen Shannon distance is a symmetrical score and always has a finite value.
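watsonx.governance uses the following formula to calculate the Jensen Shannon distance for two probability distributions, baseline (B) and production (P):
$$\mathrm{JS}(B, P) = \sqrt{\frac{D_{\mathrm{KL}}(B \parallel M) + D_{\mathrm{KL}}(P \parallel M)}{2}}, \qquad M = \frac{B + P}{2}$$
$D_{\mathrm{KL}}$ is the KL divergence.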
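Jensen Shannon distance can be sketched for two discrete distributions by using its standard definition: the square root of the average KL divergence of each distribution from their midpoint mixture. The function names are hypothetical, and the base-2 logarithm is an assumption chosen so that the score stays in the range 0 to 1; the logarithm base that the product uses is not stated here.

```python
import math


def kl_divergence(p, q):
    """KL divergence of p from q with base-2 logarithms; zero-probability terms add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(baseline, production):
    """Square root of the Jensen Shannon divergence of two discrete distributions."""
    if len(baseline) != len(production):
        raise ValueError("both distributions need the same number of bins")
    # The mixture is nonzero wherever either input is nonzero, so the
    # KL terms below never divide by zero.
    mixture = [(b + p) / 2.0 for b, p in zip(baseline, production)]
    divergence = 0.5 * kl_divergence(baseline, mixture) + 0.5 * kl_divergence(production, mixture)
    return math.sqrt(max(divergence, 0.0))


print(jensen_shannon_distance([0.3, 0.7], [0.3, 0.7]))  # 0.0
print(jensen_shannon_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because each distribution is compared with the mixture rather than directly with the other, the score is symmetric and stays finite even when one distribution assigns zero probability to a bin.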