Drift v2 evaluations
You can configure drift v2 evaluations to measure changes in your data over time to ensure consistent outcomes for your model. Use drift v2 evaluations to identify changes in your model output, the accuracy of your predictions, and the distribution of your input data.
The following sections describe how to configure drift v2 evaluations:
Configuring drift v2 evaluations
If you log payload data when you prepare for model evaluations, you can configure drift v2 evaluations with Watson OpenScale to help you understand how changes in your data affect model outcomes.
Compute the drift archive
You must choose the method that Watson OpenScale uses to analyze your training data to determine the data distributions of your model features. If you connect training data to Watson OpenScale and the size of your data is less than 500 MB, you can choose to compute the drift v2 archive in Watson OpenScale.
If you don't connect your training data to Watson OpenScale, or if the size of your data is larger than 500 MB, you must choose to compute the drift v2 archive in a notebook. You must also compute the drift v2 archive in notebooks if you want to evaluate image or text models.
You can specify a limit for the size of your training data by setting maximum sample sizes for the amount of training data that Watson OpenScale uses for scoring and computing the drift v2 archive. For non-Watson Machine Learning deployments, computing the drift v2 archive has a cost associated with Watson OpenScale scoring the training data against your model's scoring endpoint.
Set drift thresholds
You must set threshold values for each metric so that Watson OpenScale can identify issues with your evaluation results. The values that you set create alerts on the Insights dashboard that appear when metric scores violate your thresholds. Threshold values must be in the range of 0 to 1. Metric scores must be lower than the threshold values to avoid violations.
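As a rough illustration of the rule above, the following Python sketch flags metrics whose scores are not lower than their thresholds. The function name find_violations and the metric keys are hypothetical and are not part of any Watson OpenScale API.

```python
def find_violations(scores, thresholds):
    """Return the metrics whose score is not lower than its threshold.

    Hypothetical helper for illustration only; not a Watson OpenScale API.
    """
    for name, threshold in thresholds.items():
        if not 0 <= threshold <= 1:
            raise ValueError(f"threshold for {name!r} must be in the range 0 to 1")
    # A score must be strictly lower than its threshold to avoid a violation.
    return {
        name: score
        for name, score in scores.items()
        if name in thresholds and score >= thresholds[name]
    }


# Metric names and values below are placeholders chosen for this example.
scores = {"output_drift": 0.12, "feature_drift": 0.03}
thresholds = {"output_drift": 0.05, "feature_drift": 0.05}
violations = find_violations(scores, thresholds)
```

In this sketch a score that is exactly equal to its threshold counts as a violation, because the text requires scores to be lower than the threshold.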
Select important features
For tabular models only, Watson OpenScale calculates feature importance to determine the impact of feature drift on your model. To enable Watson OpenScale to calculate feature importance, you can select the important and most important features from your model that have the biggest impact on your model outcomes.
When you configure SHAP explanations, Watson OpenScale automatically detects the important features by using global explanations.
You can also upload a list of important features by uploading a JSON file. Watson OpenScale provides sample snippets that you can use to upload a JSON file. For more information, see Feature importance snippets.
Set sample size
Watson OpenScale uses sample sizes to determine how many transactions to process during evaluations. You must set a minimum sample size to indicate the lowest number of transactions that you want Watson OpenScale to evaluate. You can also set a maximum sample size to indicate the maximum number of transactions that you want Watson OpenScale to evaluate.
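One way to picture the two limits is the following Python sketch. It assumes that an evaluation is skipped when fewer transactions than the minimum are available and that the number of evaluated transactions is capped at the maximum. This page does not describe how Watson OpenScale applies the limits internally, so both behaviors and the name select_transactions are assumptions made for illustration.

```python
def select_transactions(transactions, min_sample_size, max_sample_size=None):
    """Pick the transactions to evaluate under min/max sample sizes.

    Assumed behavior (not confirmed by the documentation):
    - return None when there are fewer transactions than the minimum;
    - otherwise keep at most max_sample_size transactions.
    """
    if max_sample_size is not None and max_sample_size < min_sample_size:
        raise ValueError("max_sample_size must not be smaller than min_sample_size")
    if len(transactions) < min_sample_size:
        return None  # not enough data to evaluate
    if max_sample_size is not None:
        return list(transactions[:max_sample_size])
    return list(transactions)


too_few = select_transactions(list(range(5)), min_sample_size=10)
capped = select_transactions(list(range(50)), min_sample_size=10, max_sample_size=20)
```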
Supported drift v2 metrics
When you enable drift v2 evaluations with Watson OpenScale, you can view a summary of evaluation results with metrics for the type of model that you're evaluating.
You can view the results of your drift v2 evaluations on the Watson OpenScale Insights dashboard. For more information, see Reviewing drift v2 results.
The following metrics are supported by drift v2 evaluations:
Output drift
Watson OpenScale calculates output drift by measuring the change in the model confidence distribution.

How it works:
Watson OpenScale measures how much your model output changes from the time that you train the model. For regression models, Watson OpenScale calculates output drift by measuring the change in distribution of predictions on the training and payload data. For classification models, Watson OpenScale calculates output drift for each class probability by measuring the change in distribution for class probabilities on the training and payload data. For multi-class classification models, Watson OpenScale also aggregates output drift for each class probability by measuring a weighted average.
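The weighted-average aggregation for multi-class models can be sketched in Python as follows. The page does not say which weights Watson OpenScale uses, so the weights are passed in explicitly here. The function name aggregate_output_drift, the class labels, and the example weights are all hypothetical.

```python
def aggregate_output_drift(class_drift, class_weights):
    """Weighted average of per-class drift scores.

    class_drift:   mapping of class label -> drift score for that class probability
    class_weights: mapping of class label -> non-negative weight (an assumption;
                   the documentation does not specify how weights are chosen)
    """
    total_weight = sum(class_weights[label] for label in class_drift)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(class_drift[label] * class_weights[label] for label in class_drift)
    return weighted_sum / total_weight


# Placeholder per-class drift scores and weights for this example.
per_class = {"low_risk": 0.1, "high_risk": 0.3}
weights = {"low_risk": 3, "high_risk": 1}
overall = aggregate_output_drift(per_class, weights)
```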
Do the math:
Watson OpenScale uses the following formulas to calculate output drift:
Model quality drift
Watson OpenScale calculates model quality drift by comparing the estimated runtime accuracy to the training accuracy to measure the drop in accuracy.
 How it works:
When you configure drift v2 evaluations, Watson OpenScale builds its own drift detection model. This model processes your payload data to predict, without the ground truth, whether your model generates accurate predictions. The drift detection model uses the input features and class probabilities from your model to create its own input features.
 Do the math:
Watson OpenScale uses the following formula to calculate model quality drift:
$$\text{model quality drift} = \text{base\_accuracy} - \text{predicted\_accuracy}$$
Watson OpenScale calculates the accuracy of your model as the base_accuracy by measuring the fraction of correctly predicted transactions in your training data. During evaluations, your transactions are scored against the drift detection model to measure the number of transactions that are likely predicted correctly by your model. This number is compared to the total number of transactions that Watson OpenScale processes to calculate the predicted_accuracy.
If the predicted_accuracy is less than the base_accuracy, Watson OpenScale generates a model quality drift score.
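The comparison described above can be sketched in Python as follows. The sketch assumes that the score is the drop from base_accuracy to predicted_accuracy and that no drift (a score of 0) is reported when the predicted accuracy is not lower. The function name and the example numbers are hypothetical.

```python
def model_quality_drift(base_accuracy, predicted_correct, total_transactions):
    """Estimate the drop in accuracy relative to training.

    base_accuracy:      fraction of correctly predicted training transactions
    predicted_correct:  number of payload transactions that the drift detection
                        model flags as likely predicted correctly
    total_transactions: number of payload transactions processed
    """
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    predicted_accuracy = predicted_correct / total_transactions
    # Assumption: a score is produced only when accuracy is estimated to drop.
    return max(0.0, base_accuracy - predicted_accuracy)


drop = model_quality_drift(base_accuracy=0.9, predicted_correct=80, total_transactions=100)
no_drop = model_quality_drift(base_accuracy=0.9, predicted_correct=95, total_transactions=100)
```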
Feature drift
Watson OpenScale calculates feature drift by measuring the change in value distribution for important features.
 How it works:
Watson OpenScale calculates drift for categorical and numeric features by measuring the probability distribution of continuous and discrete values. To identify discrete values for numeric features, Watson OpenScale uses a binary logarithm to compare the number of distinct values of each feature to the total number of values of each feature. Watson OpenScale uses the following binary logarithm formula to identify discrete numeric features:
$$\text{distinct\_values\_count} < \log_2(\text{total\_count})$$
If the distinct_values_count is less than the binary logarithm of the total_count, the feature is identified as discrete.
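A minimal Python sketch of this check follows. The function name is_discrete is hypothetical, and treating an empty column as not discrete is an assumption made to keep the example safe to run.

```python
import math


def is_discrete(values):
    """Return True when a numeric feature looks discrete.

    A feature is treated as discrete when its number of distinct values is
    less than the binary logarithm of its total number of values.
    """
    total_count = len(values)
    if total_count == 0:
        return False  # assumption: an empty column is not classified as discrete
    distinct_values_count = len(set(values))
    return distinct_values_count < math.log2(total_count)


# 300 values with only 3 distinct levels: log2(300) is about 8.2, so this is discrete.
few_levels = [0, 1, 2] * 100
# 300 values that are all distinct: 300 is not less than 8.2, so this is continuous.
all_distinct = list(range(300))
```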
 Do the math:
Watson OpenScale uses the following formulas to calculate feature drift:
The following formulas are used to calculate drift v2 evaluation metrics:
Total variation distance
Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same transaction, as shown in the following formula:
$$\mathrm{TVD}(B, P) = \sup_{x} \left| B(x) - P(x) \right|$$
If the two distributions are equal, the total variation distance between them is 0.
The following formula is used to calculate total variation distance:
$$\mathrm{TVD} = \frac{\sum_{x} \left| P(x) - B(x) \right| \, d(x)}{\sum_{x} P(x) \, d(x) + \sum_{x} B(x) \, d(x)}$$
𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
$d(x)$ is the difference between two consecutive 𝑥 samples.
$P(x)$ is the value of the density function for production data at an 𝑥 sample.
$B(x)$ is the value of the density function for baseline data at an 𝑥 sample.
The denominator represents the total area under the density function plots for production and baseline data. These summations approximate the integrals over the domain, so each term should be close to 1 and the denominator should be close to 2.
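The sampling scheme described above can be sketched in Python as follows. The sketch assumes that the two density functions are already available as callables. How Watson OpenScale estimates the densities is not described here, so the normal densities, the function names, and the sample count are illustrative assumptions.

```python
import math


def normal_pdf(mean, std):
    """Return a normal density function (stand-in for an estimated density)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
    return pdf


def total_variation_distance(baseline_pdf, production_pdf, lower, upper, samples=2001):
    """Approximate total variation distance on equidistant samples.

    lower/upper stand for the combined minimum and maximum of the baseline
    and production data.
    """
    step = (upper - lower) / (samples - 1)  # difference between consecutive samples
    xs = [lower + i * step for i in range(samples)]
    numerator = sum(abs(production_pdf(x) - baseline_pdf(x)) * step for x in xs)
    # Each area should be close to 1, so the denominator should be close to 2.
    denominator = (sum(production_pdf(x) * step for x in xs)
                   + sum(baseline_pdf(x) * step for x in xs))
    return numerator / denominator


baseline = normal_pdf(0.0, 1.0)
production = normal_pdf(1.0, 1.0)
tvd_same = total_variation_distance(baseline, baseline, -10.0, 11.0)
tvd_shifted = total_variation_distance(baseline, production, -10.0, 11.0)
```

Dividing by the summed areas keeps the score in the range of 0 to 1 even when the sampled densities do not integrate to exactly 1.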
Overlap coefficient
The overlap coefficient is calculated by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between distributions, the intersection or the overlap area is subtracted from 1 to calculate the amount of drift. The following formula is used to calculate the overlap coefficient:
$$\text{overlap coefficient} = 1 - \sum_{x} \min\left( P(x), B(x) \right) d(x)$$
𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
$d(x)$ is the difference between two consecutive 𝑥 samples.
$P(x)$ is the value of the density function for production data at an 𝑥 sample.
$B(x)$ is the value of the density function for baseline data at an 𝑥 sample.
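A minimal Python sketch of this calculation follows, assuming that the density functions are available as callables and that the intersection area is approximated by a sum over equidistant samples. The normal densities and all function names are placeholders for illustration.

```python
import math


def normal_pdf(mean, std):
    """Return a normal density function (stand-in for an estimated density)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
    return pdf


def overlap_drift(baseline_pdf, production_pdf, lower, upper, samples=2001):
    """Return 1 minus the approximate intersection area of two densities."""
    step = (upper - lower) / (samples - 1)  # difference between consecutive samples
    xs = [lower + i * step for i in range(samples)]
    # The pointwise minimum of the two densities traces the intersection area.
    overlap_area = sum(min(production_pdf(x), baseline_pdf(x)) * step for x in xs)
    return 1.0 - overlap_area


baseline = normal_pdf(0.0, 1.0)
production = normal_pdf(1.0, 1.0)
drift_same = overlap_drift(baseline, baseline, -10.0, 11.0)
drift_shifted = overlap_drift(baseline, production, -10.0, 11.0)
```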
Jensen Shannon distance
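Jensen Shannon distance is the normalized form of Kullback-Leibler (KL) divergence that measures how much one probability distribution differs from a second probability distribution. Jensen Shannon distance is a symmetrical score and always has a finite value.
The following formula is used to calculate the Jensen Shannon distance for two probability distributions, baseline (B) and production (P):
$$\mathrm{JS}(B, P) = \sqrt{\frac{\mathrm{KL}(P \parallel M) + \mathrm{KL}(B \parallel M)}{2}}, \qquad M = \frac{B + P}{2}$$
$\mathrm{KL}(\cdot \parallel \cdot)$ is the KL divergence, and $M$ is the pointwise average of the two distributions.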
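For discrete distributions, the Jensen Shannon distance can be sketched in Python as follows. The sketch uses the standard definition with base-2 logarithms, which keeps the result in the range of 0 to 1. Whether Watson OpenScale uses the same logarithm base is an assumption, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence of p from q for discrete distributions (base-2 logarithm)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(baseline, production):
    """Jensen Shannon distance between two discrete probability distributions.

    Both inputs are sequences of probabilities over the same bins.
    """
    # M is the pointwise average, so it is nonzero wherever either input is nonzero.
    mixture = [(b + p) / 2 for b, p in zip(baseline, production)]
    divergence = (kl_divergence(baseline, mixture)
                  + kl_divergence(production, mixture)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


identical = jensen_shannon_distance([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon_distance([1.0, 0.0], [0.0, 1.0])
forward = jensen_shannon_distance([0.7, 0.2, 0.1], [0.1, 0.3, 0.6])
backward = jensen_shannon_distance([0.1, 0.3, 0.6], [0.7, 0.2, 0.1])
```

Because each input is compared with the mixture rather than directly with the other input, the score stays finite even when one distribution assigns zero probability to a bin.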