Configuring drift v2 evaluations in Watson OpenScale

You can configure drift v2 evaluations with Watson OpenScale to measure changes in your data over time to ensure consistent outcomes for your model. Use drift v2 evaluations to identify changes in your model output, the accuracy of your predictions, and the distribution of your input data.

If you log payload data when you prepare for model evaluations, you can configure drift v2 evaluations to help you understand how changes in your data affect model outcomes.

The following sections describe the steps that you must complete to configure drift v2 evaluations with Watson OpenScale:

Compute the drift archive

You must choose the method that Watson OpenScale uses to analyze your training data to determine the data distributions of your model features. If you connect training data to Watson OpenScale and the size of your data is less than 500 MB, you can choose to compute the drift v2 archive in Watson OpenScale. If you don't connect your training data to Watson OpenScale, or if the size of your data is larger than 500 MB, you must choose to compute the drift v2 archive in a notebook.

You can specify a limit for the size of your training data by setting maximum sample sizes for the amount of training data that Watson OpenScale uses for scoring and computing the drift v2 archive. For non-Watson Machine Learning deployments, computing the drift v2 archive has a cost associated with Watson OpenScale scoring the training data against your model's scoring endpoint.

Set drift thresholds

You must set threshold values for each metric so that Watson OpenScale can identify issues with your evaluation results. The values that you set create alerts on the Insights dashboard that appear when metric scores violate your thresholds. Threshold values must be in the range of 0 to 1. Metric scores must be lower than the threshold values to avoid violations.
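The threshold rule can be illustrated with a minimal sketch. The function name and the metric keys (`output_drift`, `feature_drift`) are hypothetical and are not part of any Watson OpenScale API; the sketch only encodes the rule that a score must be lower than its threshold to avoid a violation.

```python
def find_threshold_violations(scores, thresholds):
    """Return the metrics whose score is not lower than its threshold.

    Both arguments are plain dicts keyed by a hypothetical metric name.
    """
    violations = {}
    for metric, threshold in thresholds.items():
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"threshold for {metric!r} must be between 0 and 1")
        score = scores.get(metric)
        # A score must be strictly lower than the threshold to avoid a violation.
        if score is not None and score >= threshold:
            violations[metric] = (score, threshold)
    return violations


thresholds = {"output_drift": 0.05, "feature_drift": 0.1}
scores = {"output_drift": 0.08, "feature_drift": 0.02}
violations = find_threshold_violations(scores, thresholds)
print(violations)
```

In this example only the hypothetical `output_drift` score reaches its threshold, so it is the only metric reported.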

Select important features

Watson OpenScale calculates feature importance to determine the impact of feature drift on your model. To enable Watson OpenScale to calculate feature importance, you can select the important and most important features from your model that have the biggest impact on your model outcomes.

When you configure SHAP explanations, Watson OpenScale automatically detects the important features by using global explanations.

You can also upload a list of important features by uploading a JSON file. Watson OpenScale provides sample snippets that you can use to upload a JSON file. For more information, see Feature importance snippets.

Set sample size

Watson OpenScale uses sample sizes to determine how many transactions to process during evaluations. You must set a minimum sample size to indicate the lowest number of transactions that you want Watson OpenScale to evaluate. You can also set a maximum sample size to indicate the highest number of transactions that you want Watson OpenScale to evaluate.

Supported drift v2 metrics

When you enable drift v2 evaluations, you can view a summary of evaluation results with metrics for the type of model that you're evaluating.

To view results, you can select a model deployment tile and click the navigation arrow in the Drift v2 evaluation section to display a summary of drift v2 metrics from your last evaluation. For more information, see Reviewing evaluation results.

Drift v2 metrics are calculated with the payload data that you provide to Watson OpenScale. For more information, see Managing payload data.

The following drift v2 metrics are supported by Watson OpenScale:


Output drift

Watson OpenScale calculates output drift by measuring the change in the model confidence distribution.

  • How it works:
    Watson OpenScale measures how much your model output changes from the time that you train the model. For regression models, Watson OpenScale calculates output drift by measuring the change in distribution of predictions on the training and payload data. For classification models, Watson OpenScale calculates output drift for each class probability by measuring the change in distribution for class probabilities on the training and payload data. For multi-classification models, Watson OpenScale also aggregates output drift for each class probability by measuring a weighted average.

  • Do the math:
    Watson OpenScale uses the following formulas to calculate output drift:



Model quality drift

Watson OpenScale calculates model quality drift by comparing the estimated runtime accuracy to the training accuracy to measure the drop in accuracy.

  • How it works:

When you configure drift v2 evaluations, Watson OpenScale builds its own drift detection model. This model processes your payload data to predict whether your model generates accurate predictions without the ground truth. The drift detection model uses the input features and class probabilities from your model to create its own input features.

  • Do the math:

Watson OpenScale uses the following formula to calculate model quality drift:

$$\text{model quality drift} = \text{base\_accuracy} - \text{predicted\_accuracy}$$

Watson OpenScale calculates the accuracy of your model as the base_accuracy by measuring the fraction of correctly predicted transactions in your training data. During evaluations, your transactions are scored against the drift detection model to measure the number of transactions that are likely predicted correctly by your model. This number is divided by the total number of transactions that Watson OpenScale processes to calculate the predicted_accuracy. If the predicted_accuracy is less than the base_accuracy, Watson OpenScale generates a model quality drift score.
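The arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the Watson OpenScale implementation: the function and parameter names are hypothetical, the score is assumed to be the drop from base_accuracy to predicted_accuracy, and clamping at 0 when there is no drop is also an assumption.

```python
def model_quality_drift(training_correct, training_total,
                        predicted_correct, payload_total):
    """Estimate the drop in accuracy between training and runtime."""
    # Fraction of training transactions that the model predicted correctly.
    base_accuracy = training_correct / training_total
    # Fraction of payload transactions that the drift detection model
    # flags as likely predicted correctly.
    predicted_accuracy = predicted_correct / payload_total
    # Assumption: report no drift when the estimated accuracy did not drop.
    return max(0.0, base_accuracy - predicted_accuracy)


# 900 of 1000 training rows correct, 160 of 200 payload rows likely correct.
drift = model_quality_drift(900, 1000, 160, 200)
print(f"model quality drift: {drift:.2f}")
```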



Feature drift

Watson OpenScale calculates feature drift by measuring the change in value distribution for important features.

  • How it works:

Watson OpenScale calculates drift for categorical and numeric features by measuring the probability distribution of continuous and discrete values. To identify discrete values for numeric features, Watson OpenScale uses a binary logarithm to compare the number of distinct values of each feature to the total number of values of each feature. Watson OpenScale uses the following binary logarithm formula to identify discrete numeric features:

$$\text{distinct\_values\_count} < \log_2(\text{total\_count})$$

If the distinct_values_count is less than the binary logarithm of the total_count, the feature is identified as discrete.
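A minimal sketch of this check, assuming the values of a numeric feature are available as a Python list. The function name is hypothetical; the variable names follow the terms used above.

```python
import math


def is_discrete_numeric(values):
    """Apply the binary logarithm rule to one numeric feature."""
    total_count = len(values)
    if total_count == 0:
        raise ValueError("the feature has no values")
    distinct_values_count = len(set(values))
    # The feature is discrete when it has fewer distinct values
    # than the binary logarithm of the total number of values.
    return distinct_values_count < math.log2(total_count)


ratings = [1, 2, 3] * 100       # 300 values, 3 distinct
identifiers = list(range(300))  # 300 values, 300 distinct
print(is_discrete_numeric(ratings), is_discrete_numeric(identifiers))
```

With 300 values the binary logarithm is a little over 8, so the feature with 3 distinct values is treated as discrete and the feature with 300 distinct values is not.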

  • Do the math:

Watson OpenScale uses the following formulas to calculate feature drift:


The following formulas are used to calculate drift v2 evaluation metrics:

Total variation distance

Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same transaction as shown in the following formula:

$$\mathrm{TVD}(B, P) = \sup_{x} \left| B(x) - P(x) \right|$$

If the two distributions are equal, the total variation distance between them becomes 0.

The following formula is used to calculate total variation distance:

$$\mathrm{TVD} = \frac{\sum_{x} \left| \hat{f}_P(x) - \hat{f}_B(x) \right| d(x)}{\sum_{x} \hat{f}_P(x)\, d(x) + \sum_{x} \hat{f}_B(x)\, d(x)}$$

  • 𝑥 is a series of equidistant samples that span the domain of $\hat{f}$, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.

  • $d(x)$ is the difference between two consecutive 𝑥 samples.

  • $\hat{f}_P(x)$ is the value of the density function for production data at an 𝑥 sample.

  • $\hat{f}_B(x)$ is the value of the density function for baseline data at an 𝑥 sample.

The denominator $\sum_{x} \hat{f}_P(x)\, d(x) + \sum_{x} \hat{f}_B(x)\, d(x)$ represents the total area under the density function plots for production and baseline data. These summations approximate the integrals over the domain space. Each term should be 1, so the total should be 2.
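As a rough illustration of the summation described above, the following sketch divides the summed absolute difference between two density functions by the summed area under both. It is not the Watson OpenScale implementation. The function names are hypothetical, and normal densities stand in for the density functions that would be estimated from baseline and production data.

```python
import math


def normal_pdf(mean, std):
    """Return a normal density function (a stand-in for an estimated density)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def total_variation_distance(f_baseline, f_production, low, high, num_samples=2001):
    """Approximate the distance with sums over equidistant samples."""
    dx = (high - low) / (num_samples - 1)  # distance between consecutive samples
    xs = [low + i * dx for i in range(num_samples)]
    # Numerator: area between the two density curves.
    numerator = sum(abs(f_production(x) - f_baseline(x)) * dx for x in xs)
    # Denominator: total area under both curves (close to 2 for proper densities).
    denominator = sum((f_production(x) + f_baseline(x)) * dx for x in xs)
    return numerator / denominator


baseline = normal_pdf(0.0, 1.0)
shifted = normal_pdf(1.0, 1.0)
same = total_variation_distance(baseline, baseline, -8.0, 9.0)
drifted = total_variation_distance(baseline, shifted, -8.0, 9.0)
print(f"identical: {same:.3f}, shifted by one standard deviation: {drifted:.3f}")
```

Dividing by the summed areas keeps the result between 0 and 1 even when the sampled densities do not sum to exactly 1.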

Overlap coefficient

The overlap coefficient is calculated by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between distributions, the intersection or the overlap area is subtracted from 1 to calculate the amount of drift. The following formula is used to calculate the overlap coefficient:

$$\text{overlap coefficient} = 1 - \sum_{x} \min\left(\hat{f}_P(x), \hat{f}_B(x)\right) d(x)$$

  • 𝑥 is a series of equidistant samples that span the domain of $\hat{f}$, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.

  • $d(x)$ is the difference between two consecutive 𝑥 samples.

  • $\hat{f}_P(x)$ is the value of the density function for production data at an 𝑥 sample.

  • $\hat{f}_B(x)$ is the value of the density function for baseline data at an 𝑥 sample.
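The overlap calculation can be sketched in the same spirit: sum the smaller of the two density values at each sample, then subtract that overlap area from 1. The function names are hypothetical, uniform densities stand in for estimated densities, and clamping the result at 0 is an assumption that guards against small sampling errors.

```python
def uniform_pdf(start, end):
    """Return a uniform density function on [start, end] (illustrative only)."""
    def pdf(x):
        return 1.0 / (end - start) if start <= x <= end else 0.0
    return pdf


def overlap_drift(f_baseline, f_production, low, high, num_samples=1501):
    """Approximate 1 minus the overlap area of two density functions."""
    dx = (high - low) / (num_samples - 1)  # distance between consecutive samples
    xs = [low + i * dx for i in range(num_samples)]
    # Overlap area: the smaller density value at each sample, times the step.
    overlap = sum(min(f_production(x), f_baseline(x)) * dx for x in xs)
    # Assumption: clamp at 0 so that sampling error cannot give a negative score.
    return max(0.0, 1.0 - overlap)


baseline = uniform_pdf(0.0, 1.0)
production = uniform_pdf(0.5, 1.5)
identical = overlap_drift(baseline, baseline, 0.0, 1.5)
half_shifted = overlap_drift(baseline, production, 0.0, 1.5)
print(f"identical: {identical:.3f}, half shifted: {half_shifted:.3f}")
```

The two uniform densities share half of their area, so the drift in the second case comes out close to 0.5.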

Jensen Shannon distance

Jensen Shannon distance is the normalized form of Kullback-Leibler (KL) Divergence that measures how much one probability distribution differs from a second probability distribution. Jensen Shannon distance is a symmetrical score and always has a finite value.

The following formula is used to calculate the Jensen Shannon distance for two probability distributions, baseline (B) and production (P):

$$\mathrm{JS}(B, P) = \sqrt{\frac{D_{KL}(P \,\|\, M) + D_{KL}(B \,\|\, M)}{2}}$$

$D_{KL}$ is the KL Divergence, and $M = \frac{P + B}{2}$ is the average of the two distributions.
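A minimal sketch of the Jensen Shannon distance for two discrete distributions, using the standard definition: the square root of the average KL Divergence of each distribution from their mixture. The function names are hypothetical. The base-2 logarithm is an assumption that keeps the score between 0 and 1, and the source does not state which base Watson OpenScale uses.

```python
import math


def kl_divergence(p, q):
    """KL Divergence of p from q for discrete distributions (base-2 logarithm)."""
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def jensen_shannon_distance(baseline, production):
    """Square root of the average KL Divergence from the mixture distribution."""
    mixture = [(b + p) / 2 for b, p in zip(baseline, production)]
    divergence = 0.5 * kl_divergence(baseline, mixture) \
        + 0.5 * kl_divergence(production, mixture)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


baseline = [0.5, 0.3, 0.2]
production = [0.2, 0.3, 0.5]
distance = jensen_shannon_distance(baseline, production)
print(f"Jensen Shannon distance: {distance:.3f}")
```

Because each distribution is compared with the mixture and not directly with the other distribution, the score stays finite even when one distribution assigns zero probability to a value that the other one covers.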
