Drift metrics

Last updated: Oct 25, 2024

When you configure drift evaluations, you can generate metrics that help you determine how well your model predicts outcomes over time.

You can view the results of your drift evaluations on the Insights dashboard. To view results, select a model deployment tile and click the navigation arrow in the Drift evaluation section to display a summary of drift metrics from your last evaluation. For more information, see Reviewing drift results.

Drift metrics are calculated by analyzing the behavior of your model and building a separate drift detection model that predicts whether your model generates accurate predictions for individual data points. This drift detection model processes payload data to identify the number of records for which your model makes inaccurate predictions, and generates the predicted accuracy of your model.

Drift is supported for structured data only and does not support Python functions.

Supported drift metrics

The following drift metrics are supported for drift evaluations:

Drop in accuracy

Drift evaluations estimate the drop in accuracy of your model at run time when compared to the training data. The model accuracy drops if there is an increase in transactions that are similar to transactions that the model did not predict correctly on the training data.

How it works

The drift monitor works differently in pre-production and production environments.

In pre-production environments, when you upload labeled test data, the data is added to the feedback and payload tables. The labeled data is added as an annotation in the payload table. Accuracy is calculated with the labeled data column and the prediction column from the payload table.

In production environments, a drift detection model is created by analyzing the data that was used to train and test the model. For example, if the model has an accuracy of 90% on the test data, it provides incorrect predictions on 10% of the test data. A binary classification model is built that accepts a data point and predicts whether that data point is similar to the data that the model predicted incorrectly (the 10%) or accurately (the 90%).

After the drift detection model is created, at run time, this model is scored by using all of the data that the client model receives. For example, if the client model received 1000 records in the past 3 hours, the drift detection model runs on those same 1000 data points. It calculates how many of the records are similar to the 10% of records on which the model made an error when training. If 200 of these records are similar to the 10%, then it implies that the model accuracy is likely to be 80%. Because the model accuracy at training time was 90%, it means that there is an accuracy drift of 10% in the model.
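
For illustration, the arithmetic in this example can be sketched in a few lines of Python. The numbers are the ones from the paragraph above; this is not a product API.

base_accuracy = 0.90        # accuracy on the test data at training time
records_scored = 1000       # payload records received in the past 3 hours
flagged_as_drifted = 200    # records similar to the 10% that the model got wrong

estimated_accuracy = 1 - flagged_as_drifted / records_scored   # 0.80
accuracy_drift = base_accuracy - estimated_accuracy            # 0.10
print(f"Estimated accuracy: {estimated_accuracy:.0%}, drift: {accuracy_drift:.0%}")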

To mitigate drift after it is detected, you must build a new version of the model that fixes the problem. A good place to start is with the data points that are highlighted as reasons for the drift. Introduce the new data to the predictive model after you manually label the drifted transactions and use them to retrain the model.

Do the math

The drop in accuracy metric is calculated for structured binary and multi-class classification models only. Each transaction is analyzed to estimate whether the model prediction is accurate. If the model prediction is estimated to be inaccurate, the transaction is marked as drifted. The estimated accuracy is calculated as the fraction of nondrifted transactions to the total number of transactions that are analyzed. The base accuracy is the accuracy of the model on the test data. The extent of the drift in accuracy is calculated as the difference between the base accuracy and the estimated accuracy. The drifted transactions are then grouped into clusters based on the similarity of each feature's contribution to the drift in accuracy. For each cluster, the features that contributed most to the drift in accuracy are estimated, and their feature impact is classified as large, some, or small.
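
The same calculation can be written as a minimal sketch over per-transaction results. The drifted flags here are hypothetical outputs of the drift detection model, shown only to make the formula concrete.

import numpy as np

# Hypothetical flags: True means the model prediction for that transaction
# is estimated to be inaccurate, so the transaction is marked as drifted.
drifted = np.array([False, True, False, False, False, True, False, False, False, False])

base_accuracy = 0.90                      # model accuracy on the test data
estimated_accuracy = 1 - drifted.mean()   # fraction of nondrifted transactions
drop_in_accuracy = base_accuracy - estimated_accuracy   # 0.10 for this sample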

Drop in data consistency

How it works

Each transaction is analyzed for data inconsistency by comparing the run-time transactions with the patterns of the transactions in the training data. If a transaction violates one or more of the training data patterns, the transaction is identified as inconsistent. To calculate the drop in data consistency, the number of transactions that are identified as inconsistent is divided by the total number of transactions. For example, if 10 transactions are identified as inconsistent from a set of 100 transactions, then the drop in data consistency is 10%.
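
A minimal sketch of this calculation, using the numbers from the example; this is illustrative arithmetic, not a product API.

total_transactions = 100
inconsistent_transactions = 10   # transactions that violate at least one training data pattern

drop_in_data_consistency = inconsistent_transactions / total_transactions
print(f"Drop in data consistency: {drop_in_data_consistency:.0%}")   # 10%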

Do the math

To identify data inconsistency, a schema is generated when you configure drift detection by creating a constraints.json file that specifies the rules that your input data must follow. The schema is used to evaluate your data for drift by identifying outliers that do not fit within the constraints that are specified. The schema is specified as a JSON object with columns and constraints arrays that describe the training data, as shown in the following example:

{
    "columns": [
        {
            "name": "CheckingStatus",
            "dtype": "categorical",
            "count": 5000,
            "sparse": false,
            "skip_learning": false
        }
    ],
    "constraints": [
        {
            "name": "categorical_distribution_constraint",
            "id": "f0476d40-d7df-4095-9be5-82564511432c",
            "kind": "single_column",
            "columns": [
                "CheckingStatus"
            ],
            "content": {
                "frequency_distribution": {
                    "0_to_200": 1304,
                    "greater_200": 305,
                    "less_0": 1398,
                    "no_checking": 1993
                }
            }
        }
    ]
}

Columns

Values are specified for the name, dtype, count, sparse, and skip_learning keys to describe a column.

The name and dtype keys describe the label and the data type for a column. The following values can be specified with the dtype key to describe the data type:

  • categorical
  • numeric_discrete
  • numeric_continuous

The data type that is specified determines whether more statistical properties are described with keys such as min, max, and mean. For example, when the numeric_discrete or the numeric_continuous data type is specified, properties are described as shown in the following example:

{
    "name": "LoanDuration",
    "dtype": "numeric_discrete",
    "count": 5000,
    "sparse": false,
    "skip_learning": false,
    "min": 4,
    "max": 53,
    "mean": 21.28820697954272,
    "std": 10.999096037050032,
    "percentiles": [
        13.0,
        21.0,
        29.0
    ],
    "count_actual": 4986
}

The count key specifies the number of rows for a column. The sparse and skip_learning keys take Boolean values: the sparse key specifies whether a column is sparse, and the skip_learning key specifies whether a column skips learning any of the rules that are described in the schema. A column is sparse if its 25th and 75th percentiles have the same value.
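
The sparsity rule can be expressed as a short Python sketch; the function name is illustrative, not part of the product.

import numpy as np

def is_sparse(values):
    # A column is sparse if its 25th and 75th percentiles have the same value.
    q25, q75 = np.percentile(values, [25, 75])
    return q25 == q75

is_sparse([0, 0, 0, 0, 0, 0, 0, 7])   # True: both percentiles are 0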

Constraints

The name key specifies the constraint type. The following values are specified to describe the constraint type:

  • categorical_distribution_constraint
  • numeric_range_constraint
  • numeric_distribution_constraint
  • catnum_range_constraint
  • catnum_distribution_constraint
  • catcat_distribution_constraint

The id key identifies constraints with a universally unique identifier (UUID). The kind key specifies whether the constraint is a single_column or two_column constraint.

The columns key specifies an array of column names. When the kind key specifies a single_column constraint, the array contains one value, which correlates with the column that you want to describe. When the kind key specifies a two_column constraint, the array contains values that correlate with columns that contain related data.

The content key specifies attributes that describe the statistical characteristics of your data. The constraint type that is specified with the name key determines which attribute is specified in the content key as shown in the following table:

Attribute                 Constraints
frequency_distribution    categorical_distribution_constraint
ranges                    numeric_range_constraint, catnum_range_constraint
distribution              numeric_distribution_constraint, catnum_distribution_constraint
rare_combinations         catcat_distribution_constraint
source_column             catcat_distribution_constraint, catnum_range_constraint, catnum_distribution_constraint
target_column             catcat_distribution_constraint, catnum_range_constraint, catnum_distribution_constraint

The following sections provide examples of how each constraint type is specified:

Categorical distribution constraint

        {
            "name": "categorical_distribution_constraint",
            "id": "f0476d40-d7df-4095-9be5-82564511432c",
            "kind": "single_column",
            "columns": [
                "CheckingStatus"
            ],
            "content": {
                "frequency_distribution": {
                    "0_to_200": 1304,
                    "greater_200": 305,
                    "less_0": 1398,
                    "no_checking": 1993
                }
            }
        }

In the training data, the CheckingStatus column contains four values, which are specified with the frequency_distribution attribute. The frequency_distribution attribute specifies the frequency count for each category, such as 0_to_200. If records are found in the payload data with values that differ from the frequency_distribution attribute values, the records are identified as drift.
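
A hypothetical checker for this constraint type might flag payload values that never appear among the learned categories. The function and its logic are illustrative assumptions, not the product's implementation.

# Learned frequency counts from the constraint above.
frequency_distribution = {"0_to_200": 1304, "greater_200": 305, "less_0": 1398, "no_checking": 1993}

def violates_categorical_constraint(value):
    # A category value that was never seen in the training data is flagged.
    return value not in frequency_distribution

violates_categorical_constraint("no_account")   # True: unseen category, identified as drift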

Numeric range constraint

   {
            "name": "numeric_range_constraint",
            "id": "79f3a1f5-30a1-4c7f-91a0-1613013ee802",
            "kind": "single_column",
            "columns": [
                "LoanAmount"
            ],
            "content": {
                "ranges": [
                    {
                        "min": 250,
                        "max": 11676,
                        "count": 5000
                    }
                ]
            }
        }

The LoanAmount column contains minimum and maximum values that are specified with the ranges attribute to set a range for the training data. The ranges attribute specifies the high-density regions of the column. Any ranges that rarely occur in the training data aren't included. If records are found in the payload data that do not fit within the range and a pre-defined buffer, the records are identified as drift.
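
A hypothetical checker for this constraint type follows; the buffer fraction is an assumption for illustration because the product's pre-defined buffer is not documented here.

def violates_numeric_range(value, vmin=250, vmax=11676, buffer_fraction=0.1):
    # Values outside the learned range, extended by a buffer, are flagged.
    # The 10% buffer_fraction is an illustrative assumption, not the product's value.
    buffer = (vmax - vmin) * buffer_fraction
    return not (vmin - buffer <= value <= vmax + buffer)

violates_numeric_range(15000)   # True: outside the learned range plus the buffer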

Numeric distribution constraint

{
            "name": "numeric_distribution_constraint",
            "id": "3a97494b-0cd7-483e-a1c6-adb7755c1cb0",
            "kind": "single_column",
            "columns": [
                "LoanAmount"
            ],
            "content": {
                "distribution": {
                        "name": "norm",
                        "parameters": {
                            "loc": 3799.62,
                            "scale": 1920.0640064678398
                        },
                        "p-value": 0.22617155797563282
                }
            }
        }

The LoanAmount column contains values that are specified with the distribution attribute to set a normal distribution for the training data. If records are found in the payload data that do not fit within the normal distribution, the records are identified as drift. The constraint attempts to fit the training data to a uniform, exponential, or normal distribution. If the data does not fit any of these distributions, this constraint is not learned.
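
One way to re-check such a constraint outside the product is a goodness-of-fit test against the learned distribution. This sketch uses SciPy's Kolmogorov-Smirnov test with the loc and scale parameters from the constraint above; the test choice and significance threshold are illustrative assumptions.

from scipy import stats

# Parameters of the learned normal distribution, taken from the constraint above.
loc, scale = 3799.62, 1920.0640064678398

def violates_distribution(values, alpha=0.05):
    # Flag a batch of payload values that does not fit the learned normal
    # distribution. The KS test and the alpha threshold are illustrative choices.
    _, p_value = stats.kstest(values, "norm", args=(loc, scale))
    return p_value < alpha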

Categorical-categorical distribution constraint

    {
            "name": "catcat_distribution_constraint",
            "id": "99468600-1924-44d9-852c-1727c9c414ee",
            "kind": "two_column",
            "columns": [
                "CheckingStatus",
                "CreditHistory"
            ],
            "content": {
                "source_column": "CheckingStatus",
                "target_column": "CreditHistory",
                "rare_combinations": [
                    {
                        "source_value": "no_checking",
                        "target_values": [
                            "no_credits"
                        ]
                    }
                ]
            }
        }

For the CheckingStatus and CreditHistory columns, the rare_combinations attribute specifies combinations of values that rarely occur in the training data. If records are found in the payload data that contain such a combination, the records are identified as drift.
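
A hypothetical checker for rare combinations follows; the function and data layout are illustrative, not the product's implementation.

# Rare source/target value combinations from the constraint above.
rare_combinations = {"no_checking": {"no_credits"}}

def is_rare_combination(checking_status, credit_history):
    # A record that contains a learned rare combination is flagged.
    return credit_history in rare_combinations.get(checking_status, set())

is_rare_combination("no_checking", "no_credits")   # True: identified as drift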

Categorical-numeric range constraint

        {
            "name": "catnum_range_constraint",
            "id": "f252033c-1635-4974-8976-3f7904d0c37d",
            "kind": "two_column",
            "columns": [
                "CheckingStatus",
                "LoanAmount"
            ],
            "content": {
                "source_column": "CheckingStatus",
                "target_column": "LoanAmount",
                "ranges": {
                    "no_checking": [
                        {
                            "min": 250,
                            "max": 11676,
                            "count": 1993
                        }
                    ],
                    "less_0": [
                        {
                            "min": 250,
                            "max": 7200,
                            "count": 1398
                        }
                    ],
                    "0_to_200": [
                        {
                            "min": 250,
                            "max": 9076,
                            "count": 1304
                        }
                    ],
                    "greater_200": [
                        {
                            "min": 250,
                            "max": 9772,
                            "count": 305
                        }
                    ]
                }
            }
        }

The ranges attribute specifies minimum and maximum LoanAmount values for each CheckingStatus category in the training data. If records are found in the payload data whose LoanAmount value does not fit within the range, plus a pre-defined buffer, for the corresponding CheckingStatus value, the records are identified as drift.
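
The per-category check can be sketched as follows; buffer handling is omitted for brevity, and the function is illustrative, not the product's implementation.

# Learned LoanAmount ranges per CheckingStatus value, from the constraint above.
ranges = {
    "no_checking": (250, 11676),
    "less_0": (250, 7200),
    "0_to_200": (250, 9076),
    "greater_200": (250, 9772),
}

def violates_catnum_range(checking_status, loan_amount):
    # The numeric range to test depends on the categorical (source) value.
    vmin, vmax = ranges[checking_status]
    return not (vmin <= loan_amount <= vmax)

violates_catnum_range("less_0", 9000)   # True: outside the range learned for less_0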

Categorical-numeric distribution constraint

        {
            "name": "catnum_distribution_constraint",
            "id": "3a97494b-0cd7-483e-a1c6-adb7755c1cb0",
            "kind": "two_column",
            "columns": [
                "CheckingStatus",
                "LoanAmount"
            ],
            "content": {
                "source_column": "CheckingStatus",
                "target_column": "LoanAmount",
                "distribution": {
                    "greater_200": {
                        "name": "norm",
                        "parameters": {
                            "loc": 3799.62,
                            "scale": 1920.0640064678398
                        },
                        "p-value": 0.22617155797563282
                    }
                }
            }
        }

For each CheckingStatus category, the distribution attribute specifies a normal distribution for the LoanAmount values in the training data. If records are found in the payload data whose LoanAmount value does not fit the distribution that was learned for the corresponding CheckingStatus value, the records are identified as drift.


Learn more

Reviewing model insights

Parent topic: Drift metrics
