Drift evaluations | IBM Cloud Pak for Data as a Service

Drift evaluations

Drift evaluations detect drops in accuracy and data consistency in a model. The model accuracy drops if there is an increase in transactions similar to those that the model did not evaluate correctly in the training data.

Drift evaluation examples

When configuring drift, you must specify the tolerable accuracy drift magnitude. The drift is measured as the drop in accuracy compared to the model accuracy at training time. For example, if the model accuracy at training time was 90% and the estimated accuracy of the model at runtime is 80%, the model is said to have drifted by 10%. Depending on the use case, model owners tolerate different amounts of drift, so you can specify the accuracy drift magnitude for each model evaluation. If the drift for a model exceeds the specified threshold, an alert is generated.
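The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function names are hypothetical, and two details are assumptions, namely that an improvement in accuracy counts as zero drift and that an alert fires only when the drift is strictly greater than the tolerated magnitude.

```python
def accuracy_drift(training_accuracy: float, estimated_runtime_accuracy: float) -> float:
    """Return the drop in accuracy, in percentage points.

    Assumption: an accuracy gain is reported as zero drift.
    """
    return max(0.0, training_accuracy - estimated_runtime_accuracy)


def should_alert(drift: float, tolerated_drift: float) -> bool:
    """Assumption: alert only when drift is strictly above the tolerated magnitude."""
    return drift > tolerated_drift


# Values taken from the example above: 90% at training time, 80% estimated at runtime.
drift = accuracy_drift(90.0, 80.0)
print(drift)                       # 10.0
print(should_alert(drift, 5.0))    # True: 10 points of drift exceeds a 5% tolerance
print(should_alert(drift, 15.0))   # False: 10 points of drift is within a 15% tolerance
```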

If your training data does not exceed 500 MB, you can train the drift detection model online. Otherwise, you must use a notebook to train it.
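As a rough aid for choosing between the two routes, the size rule could be checked before you start. The sketch below is hypothetical: the function names are made up for illustration, 1 MB is assumed to mean 1024 * 1024 bytes, and because this page states the limit both as "does not exceed" and as "less than" 500 MB, the sketch takes the stricter reading.

```python
import os

# Assumption: 1 MB is counted as 1024 * 1024 bytes.
ONLINE_LIMIT_BYTES = 500 * 1024 * 1024


def training_route(size_in_bytes: int) -> str:
    """Return 'online' when the data is under the limit, otherwise 'notebook'."""
    # Stricter reading of the limit: exactly 500 MB is sent to the notebook route.
    return "online" if size_in_bytes < ONLINE_LIMIT_BYTES else "notebook"


def training_route_for_file(path: str) -> str:
    """Apply the same rule to a training data file on disk."""
    return training_route(os.path.getsize(path))


print(training_route(120 * 1024 * 1024))   # online
print(training_route(750 * 1024 * 1024))   # notebook
```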

Before you begin

You must configure drift detection before it can analyze your model. You can train your drift detection model online by using the user interface or by running code inside a notebook. Drift configuration is supported for structured data only. Classification models support both data drift and accuracy drift; regression models support only data drift.

These are the requirements for configuring the Drift monitor:

The Machine Learning Provider must be Watson Machine Learning.
The training data size must be less than 500 MB.
The training data must be hosted in IBM Cloud Object Storage or Db2.

To upload the training data and set the Model details for drift detection:

Click Upload training data and upload a file with the labeled data.

For details, see Providing model details.

Throughout this process, your model is analyzed and recommendations are made based on the most logical outcome. For drift detection to work properly, the data type of your prediction column in the training data must match the data type of the same column in the payload data. Assign matching string or numeric types to the prediction and label columns. To confirm data types, click Model details > Model output details > Edit. These selections ensure that you have accurate information for the following configuration steps. If you must change data types, redeploy the evaluation for the changes to take effect.
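If you have sample values from both data sets at hand, a quick local check can catch a type mismatch before you reach the configuration pages. The following is a minimal sketch that uses plain Python lists; the helper names are hypothetical, and the check only distinguishes string values from numeric values, which is a simplification of whatever typing the service applies.

```python
from numbers import Number


def column_kind(values) -> str:
    """Classify a column as 'numeric', 'string', or 'mixed'."""
    kinds = set()
    for value in values:
        if isinstance(value, bool):
            kinds.add("other")       # booleans are not treated as numeric here
        elif isinstance(value, Number):
            kinds.add("numeric")
        elif isinstance(value, str):
            kinds.add("string")
        else:
            kinds.add("other")
    if kinds == {"numeric"}:
        return "numeric"
    if kinds == {"string"}:
        return "string"
    return "mixed"                   # also covers empty columns


def prediction_types_match(training_values, payload_values) -> bool:
    """True only when both columns are uniformly string or uniformly numeric."""
    training_kind = column_kind(training_values)
    payload_kind = column_kind(payload_values)
    return training_kind == payload_kind and training_kind != "mixed"


# Illustrative values only: string labels in training data, numeric codes in payload data.
print(prediction_types_match(["Risk", "No Risk"], [1, 0]))               # False
print(prediction_types_match(["Risk", "No Risk"], ["No Risk", "Risk"]))  # True
```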

On the successive pages of the Drift tab, you must provide the following information:

Alert threshold

Required only for classification models. Drift is measured as the change in model accuracy relative to the accuracy at training time. The alert threshold, which must be at least 5%, sets the degree of change that you tolerate over time.

Sample size

By setting a minimum sample size, you prevent drift from being measured until a minimum number of records is available in the evaluation data set. This setting ensures that the sample is not so small that it skews the results. Every time drift checking runs, it uses the minimum sample size to decide the number of records on which it does the computation.
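The gating behavior described here can be pictured with a short sketch. It is hypothetical and not the service's code: the function name is invented, and the choice to evaluate the most recent records is an assumption, since this page says only that the minimum sample size determines how many records are used.

```python
def records_for_drift_check(records: list, min_sample_size: int):
    """Return the records to evaluate, or None when too few are available."""
    if min_sample_size <= 0:
        raise ValueError("min_sample_size must be a positive integer")
    if len(records) < min_sample_size:
        return None                      # not enough data yet, so skip this run
    # Assumption: the most recent records are the ones evaluated.
    return records[-min_sample_size:]


payload = list(range(20))                # stand-in for 20 payload records

print(records_for_drift_check(payload[:5], 10))   # None
print(records_for_drift_check(payload, 10))       # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```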

Steps to configure drift evaluation

If you use IBM Watson Machine Learning, you can configure drift detection.

To start the configuration process, from the Drift tab, in the Drift model box, click the Edit icon.

Choose a training option and follow the prompts to enter required information. When you finish, a summary of your selections is presented for review. If you want to change anything, click the Edit icon for that section. Otherwise, save your work.

Steps to configure drift without retraining

You can reconfigure the drift evaluation without retraining the drift model. Updating the minimum sample size and the alert threshold produces new results from the currently trained model without incurring additional processing costs. This approach avoids intensive CPU usage when the underlying data is stable and you want to view drift magnitude with different thresholds.

Note: Your drift model requires retraining only when the training data or its schema changes.

To start the configuration process, from the Drift tab, in the Drift threshold box or Sample size box, click Edit. Update the current setting and save it.

Steps to configure drift by using a notebook

Use a notebook to configure drift in the following circumstances:

You do not want to share the training data to configure drift evaluations.
You do not have a means to share the training data on Db2 or IBM Cloud Object Storage, which are the only two training data locations that are supported for drift evaluations.

This option is useful if the training data is not stored in Db2 or IBM Cloud Object Storage. In the notebook, you must read the training data into a dataframe. The notebook that you download then generates an output file that you can upload to configure drift evaluations.

To generate the drift detection model, run the cell that installs the ibm-wos-utils>=5.0.1.0 package and scikit-learn version 1.3.2. Scikit-learn version 1.3.2 is required to build the model.
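After the install cell runs, you may want to confirm that the notebook environment actually meets these version requirements. The sketch below uses only the Python standard library. The helper names are hypothetical, and the version parser is a simplification that handles plain dotted numeric versions only (no pre-release or post-release suffixes).

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '5.0.1.0' into a tuple of ints.

    Trailing zeros are dropped so that '5.0.1' and '5.0.1.0' compare as equal.
    """
    parts = [int(part) for part in text.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def environment_problems(versions: dict) -> list:
    """Compare installed versions (package name -> version string) with the requirements."""
    problems = []
    wos = versions.get("ibm-wos-utils")
    if wos is None or parse_version(wos) < parse_version("5.0.1.0"):
        problems.append(f"ibm-wos-utils must be 5.0.1.0 or later, found {wos}")
    sklearn = versions.get("scikit-learn")
    if sklearn != "1.3.2":
        problems.append(f"scikit-learn must be exactly 1.3.2, found {sklearn}")
    return problems


def installed_versions() -> dict:
    """Look up the versions that are installed in the current environment."""
    found = {}
    for name in ("ibm-wos-utils", "scikit-learn"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Illustrative input; in a notebook, pass installed_versions() instead.
print(environment_problems({"ibm-wos-utils": "5.0.1.0", "scikit-learn": "1.3.2"}))  # []
```

Calling environment_problems(installed_versions()) in the notebook returns an empty list when both requirements are met.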

Create a notebook to generate the drift detection model by using the sample notebook. The drift detection model is converted into a .tar.gz file for you.

To start the configuration process, from the Drift tab, in the Drift model box, click Edit. Use the Train in a data science notebook option. You can drag your compressed drift detection model to the drop zone.
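Before you drag the file to the drop zone, you might confirm locally that it opens as a gzip-compressed tar archive. This optional check is a sketch that uses the Python standard library only. The helper name is hypothetical, and the demo builds a throwaway archive with a placeholder member, which does not reflect the real contents of a drift detection model archive.

```python
import os
import tarfile
import tempfile


def list_archive_members(path: str) -> list:
    """Return the member names if `path` opens as a .tar.gz archive.

    tarfile.open raises tarfile.ReadError when the file is not a valid gzip tar.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


# Demo with a throwaway archive; 'placeholder.txt' is a stand-in, not a real model file.
with tempfile.TemporaryDirectory() as workdir:
    placeholder = os.path.join(workdir, "placeholder.txt")
    with open(placeholder, "w") as handle:
        handle.write("stand-in content")

    archive_path = os.path.join(workdir, "example.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as archive:
        archive.add(placeholder, arcname="placeholder.txt")

    members = list_archive_members(archive_path)

print(members)  # ['placeholder.txt']
```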

Follow the prompts and enter required information. When you finish, a summary of your selections is presented for review. If you want to change anything, click Edit for that section. Otherwise, save your work.

Learn more

Drift metrics
