Quality metrics overview

Use quality monitoring to determine how well your model predicts outcomes. When quality monitoring is enabled, Watson OpenScale generates a set of metrics every hour by default. You can also generate these metrics on demand by clicking the Check quality now button or by using the Python client.

Quality metrics are calculated based on the following information:

  • manually labelled feedback data,
  • the monitored deployment's responses to that feedback data.

For accurate monitoring, feedback data must be logged to Watson OpenScale on a regular basis. You can provide feedback data by using the Add feedback data option, the Python client, or the REST API.
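The shape of a feedback payload can be sketched as follows. This is an illustrative structure only, not the exact client API: the field names (`age`, `income`, `label`) are hypothetical and must match your deployment's training schema, with the manually assigned label included for each record.

```python
# Hypothetical feedback payload: each record carries the model's input
# features plus the manually assigned (correct) label.
# Field names are illustrative; they must match your training schema.
feedback_records = {
    "fields": ["age", "income", "label"],
    "values": [
        [34, 52000, "approved"],   # features + manually labelled outcome
        [61, 18000, "rejected"],
    ],
}

# Each row in "values" aligns positionally with "fields".
assert len(feedback_records["fields"]) == len(feedback_records["values"][0])
```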

For machine learning engines other than Watson OpenScale, such as Microsoft Azure ML Studio, Microsoft Azure ML Service, or Amazon SageMaker ML, quality monitoring creates additional scoring requests on the monitored deployment.

You can review all metric values over time on the Watson OpenScale dashboard:

quality metrics chart showing drift of area under ROC

Some metrics provide related details, such as the confusion matrix for binary and multiclass classification. To review these details, click the chart.

detail table of quality metrics

Supported quality metrics

The following quality metrics are supported by Watson OpenScale:

Binary classification problems

For binary classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the area under the ROC curve (area under ROC), which measures the model's ability to distinguish between two classes. The higher the area under ROC score, the better the model is at identifying instances of class A as class A and instances of class B as class B.
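To make the metric concrete, here is a small illustration of area under ROC computed with scikit-learn; the labels and scores are made-up toy values, not how Watson OpenScale computes the metric internally.

```python
from sklearn.metrics import roc_auc_score

# Toy example: true binary labels and the model's predicted
# probabilities for the positive class.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# A score of 1.0 means perfect separation of the two classes;
# 0.5 is no better than random guessing.
auc = roc_auc_score(y_true, y_scores)
print(auc)  # 0.75
```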

Regression problems

For regression models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the R squared score. R squared measures the proportion of the variance in the actual values that is explained by the predicted values. The higher the R squared score, the better the model fits the actual values.
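As an illustration of the metric itself (again using scikit-learn and made-up values, not the product's internal computation):

```python
from sklearn.metrics import r2_score

# Toy example: actual values and a model's predictions.
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# 1.0 is a perfect fit; 0.0 means the model does no better than
# always predicting the mean of the actual values.
r2 = r2_score(y_true, y_pred)
print(round(r2, 4))  # 0.9486
```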

Multiclass classification problems

For multiclass classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the accuracy score, which is the percentage of predictions the model got right.
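The accuracy calculation can be illustrated with scikit-learn on toy multiclass labels (hypothetical classes A, B, C; not Watson OpenScale's internal code):

```python
from sklearn.metrics import accuracy_score

# Toy example: three classes, five predictions, three of them correct.
y_true = ["A", "B", "C", "A", "B"]
y_pred = ["A", "B", "B", "A", "C"]

acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.6  (3 correct out of 5)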

Supported quality details

The following details for quality metrics are supported by Watson OpenScale:

Confusion matrix

The confusion matrix helps you understand which of your feedback records the monitored deployment predicted correctly and which it did not.

For more information, see Confusion matrix.
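A minimal sketch of what a confusion matrix captures, using scikit-learn on toy binary labels (illustrative only):

```python
from sklearn.metrics import confusion_matrix

# Toy example: true labels versus the deployment's predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
print(cm.tolist())  # [[2, 0], [1, 2]]
```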

Next steps

  • After Watson OpenScale detects problems with quality, such as accuracy threshold violations, you must build a new version of the model that fixes the problem. Retrain the model by combining the manually labelled data in the feedback table with the original training data.
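The retraining step above can be sketched as follows. The data, column names, and model choice are all hypothetical; the point is that the feedback records are appended to the original training set before refitting.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical original training data.
train = pd.DataFrame({"x1": [0.1, 0.9, 0.2, 0.8],
                      "label": [0, 1, 0, 1]})

# Hypothetical manually labelled records from the feedback table.
feedback = pd.DataFrame({"x1": [0.7, 0.3],
                         "label": [1, 0]})

# Combine feedback with the original training data, then retrain.
combined = pd.concat([train, feedback], ignore_index=True)
model = LogisticRegression().fit(combined[["x1"]], combined["label"])
```

The retrained model would then be deployed as a new version and re-registered with the monitor.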