Watson OpenScale quality metrics

Last updated: Nov 03, 2023

When you enable quality evaluations in Watson OpenScale, you can generate metrics that help you determine how well your model predicts outcomes.

You can view the results of your quality evaluations on the Insights dashboard in Watson OpenScale. To view results, select a model deployment tile and click the arrow in the Quality evaluation section to display a summary of quality metrics from your last evaluation. For more information, see Reviewing quality results.

Quality metrics are calculated with manually labeled feedback data and monitored deployment responses. For more information, see Managing feedback data.

Supported quality metrics

The following quality metrics are supported by Watson OpenScale:

Binary classification problems

For binary classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the Area under ROC score, which measures the model's ability to distinguish between two classes. For example, models with higher Area under ROC scores are better at identifying class A as class A and class B as class B. The following metrics measure binary classification problems:
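To make the Area under ROC score concrete, the following minimal sketch computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting as half. The function name `area_under_roc` and the sample data are hypothetical and are not part of any Watson OpenScale API. The sketch illustrates the general definition of the metric, not how Watson OpenScale computes it internally.

```python
def area_under_roc(labels, scores):
    """Return the share of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    Ties between a positive and a negative score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))


# Hypothetical feedback labels and model scores, for illustration only.
feedback_labels = [1, 0, 1, 1, 0, 0]
model_scores = [0.9, 0.2, 0.65, 0.4, 0.5, 0.1]
auc = area_under_roc(feedback_labels, model_scores)
print(f"Area under ROC: {auc:.3f}")
```

A score of 1.0 means that every positive example is ranked above every negative example, and a score near 0.5 means that the ranking is no better than chance.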

Regression problems

For regression models, Watson OpenScale tracks when the quality of the model falls below an acceptable level and checks the R-squared score. The R-squared score measures how much of the variation in the actual values is explained by the predicted values. For example, models with higher R-squared scores fit the actual values more closely. The following metrics measure regression problems:
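As an illustration of the R-squared score, the following minimal sketch uses the standard coefficient of determination, which is 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are hypothetical. The sketch shows the textbook formula and is not taken from the Watson OpenScale implementation.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    mean_actual = sum(actual) / len(actual)
    # Squared error that remains after using the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean of the actual values.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant, so R-squared is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical feedback values and model predictions, for illustration only.
actual_values = [3.0, 5.0, 7.0, 9.0]
predicted_values = [2.5, 5.0, 7.5, 9.0]
r2 = r_squared(actual_values, predicted_values)
print(f"R-squared: {r2:.3f}")
```

A score of 1.0 means that the predictions match the actual values exactly. A score of 0 means that the model does no better than always predicting the mean of the actual values.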

Multiclass classification problems

For multiclass classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level and checks the Accuracy score, which is the percentage of accurate predictions. The following metrics measure multiclass classification problems:
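The Accuracy score and the idea of a threshold violation can be sketched as follows. The function name `accuracy`, the sample labels, and the threshold value of 0.8 are all hypothetical and chosen only for illustration. In Watson OpenScale, thresholds are set when you configure the quality evaluation.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the manually labeled class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical feedback labels and model predictions, for illustration only.
feedback_labels = ["cat", "dog", "bird", "cat", "dog"]
model_predictions = ["cat", "dog", "cat", "cat", "bird"]

acc = accuracy(feedback_labels, model_predictions)

# Assumed lower threshold; a score below it counts as a violation.
accuracy_threshold = 0.8
violation = acc < accuracy_threshold

print(f"Accuracy: {acc:.0%}, threshold violated: {violation}")
```

In this example, three of the five predictions are correct, so the score falls below the assumed threshold and would be flagged as a violation.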

Note:

After Watson OpenScale detects problems with quality, such as accuracy threshold violations, you must build a new version of the model that fixes the problem. To do so, retrain the model on the manually labeled data in the feedback table together with the original training data.