Reviewing evaluation results

When you configure evaluations in the Watson OpenScale service, you can analyze the evaluation results to gain insights into your model performance. The Insights dashboard provides tools for reviewing performance details, sending alerts about threshold violations, and generating reports.

Some of the details you can review from the dashboard include:

  • Quality results, which include a confusion matrix that helps you determine whether your deployed model analyzed your transactions correctly.
  • Drift results, which show the transactions that are responsible for a drop in accuracy, a drop in data consistency, or both.
  • Model health evaluation results, which summarize the metrics that are generated during your last evaluation with scorecard tiles that correlate with different dimensions.

Figure: Model deployment evaluation chart, with each evaluation showing details about how the model meets the set thresholds.

To view results in the Insights dashboard:

  1. In Watson OpenScale, click the Activity icon to open the Insights dashboard.

  2. Select the tile for the model deployment that you want to view results for. Watson OpenScale displays the results from your last evaluation.

  3. Click the navigation arrow in an evaluation section to view data visualizations of the evaluation results within the timeframe and Date range settings that you specify. The last evaluation for the timeframe that you select is also displayed for the associated date range.

  4. Use the Actions menu to view details about your model by selecting any of the following analysis options:

    • All evaluations: For pre-production models, display a history of your evaluations to understand how your results change over time.
    • Compare: Compare models with a matrix chart that highlights key metrics to help you determine which version of a model is ready for production or which models might need more training.
    • View model information: View details about your model to understand how your deployment environment is set up.
    • Download report PDF: Generate a model summary report that gives you all of the metrics and explains why they were scored the way they were.
    • Set up alert: Send alerts about threshold violations to an email address.

You can also use the Actions menu to manage data for model evaluations. For more information, see Sending model transactions.
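
If you prefer to work outside the dashboard, you can also retrieve evaluation results programmatically. The following is a minimal sketch that assumes the ibm-watson-openscale Python SDK and an IBM Cloud API key; the API key and monitor instance ID are placeholders, and method names and response formats can vary between SDK versions.

```python
# Sketch: retrieve evaluation results with the ibm-watson-openscale Python SDK
# instead of the Insights dashboard. The API key and monitor instance ID are
# placeholders; method names may differ slightly between SDK versions.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

authenticator = IAMAuthenticator(apikey="<YOUR_IBM_CLOUD_API_KEY>")
client = APIClient(authenticator=authenticator)

# List the monitor instances (quality, fairness, drift, model health, ...)
# that are attached to your subscriptions.
client.monitor_instances.show()

# Print the metrics that were computed by the most recent evaluations of one
# monitor instance, which mirrors what the dashboard tiles display.
client.monitor_instances.show_metrics(
    monitor_instance_id="<MONITOR_INSTANCE_ID>"
)
```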

With time series charts, Watson OpenScale displays aggregated evaluations as data points that you can select to view results for a specific time. Because of the default Watson OpenScale aggregation behavior, the timestamp of each data point that is displayed when you hover over a time series chart might not match the timestamp of the latest evaluation.
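
As a rough illustration of that aggregation behavior, the following sketch (not part of Watson OpenScale) groups individual evaluation results into hourly buckets with pandas. The chart-style data point carries the bucket timestamp, such as the top of the hour, rather than the exact time of the latest evaluation; the metric values are invented for the example.

```python
# Illustration only: hourly aggregation relabels points to the bucket start,
# so a chart point's timestamp differs from the raw evaluation timestamps.
import pandas as pd

evaluations = pd.DataFrame(
    {"accuracy": [0.91, 0.89, 0.93]},
    index=pd.to_datetime(
        ["2024-05-01 10:12", "2024-05-01 10:47", "2024-05-01 11:22"]
    ),
)

# Mean accuracy per hour; the index now holds 10:00 and 11:00,
# not 10:47 or 11:22.
hourly = evaluations["accuracy"].resample("1h").mean()
print(hourly)
```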

Analyzing results

The following sections describe how you can analyze results from your Watson OpenScale model evaluations:

  • To help you review fairness results, Watson OpenScale provides calculations for the following types of data sets:

    • Balanced: The balanced calculation includes the scoring requests that are received for the selected hour. If the minimum number of records that is required for evaluation is not met, the calculation also includes records from previous hours. It also includes perturbed and synthesized records that are used to test the model's response when the value of the monitored feature changes.
    • Payload: The actual scoring requests that are received by the model for the selected hour.
    • Training: The training data records that are used to train the model.
    • Debiased: The output of the debiasing algorithm after processing the runtime and perturbed data.

    Figure: Data visualization of fairness metrics for each monitored group

    With the chart, you can observe the groups that experience bias and see the percentage of expected outcomes for these groups. You can also see the percentage of expected outcomes for reference groups, which is the average of expected outcomes across all reference groups. The chart indicates the presence of bias by comparing the ratio of the percentage of expected outcomes for monitored groups in a date range to the percentage of expected outcomes for reference groups.

    The chart also shows the distribution of the reference and monitored values for each distinct value of the attribute in the payload data that was analyzed to identify bias. You can use this data to correlate the amount of bias with the amount of data that is received by the model. You can also see the percentage of groups with expected outcomes to identify sources of bias that skewed results and led to increases in the percentage of expected outcomes for reference groups.
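
    A minimal sketch of the ratio calculation that these charts are based on: the share of expected (favorable) outcomes for a monitored group is divided by the share for the reference group. The data frame, column names, group labels, and favorable outcome below are hypothetical examples, not values from a specific Watson OpenScale deployment.

    ```python
    # Sketch: compare the rate of favorable outcomes for a monitored group with
    # the rate for the reference group. Column names and labels are examples only.
    import pandas as pd

    scored = pd.DataFrame(
        {
            "Sex": ["female", "male", "female", "male", "male", "female"],
            "prediction": ["No Risk", "No Risk", "Risk", "No Risk", "No Risk", "Risk"],
        }
    )

    favorable = "No Risk"
    monitored = scored[scored["Sex"] == "female"]
    reference = scored[scored["Sex"] == "male"]

    monitored_rate = (monitored["prediction"] == favorable).mean()
    reference_rate = (reference["prediction"] == favorable).mean()

    ratio = monitored_rate / reference_rate
    print(f"favorable-outcome ratio (monitored / reference): {ratio:.2f}")
    # A ratio well below 1.0 suggests that the monitored group receives the
    # favorable outcome less often than the reference group.
    ```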

  • To help you review quality results, Watson OpenScale displays a confusion matrix that helps you determine whether your deployed model analyzed any of your transactions incorrectly. For binary classification models, the records are classified as false positives or false negatives; for multi-class models, they are classified as incorrect class assignments. For binary classification problems, IBM Watson OpenScale assigns the target category to either the positive or negative class. In the confusion matrix, the label for the positive class is located in the second row or column.

    Figure: Detail table of quality metrics
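
    As a generic illustration of what the quality view summarizes, the following sketch builds a binary confusion matrix with scikit-learn. The labels and predictions are invented, and the row and column order is controlled by the labels argument rather than by the Watson OpenScale display conventions.

    ```python
    # Sketch: build a binary confusion matrix with scikit-learn. The order passed
    # in `labels` controls which row and column represent the positive class.
    from sklearn.metrics import confusion_matrix, accuracy_score

    y_true = ["No Risk", "Risk", "No Risk", "Risk", "Risk", "No Risk"]
    y_pred = ["No Risk", "No Risk", "No Risk", "Risk", "Risk", "Risk"]

    # With labels=["No Risk", "Risk"], the positive class "Risk" is the second
    # row and column, so cm[0, 1] counts false positives and cm[1, 0] counts
    # false negatives.
    cm = confusion_matrix(y_true, y_pred, labels=["No Risk", "Risk"])
    print(cm)
    print("accuracy:", accuracy_score(y_true, y_pred))
    ```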

  • For drift evaluations, you can view the transactions that are responsible for a drop in accuracy, a drop in data consistency, or both. You can also view the number of transactions that are identified and the features of your model that are responsible for reduced accuracy or data consistency.

    Figure: Model drift transactions page

    For more information, see Reviewing drift transactions.
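
    The following is a rough conceptual sketch, not the Watson OpenScale implementation, of how a drop in accuracy can be estimated without new labels: a secondary classifier is trained to predict whether the deployed model is likely to be wrong, and the share of recent transactions that it flags is compared with the baseline error rate. All data, features, and model choices in the sketch are hypothetical.

    ```python
    # Conceptual sketch only: estimate an accuracy drop by flagging transactions
    # that a secondary classifier predicts the deployed model is likely to get wrong.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Hypothetical training data, deployed-model predictions, and true labels.
    X_train = rng.normal(size=(500, 4))
    y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
    base_pred_train = (X_train[:, 0] > 0).astype(int)  # stand-in for the deployed model
    model_was_correct = (base_pred_train == y_train).astype(int)

    # Train a classifier that predicts whether the deployed model will be correct.
    drift_detector = RandomForestClassifier(random_state=0)
    drift_detector.fit(X_train, model_was_correct)

    baseline_error = 1 - model_was_correct.mean()

    # Recent payload data, concentrated near the decision boundary where the
    # deployed model errs more often.
    X_recent = rng.normal(loc=0.0, scale=0.3, size=(200, 4))
    predicted_correct = drift_detector.predict(X_recent)
    estimated_error = 1 - predicted_correct.mean()

    print(f"estimated accuracy drop: {estimated_error - baseline_error:.2%}")
    print(f"transactions flagged as likely incorrect: {(predicted_correct == 0).sum()}")
    ```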

  • When you review drift v2 evaluation results, Watson OpenScale displays collapsible tiles that you can open to view different details about the metrics. You can view the history of how each metric score changes over time with a time series chart, or view details about how the output drift and feature drift scores are calculated. You can also view details about each feature to understand how it contributes to the scores that Watson OpenScale generates.

    Figure: Drift v2 evaluation results
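
    As an intuition for what a per-feature drift score captures, the following generic sketch measures how far a feature's recent distribution has moved from its baseline distribution by using total variation distance between binned histograms. Watson OpenScale computes its drift v2 scores with its own algorithms, so treat this only as an illustration; the data is synthetic.

    ```python
    # Generic illustration: score how far a feature's recent distribution has
    # moved from its baseline distribution (0 = identical, 1 = disjoint).
    import numpy as np

    rng = np.random.default_rng(1)
    baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # e.g., training data feature
    recent = rng.normal(loc=0.8, scale=1.0, size=1_000)    # e.g., recent payload feature

    # Bin both samples on a common grid and compare the normalized histograms.
    bins = np.histogram_bin_edges(np.concatenate([baseline, recent]), bins=20)
    p, _ = np.histogram(baseline, bins=bins)
    q, _ = np.histogram(recent, bins=bins)
    p = p / p.sum()
    q = q / q.sum()

    total_variation = 0.5 * np.abs(p - q).sum()
    print(f"feature drift score (total variation distance): {total_variation:.3f}")
    ```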

  • When you review model health evaluation results, Watson OpenScale provides a summary of the metrics that are generated during your last evaluation with scorecard tiles that correlate with different dimensions. For metrics with multiple dimensions, you can click a dropdown menu on the tiles to select the metric that you want to analyze. To analyze how your metrics change over time, you can click the collapsible tiles for each category to view time series charts.

    Figure: Model health metrics

For more information, see Model health evaluation metrics.
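
As a simple illustration of the kind of numbers behind the model health tiles, the following sketch summarizes a hypothetical scoring log into records processed, throughput, latency percentiles, and payload size. The log, column names, and time window are invented; Watson OpenScale computes its model health metrics from the scoring requests that your deployment receives.

```python
# Illustration only: summarize a hypothetical scoring log into the kinds of
# figures that model health tiles report (records, throughput, latency, size).
import pandas as pd

log = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-05-01 10:00", "2024-05-01 10:20",
             "2024-05-01 10:40", "2024-05-01 11:10"]
        ),
        "records": [120, 80, 200, 150],          # records scored per API call
        "latency_ms": [35.0, 52.0, 41.0, 60.0],  # response time per API call
        "payload_kb": [18.2, 12.4, 30.1, 22.8],  # request payload size
    }
)

window_hours = (log["timestamp"].max() - log["timestamp"].min()).total_seconds() / 3600
summary = {
    "records processed": int(log["records"].sum()),
    "throughput (records/hour)": log["records"].sum() / window_hours,
    "median latency (ms)": log["latency_ms"].median(),
    "p95 latency (ms)": log["latency_ms"].quantile(0.95),
    "average payload size (KB)": log["payload_kb"].mean(),
}
print(summary)
```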

Parent topic: Getting insights with Watson OpenScale
