Quick start: Evaluate a machine learning model

Last updated: Nov 27, 2024

Take this tutorial to learn how to evaluate a model that predicts which applicants qualify for mortgages. You must evaluate the model for quality, fairness, and explainability.

Required services: watsonx.ai Runtime; watsonx.ai Studio; Watson OpenScale

Your basic workflow includes these tasks:

Open a project. Projects are where you can collaborate with others to work with data and models.
Build a model. You can build a model by using one of these methods:
- Create a Jupyter notebook and add Python code.
- Create an AutoAI experiment.
Deploy your model.
Configure the Watson OpenScale monitors by using one of these methods:
- Create a Jupyter notebook.
- Add the deployment to the Watson OpenScale dashboard using the UI.

Read about Watson OpenScale

Watson OpenScale tracks and measures outcomes from your AI models, and helps ensure that they remain fair, explainable, and compliant no matter where your models were built or are running. Watson OpenScale also detects and helps correct the drift in accuracy when an AI model is in production.

Watch a video about evaluating a machine learning model

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.

Try a tutorial about evaluating a machine learning model

In this tutorial, you will complete these tasks:

Task 1: Create the sample project
Task 2: Deploy the model
Task 3: Run the notebook to set up the monitors
Task 4: Evaluate the model
Task 5: Observe the model monitors for quality
Task 6: Observe the model monitors for fairness
Task 7: Observe the model monitors for drift
Task 8: Observe the model monitors for explainability

Tips for completing this tutorial

Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best experience with picture-in-picture. You can use picture-in-picture mode so you can follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.

Task 1: Create the sample project

To preview this task, watch the video beginning at 00:06.

This tutorial uses a sample project containing a machine learning model and a notebook to configure the monitors. Follow these steps to create a project based on a sample.

Access the Evaluate an ML model sample project in the Resource hub.
Click Create project.
If prompted to associate the project with a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
Click Create.
Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.
Click the Assets tab to view the assets in the sample project.

Check your progress

The following image shows the sample project. You are now ready to start the tutorial.

Task 2: Deploy the model

Before you can deploy the model, you need to promote the model to a new deployment space. Deployment spaces help you to organize supporting resources such as input data and environments; deploy models or functions to generate predictions or solutions; and view or edit deployment details.

Task 2a: Promote the model to a deployment space

To preview this task, watch the video beginning at 00:49.

Follow these steps to promote the model to a new deployment space:

From the Assets tab, click Mortgage Approval Prediction Model to view the model.
On the model page, click the Promote to deployment space icon.
For the Target space, select Create a new deployment space.
1. For the deployment space name, copy and paste the name exactly as shown with no leading or trailing spaces: Golden Bank Preproduction Space
2. Select a storage service from the list.
3. Select your provisioned machine learning service from the list.
4. Click Create.
5. Click Close.
For the Target space, ensure that Golden Bank Preproduction Space is selected.
Check the Go to model in the space after promoting it option.
Click Promote.

Check your progress

The following image shows the model in the deployment space. You are now ready to create a model deployment.

Task 2b: Create an online deployment for the model

To preview this task, watch the video beginning at 01:30.

Follow these steps to create an online deployment for your model:

When the deployment space opens, click New deployment.
1. For the Deployment type, select Online.
2. For the Name, copy and paste the deployment name exactly as shown with no leading or trailing spaces: Mortgage Approval Model Deployment
3. Click Create.
Wait for the model deployment to complete. When the model is deployed successfully, view the deployment to see the scoring endpoint, and, optionally, test the model.
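
To test the online deployment, you send a scoring request to its endpoint. The following minimal sketch only builds a request body in the `input_data` / `fields` / `values` shape that watsonx.ai Runtime online deployments generally accept; it does not call the service. The field names and values are hypothetical placeholders, not the actual input columns of the mortgage model, and `build_scoring_payload` is an illustrative helper rather than part of any IBM library.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a scoring request body: one list of field names plus one list of values per row."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row must contain exactly one value per field")
    return {"input_data": [{"fields": list(fields), "values": [list(row) for row in rows]}]}


# Hypothetical field names and values, for illustration only.
fields = ["INCOME", "YRS_AT_CURRENT_ADDRESS", "CREDIT_SCORE"]
rows = [[52000, 4, 710]]

payload = build_scoring_payload(fields, rows)
print(json.dumps(payload, indent=2))
```

On the deployment's test page, you can paste a JSON body of this shape after replacing the fields with the inputs that the model actually expects.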

Check your progress

The following image shows the model deployment. You are now ready to run the notebook to configure the monitors.

Task 3: Run the notebook to set up the monitors

To preview this task, watch the video beginning at 01:55.

Run the notebook included in the sample project to:

Fetch the model and deployments.
Configure Watson OpenScale.
Create the service provider and subscription for your machine learning service.
Configure the quality monitor.
Configure the fairness monitor.
Configure explainability.

Follow these steps to run the notebook included in the sample project. This notebook sets up monitors for your model. You can also configure the monitors through the user interface, but setting them up with a notebook is quicker and less error-prone. Take some time to read through the comments in the notebook, which explain the code in each cell.

From the Navigation Menu, choose Projects > View all projects.
Open the Evaluate an ML model project.
Click the Assets tab, and then navigate to Notebooks.
Open the monitor-wml-model-with-watson-openscale notebook.
Since the notebook is in read-only mode, click the Edit icon to place the notebook in edit mode.
Because you imported the project from the Resource hub, the first cell of this notebook should contain the project access token. If the notebook does not have a first cell with a project access token, generate one: from the More menu, select Insert project token. This action inserts a new first cell that contains the project token.
Under the Provide your IBM Cloud API key section, you need to pass your credentials to the watsonx.ai Runtime API using an API key. If you don't already have a saved API key, then follow these steps to create an API key.
1. Access the IBM Cloud console API keys page.
2. Click Create an IBM Cloud API key. If you have any existing API keys, the button might be labeled Create.
3. Type a name and description.
4. Click Create.
5. Copy the API key.
6. Download the API key for future use.
7. Return to the notebook, and paste your API key in the ibmcloud_api_key field.
In the 3. Model and Deployment section, verify the values assigned to the space_name, model_name, and deployment_name variables.
Click Cell > Run All to run all of the cells in the notebook. Alternatively, click the Run icon next to each cell to run the notebook cell-by-cell to explore each cell and its output.
The notebook takes 1 to 3 minutes to complete. You can monitor the progress cell by cell by watching the asterisk in "In [*]" change to a number, for example, "In [1]".
If you encounter any errors during the notebook run, try these troubleshooting tips:
- Click Kernel > Restart & Clear Output to restart the kernel, and then run the notebook again.
- Delete any existing Watson OpenScale deployments, and provision a new service instance.
- Verify that you created the deployment space and deployment name by copying and pasting the specified artifact name exactly with no leading or trailing spaces.
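
Because mismatched names are a common cause of notebook errors, it can help to check the values before you run all cells. The following minimal sketch assumes that the `space_name`, `model_name`, and `deployment_name` variables should hold the asset names used in Task 2; `find_name_problems` is a hypothetical helper, not part of the sample notebook.

```python
# Names as specified in Task 2 of this tutorial.
space_name = "Golden Bank Preproduction Space"
model_name = "Mortgage Approval Prediction Model"
deployment_name = "Mortgage Approval Model Deployment"


def find_name_problems(names):
    """Return the keys whose values are empty or have leading or trailing whitespace."""
    return [key for key, value in names.items() if not value or value != value.strip()]


problems = find_name_problems(
    {
        "space_name": space_name,
        "model_name": model_name,
        "deployment_name": deployment_name,
    }
)

if problems:
    print("Fix these variables before running the notebook:", problems)
else:
    print("All names are free of leading and trailing spaces.")
```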

Check your progress

The following image shows the notebook when the run is complete. The notebook set up monitors for your model, so you can now view the deployment in Watson OpenScale.

Task 4: Evaluate the model

To preview this task, watch the video beginning at 03:35.

Follow these steps to download holdout data, and use that data to evaluate the model in Watson OpenScale:

Click the Evaluate an ML model project in the navigation trail.
On the Assets tab, click Data > Data assets.
Click the Overflow menu for the GoldenBank_HoldoutData.csv data asset, and choose Download. To validate that the model is working as required, you need a set of labeled data, which was held out from model training. This CSV file contains that holdout data.
Launch Watson OpenScale.
1. From the Navigation Menu, choose Services > Service instances.
2. Click your Watson OpenScale instance to open the service instance page. If prompted, log in using the same credentials that you used to sign up for Cloud Pak for Data.
  
  Note: Your instance may have a different name, such as `watsonx.governance-xx`.
3. On the Watson OpenScale service instance page, click Launch Watson OpenScale.
On the Insights dashboard, click the Mortgage Approval Model Deployment tile.
From the Actions menu, select Evaluate now.
From the list of import options, select from CSV file.
Drag the GoldenBank_HoldoutData.csv data file that you downloaded from the project into the side panel.
Click Upload and evaluate and wait for the evaluation to complete.

Check your progress

The following image shows the result of the evaluation for the deployed model in Watson OpenScale. Now that you evaluated the model, you are ready to observe the model quality.

Task 5: Observe the model monitors for quality

To preview this task, watch the video beginning at 04:40.

The Watson OpenScale quality monitor generates a set of metrics to evaluate the quality of your model. You can use these quality metrics to determine how well your model predicts outcomes. When the evaluation that uses the holdout data completes, follow these steps to observe the model quality or accuracy:

In the Quality section, click the Configure icon. Here you can see that the quality threshold that is configured for this monitor is 70% and that the measurement of quality being used is area under the ROC curve.
Click Go to model summary to return to the model details screen.
In the Quality section, click the Details icon to see the model quality detailed results. Here you see a number of quality metric calculations and a confusion matrix showing correct model decisions along with false positives and false negatives. The calculated area under the ROC curve is 0.9 or higher, which exceeds the 0.7 threshold, so the model is meeting its quality requirement.
Click Mortgage Approval Model Deployment in the navigation trail to return to the model details screen.
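
To make the quality metrics in these steps more concrete, the following minimal sketch computes a confusion matrix and the area under the ROC curve in plain Python and compares the result with the 0.7 threshold. The labels and scores are made-up sample data, not values from the holdout file, and the pairwise calculation is a simple illustration rather than the way Watson OpenScale computes the metric.

```python
def confusion_matrix(labels, predictions):
    """Count true/false positives and negatives for binary labels (1 = approved)."""
    pairs = list(zip(labels, predictions))
    return {
        "tp": sum(1 for y, p in pairs if y == 1 and p == 1),
        "fp": sum(1 for y, p in pairs if y == 0 and p == 1),
        "tn": sum(1 for y, p in pairs if y == 0 and p == 0),
        "fn": sum(1 for y, p in pairs if y == 1 and p == 0),
    }


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labeled data: 1 = approved, 0 = not approved.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

matrix = confusion_matrix(labels, predictions)
auc = roc_auc(labels, scores)
meets_threshold = auc >= 0.7  # threshold configured for the quality monitor

print(matrix)
print(f"Area under ROC: {auc:.4f}, meets threshold: {meets_threshold}")
```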

Check your progress

The following image shows the quality details in Watson OpenScale. Now that you observed the model quality, you can observe the model fairness.

Task 6: Observe the model monitors for fairness

To preview this task, watch the video beginning at 05:41.

The Watson OpenScale fairness monitor generates a set of metrics to evaluate the fairness of your model. You can use the fairness metrics to determine if your model produces biased outcomes. Follow these steps to observe the model fairness:

In the Fairness section, click the Configure icon. Here you see that the model is being reviewed to ensure that applicants are being treated fairly regardless of their gender. Women are identified as the monitored group for whom fairness is being measured, and the fairness threshold is at least 80%. The fairness monitor uses the disparate impact method to determine fairness. Disparate impact compares the percentage of favorable outcomes for a monitored group to the percentage of favorable outcomes for a reference group.
Click Go to model summary to return to the model details screen.
In the Fairness section, click the Details icon to see the model fairness detailed results. Here you see the percentage of male and female applicants who are being automatically approved, along with a fairness score of over 100%, so the model performance far exceeds the 80% fairness threshold required.
Note the identified data sets. To ensure that the fairness metrics are as accurate as possible, Watson OpenScale uses perturbation to determine the results: only the protected attributes and related model inputs are changed, while other features remain the same. The perturbation changes the values of the feature from the reference group to the monitored group, or vice versa. These additional guardrails are used to calculate fairness when the "balanced" data set is used, but you can also view the fairness results using only payload or model training data. Since the model is behaving fairly, you don't need to go into additional detail for this metric.
Click Mortgage Approval Model Deployment in the navigation trail to return to the model details screen.
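
As a rough illustration of the disparate impact comparison described in these steps, the following sketch divides the favorable-outcome rate of the monitored group by the rate of the reference group and checks the ratio against the 80% threshold. The outcome lists are made-up sample data, and the sketch leaves out the perturbation that Watson OpenScale applies, so treat it only as an illustration of the ratio itself.

```python
def favorable_rate(outcomes):
    """Fraction of favorable outcomes, where 1 = approved and 0 = not approved."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def disparate_impact(monitored_outcomes, reference_outcomes):
    """Ratio of the monitored group's favorable rate to the reference group's rate."""
    reference_rate = favorable_rate(reference_outcomes)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(monitored_outcomes) / reference_rate


# Made-up outcomes for illustration only.
monitored_group = [1, 1, 1, 0, 1]            # 4 of 5 approved
reference_group = [1, 1, 0, 1, 0, 1, 1, 1]   # 6 of 8 approved

fairness_score = disparate_impact(monitored_group, reference_group)
is_fair = fairness_score >= 0.8  # fairness threshold configured for the monitor

print(f"Fairness score: {fairness_score:.0%}, meets threshold: {is_fair}")
```

A score above 100% means that the monitored group receives favorable outcomes at a higher rate than the reference group, which matches the situation described for this model.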

Check your progress

The following image shows the fairness details in Watson OpenScale. Now that you observed the model fairness, you can observe the model drift.

Task 7: Observe the model monitors for drift

To preview this task, watch the video beginning at 07:25.

The Watson OpenScale drift monitor measures changes in your data over time to ensure consistent outcomes for your model. Use drift evaluations to identify changes in your model output, the accuracy of your predictions, and the distribution of your input data. Follow these steps to observe the model drift:

In the Drift section, click the Configure icon. Here you see the drift thresholds. Output drift measures the change in the model confidence distribution. Model quality drift measures the drop in accuracy by comparing the estimated runtime accuracy to the training accuracy. Feature drift measures the change in value distribution for important features. The configuration also shows the number of selected features and the most important features.
Click Go to model summary to return to the model details screen.
In the Drift section, click the Details icon to see the model drift detailed results. You can view the history of how each metric score changes over time with a time series chart. Lower values are better. In this case, the results are above the upper thresholds that are set in the configuration. Then view details about how the scores for output drift and feature drift are calculated. You can also view details about each feature to understand how it contributes to the scores that Watson OpenScale generates.
Click Mortgage Approval Model Deployment in the navigation trail to return to the model details screen.
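
To illustrate what "change in value distribution" can mean for feature drift, the following sketch compares a baseline sample of one categorical feature with a current sample by using total variation distance. This metric is a stand-in chosen for simplicity, not the calculation that Watson OpenScale performs, and the sample values and the `upper_threshold` value are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each distinct value."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def total_variation_distance(baseline, current):
    """Half the sum of absolute differences between two frequency distributions (0 = identical)."""
    p = distribution(baseline)
    q = distribution(current)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Made-up samples of a single feature, binned into three categories.
baseline_sample = ["low"] * 5 + ["medium"] * 3 + ["high"] * 2
current_sample = ["low"] * 2 + ["medium"] * 3 + ["high"] * 5

drift_score = total_variation_distance(baseline_sample, current_sample)
upper_threshold = 0.1  # hypothetical threshold; lower scores are better

print(f"Drift score: {drift_score:.2f}, above threshold: {drift_score > upper_threshold}")
```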

Check your progress

The following image shows the drift details in Watson OpenScale. Now that you observed the model drift, you can observe the model explainability.

Task 8: Observe the model monitors for explainability

To preview this task, watch the video beginning at 08:46.

It is also important to understand how the model came to its decision. This understanding is required both to explain decisions to people involved in the loan approval and to assure model owners that the decisions are valid. To understand these decisions, follow these steps to observe the model explainability:

In the left navigation panel, click the Explain a transaction icon.
Select Mortgage Approval Model Deployment to see a list of transactions.
For any transaction, click Explain under the Actions column. Here you see the detailed explanation of this decision. You will see the most important inputs to the model along with how important each was to the end result. Blue bars represent inputs that tended to support the model's decision while red bars show inputs that might have led to another decision. For example, an applicant might have enough income to otherwise be approved but their poor credit history and high debt together lead the model to reject the application. Review this explanation to become satisfied about the basis for the model decision.
(Optional) If you want to delve further into how the model made its decision, click the Inspect tab. Use the Inspect feature to analyze the decision and find areas of sensitivity where small changes to a few inputs would result in a different decision. You can also test the sensitivity yourself by overriding some of the actual inputs with alternatives to see whether they change the result.
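
The idea behind overriding inputs on the Inspect tab can be sketched in a few lines: change one input at a time and check whether the decision flips. In this sketch, `toy_approval_model` is a hypothetical stand-in with invented weights, not the deployed mortgage model, and the applicant values and overrides are made up.

```python
def toy_approval_model(applicant):
    """Hypothetical scoring rule: returns True when the application is approved."""
    score = (
        applicant["income"] / 1000
        - applicant["debt"] / 500
        + (applicant["credit_score"] - 600) / 10
    )
    return score >= 50


def find_flipping_overrides(model, applicant, candidate_overrides):
    """Try each single-feature override and collect those that change the decision."""
    original_decision = model(applicant)
    flips = []
    for feature, values in candidate_overrides.items():
        for value in values:
            changed = dict(applicant, **{feature: value})
            if model(changed) != original_decision:
                flips.append((feature, value))
    return original_decision, flips


# Made-up applicant with enough income but high debt and a low credit score.
applicant = {"income": 60000, "debt": 15000, "credit_score": 580}
candidate_overrides = {
    "debt": [10000, 2000],
    "credit_score": [700, 800],
    "income": [70000],
}

decision, flips = find_flipping_overrides(toy_approval_model, applicant, candidate_overrides)
print("Approved:", decision)
print("Overrides that change the decision:", flips)
```

Inputs that flip the decision with a small override are the areas of sensitivity that the Inspect tab is meant to surface.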

Check your progress

The following image shows the explainability of a transaction in Watson OpenScale. You have determined that the model is accurate and treating all applicants fairly. Now, you can advance the model to the next phase in its lifecycle.

Next steps

Try these additional tutorials to get more hands-on experience with building and evaluating models:

Additional resources

View more videos.
Overview of Cloud Pak for Data as a Service
