AI governance tutorial: Test and validate the model
Last updated: Nov 27, 2024

In this tutorial, you evaluate and monitor the model that you built and deployed in the Build and deploy a model tutorial of the AI governance use case. Your goal is to use Watson OpenScale to configure and evaluate monitors for a deployed model that predicts which applicants qualify for mortgages. You want to ensure that the model is accurate and treats all applicants fairly.

The story for the tutorial is that Golden Bank wants to expand its business by offering low-rate mortgage renewals for online applications. Online applications expand the bank’s customer reach and reduce the bank’s application processing costs. As a data scientist at Golden Bank, you must create a mortgage approval model that avoids unanticipated risk and treats all applicants fairly. You will run a Jupyter notebook to set up monitors for a machine learning model so that you can deploy the model into production with confidence that it operates effectively and as intended. You accomplish this task with Cloud Pak for Data services, which together deliver the trust in your data, your models, and your processes that is required to operate AI with certainty.

The following animated image provides a quick preview of what you’ll accomplish by the end of the tutorial.

Animated image

Preview the tutorial

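In this tutorial, you will complete these tasks:

  • Set up the prerequisites.
  • Task 1: Run the notebook to set up the monitors.
  • Task 2: Evaluate the model.
  • Task 3: Observe the model monitors for quality.
  • Task 4: Observe the model monitors for fairness.
  • Task 5: Observe the model monitors for explainability.
  • Task 6: Promote the model to pre-production and approve the model.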

Watch this video to preview the steps in this tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.





Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, and then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best experience with picture-in-picture. Picture-in-picture mode lets you follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.



Set up the prerequisites

Complete the Build and deploy a model tutorial

To preview this task, watch the video beginning at 00:47.

Complete the Build and deploy a model tutorial to create, promote, and deploy the machine learning model that is used in this tutorial.

Provision the services

To preview this task, watch the video beginning at 01:14.

Important: Watson OpenScale is available in the Dallas and Frankfurt regions only. After completing the Build and deploy a model tutorial, you should be using the Dallas region. If necessary, switch to the Dallas region before continuing.

In addition to the services required to complete the Build and deploy a model tutorial, you also need the Watson OpenScale service provisioned. Follow these steps to verify or provision the necessary services:

  1. In Cloud Pak for Data, verify that you are in the Dallas region. If not, click the region drop-down list, and then select Dallas.
    Change region

  2. From the Navigation menu, choose Services > Service instances.

  3. View the list of services to determine whether a watsonx.governance or Watson OpenScale service instance exists.

  4. If you need to create a watsonx.governance service instance, click Add service.

  5. Select watsonx.governance. Note that watsonx.governance includes Watson OpenScale.

    1. For the region, select Dallas.

    2. Select the Lite plan.

    3. Click Create.

  6. Verify that the following services, which were required for the Build and deploy a model tutorial, are also provisioned:

    • watsonx.ai Studio
    • watsonx.ai Runtime
    • IBM Knowledge Catalog
    • Cloud Object Storage

Check your progress

The following image shows the provisioned service instances. You are now ready to start this tutorial.

Provisioned services




Task 1: Run the notebook to set up the monitors

To preview this task, watch the video beginning at 01:55.


Run the second notebook included in the sample project to:

  • Fetch the model and deployments.
  • Configure Watson OpenScale.
  • Create the service provider and subscription for your machine learning service.
  • Configure the quality monitor.
  • Configure the fairness monitor.
  • Configure explainability.

Follow these steps to run the notebook included in the sample project. This notebook sets up monitors for your model. You can also configure the monitors in the user interface, but setting them up with a notebook is quicker and less error-prone. Take some time to read through the comments in the notebook, which explain the code in each cell.

  1. From the Navigation menu, choose Projects > View all projects.

  2. Open the AI governance project.

    Note: You might see a guided tour showing the tutorials that are included with this use case. The links in the guided tour open these tutorial instructions.
  3. Click the Assets tab, and then navigate to Notebooks.
    Left navigation

  4. Open the 2-monitor-wml-model-with-watson-openscale notebook.

  5. The notebook opens in read-only mode, so click the Edit icon to place the notebook in edit mode.

  6. Because you imported the project from the Resource hub, the first cell of this notebook should contain the project access token. If the notebook does not have a first cell with a project access token, generate one: from the More menu, select Insert project token. This action inserts a new first cell in the notebook that contains the project token.

  7. Under the Insert IBM Cloud API key section, paste your API key in the ibmcloud_api_key field.

  8. Click Cell > Run All to run all of the cells in the notebook. Alternatively, you can run the notebook cell by cell if you want to explore each cell and its output.

  9. The notebook takes 1 to 3 minutes to complete. You can monitor the progress cell by cell: the asterisk in "In [*]" changes to a number, for example, "In [1]", when the cell finishes running.

  10. If you encounter any errors during the notebook run, try these troubleshooting tips:

    • Click Kernel > Restart & Clear Output to restart the kernel, and then run the notebook again.

    • Delete any existing Watson OpenScale deployments, and provision a new service instance.

    • Verify that, in the Build and deploy a model tutorial, you named the AI use case, deployment space, and deployment by copying and pasting the specified names exactly, with no leading or trailing spaces.
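The last tip matters whenever artifacts are looked up by name. The notebook's own lookup code is not reproduced here; the following Python sketch assumes only that artifacts are matched by exact name, and shows how a single trailing space defeats such a match. The helper names `find_exact` and `whitespace_near_misses` and the sample list of names are hypothetical.

```python
def find_exact(names, target):
    """Return the first name that equals target exactly, or None."""
    for name in names:
        if name == target:
            return name
    return None


def whitespace_near_misses(names, target):
    """Return names that differ from target only by leading/trailing whitespace."""
    return [name for name in names if name != target and name.strip() == target.strip()]


# Hypothetical list of deployment names; the first one has a stray trailing space.
deployment_names = ["Mortgage Approval Model Deployment ", "Some Other Deployment"]
target = "Mortgage Approval Model Deployment"

print(find_exact(deployment_names, target))              # None
print(whitespace_near_misses(deployment_names, target))  # ['Mortgage Approval Model Deployment ']
```

An exact lookup returns nothing, even though the near-miss check shows that the intended artifact exists with a slightly different name.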

Check your progress

The following image shows the notebook when the run is complete. The notebook set up monitors for your model, so you can now view the deployment in Watson OpenScale.

Completed notebook run




Task 2: Evaluate the model

To preview this task, watch the video beginning at 03:40.

Follow these steps to download the holdout data and use it to evaluate the model in Watson OpenScale:

  1. Click the AI governance project in the navigation trail.
    Navigation trail

  2. On the Assets tab, click Data > Data assets.

  3. Click the Overflow menu for the GoldenBank_HoldoutData.csv data asset, and choose Download. To validate that the model is working as required, you need a set of labeled data that was held out from model training. This CSV file contains that holdout data.

  4. Launch Watson OpenScale. From the Navigation menu, choose Services > Service instances.

  5. Open your Watson OpenScale instance. If prompted, log in using the same credentials that you used to sign up for Cloud Pak for Data.

  6. On the Watson OpenScale service instance page, click Launch Watson OpenScale.

  7. On the Insights dashboard, click the Mortgage Approval Model Deployment tile.

  8. From the Actions menu, select Evaluate now.

  9. From the list of import options, select from CSV file.

  10. Drag the GoldenBank_HoldoutData.csv data file that you downloaded from the project into the side panel.

  11. Click Upload and evaluate.
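The holdout file is provided with the sample project. As a hypothetical illustration of the idea behind it, and not of how GoldenBank_HoldoutData.csv was actually produced, this Python sketch shuffles labeled records with a fixed seed and sets aside a fraction that the model never sees during training. The 20% fraction, the seed, and the record layout are assumptions.

```python
import random


def train_holdout_split(records, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split off a holdout set.

    Returns (training_records, holdout_records). The holdout records are kept
    out of training so that they can later be used to evaluate the model on
    labeled data it has never seen.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical labeled records: (applicant_id, label)
records = [(i, "approved" if i % 3 else "denied") for i in range(100)]
train, holdout = train_holdout_split(records)
print(len(train), len(holdout))  # 80 20
```

Because every holdout record keeps its label, the evaluation can compare the model's predictions with known outcomes.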

Check your progress

The following image shows the result of the evaluation for the deployed model in Watson OpenScale. Now that you have evaluated the model, you are ready to observe the model quality.

Evaluated model




Task 3: Observe the model monitors for quality

To preview this task, watch the video beginning at 04:44.

The Watson OpenScale quality monitor generates a set of metrics to evaluate the quality of your model. You can use these quality metrics to determine how well your model predicts outcomes. When the evaluation that uses the holdout data completes, follow these steps to observe the model quality or accuracy:

  1. In the left navigation panel, click the Insights dashboard icon.

  2. Locate the Mortgage Approval Model Deployment tile. Notice that the deployment has 0 issues and that both the Quality and Fairness tests passed, meaning that the model met its required thresholds.

  3. Click the Mortgage Approval Model Deployment tile to see more detail.

  4. In the Quality section, click the Configure icon. Here you can see that the quality threshold for this monitor is 70% and that quality is measured as the area under the ROC curve.

  5. Click Go to model summary to return to the model details screen.

  6. In the Quality section, click the Details icon to see the detailed model quality results. Here you see several quality metrics and a confusion matrix that shows correct model decisions along with false positives and false negatives. The calculated area under the ROC curve is 0.9 or higher, which exceeds the 0.7 threshold, so the model meets its quality requirement.

  7. Click Mortgage Approval Model Deployment in the navigation trail to return to the model details screen.
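Watson OpenScale computes these metrics for you. To show what the numbers mean, the following minimal Python sketch computes the area under the ROC curve as the fraction of positive/negative pairs in which the positive example gets the higher score (ties count as half), builds a confusion matrix at an assumed 0.5 cutoff, and compares the AUC with the 0.7 threshold. The labels and scores are made-up values, and this is not Watson OpenScale's implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def confusion_matrix(labels, scores, cutoff=0.5):
    """Count true/false positives and negatives when scores >= cutoff are predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


# Hypothetical holdout labels and model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

QUALITY_THRESHOLD = 0.7  # the 70% threshold configured for the quality monitor
auc = roc_auc(labels, scores)
# Assumption: a value equal to the threshold counts as passing.
print(f"AUC = {auc:.4f}, passes: {auc >= QUALITY_THRESHOLD}")  # AUC = 0.9375, passes: True
print(confusion_matrix(labels, scores))  # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
```

The AUC does not depend on a cutoff, while the confusion matrix does; changing `cutoff` moves records between the false-positive and false-negative cells without changing the AUC.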

Check your progress

The following image shows the quality details in Watson OpenScale. Now that you observed the model quality, you can observe the model fairness.

Quality




Task 4: Observe the model monitors for fairness

To preview this task, watch the video beginning at 06:01.

The Watson OpenScale fairness monitor generates a set of metrics to evaluate the fairness of your model. You can use the fairness metrics to determine if your model produces biased outcomes. Follow these steps to observe the model fairness:

  1. In the Fairness section, click the Configure icon. Here you see that the model is reviewed to ensure that applicants are treated fairly regardless of their gender. Women are the monitored group for which fairness is measured, and the fairness threshold is at least 80%. The fairness monitor uses the disparate impact method to determine fairness. Disparate impact is the ratio of the percentage of favorable outcomes for a monitored group to the percentage of favorable outcomes for a reference group.

  2. Click Go to model summary to return to the model details screen.

  3. In the Fairness section, click the Details icon to see the detailed model fairness results. Here you see the percentage of male and female applicants who are automatically approved, along with a fairness score of over 100%. A score above 100% means that the monitored group receives favorable outcomes at a higher rate than the reference group, so the model far exceeds the required 80% fairness threshold.

  4. Note the identified data sets. To make the fairness metrics as accurate as possible, Watson OpenScale uses perturbation: it determines results for records in which only the protected attributes and related model inputs are changed, while all other features remain the same. The perturbation changes the value of the protected feature from the reference group to the monitored group, or vice versa. These perturbed records are used to calculate fairness when the "balanced" data set is selected, but you can also view fairness results that use only the payload data or the model training data. Because the model is behaving fairly, you don't need to investigate this metric further.

    Fairness data sets

  5. Click Mortgage Approval Model Deployment in the navigation trail to return to the model details screen.
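As a worked example of the disparate impact method, the following Python sketch divides the favorable-outcome rate of the monitored group by that of the reference group and compares the ratio with the 80% threshold. The record fields (`gender`, `prediction`), the group labels, and the counts are hypothetical. Watson OpenScale calculates the actual score, including the perturbation-based "balanced" data set, which this sketch does not model.

```python
def favorable_rate(records, group, attr="gender", outcome="prediction", favorable="approved"):
    """Fraction of records in `group` whose outcome is the favorable one."""
    rows = [r for r in records if r[attr] == group]
    if not rows:
        raise ValueError(f"no records for group {group!r}")
    return sum(1 for r in rows if r[outcome] == favorable) / len(rows)


def disparate_impact(records, monitored, reference, **kwargs):
    """Ratio of the monitored group's favorable rate to the reference group's."""
    reference_rate = favorable_rate(records, reference, **kwargs)
    if reference_rate == 0:
        raise ZeroDivisionError("reference group has no favorable outcomes")
    return favorable_rate(records, monitored, **kwargs) / reference_rate


# Hypothetical scored applicants: 6 of 10 women and 5 of 10 men approved.
records = (
    [{"gender": "female", "prediction": "approved"}] * 6
    + [{"gender": "female", "prediction": "denied"}] * 4
    + [{"gender": "male", "prediction": "approved"}] * 5
    + [{"gender": "male", "prediction": "denied"}] * 5
)

FAIRNESS_THRESHOLD = 0.8  # the 80% threshold configured for the fairness monitor
score = disparate_impact(records, monitored="female", reference="male")
print(f"Fairness score: {score:.0%}, passes: {score >= FAIRNESS_THRESHOLD}")
# Fairness score: 120%, passes: True
```

In this made-up data, 60% of women and 50% of men are approved, so the ratio is 1.2 (120%). A ratio below 0.8 would mean that the monitored group receives favorable outcomes at less than 80% of the reference group's rate.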

Check your progress

The following image shows the fairness details in Watson OpenScale. Now that you observed the model fairness, you can observe the model explainability.

Fairness




Task 5: Observe the model monitors for explainability

To preview this task, watch the video beginning at 07:42.

It is also important to understand how the model came to its decision. This understanding is required both to explain decisions to the people involved in the loan approval and to assure model owners that the decisions are valid. Follow these steps to observe the model explainability:

  1. In the left navigation panel, click the Explain a transaction icon.

  2. Select Mortgage Approval Model Deployment to see a list of transactions.

  3. For any transaction, click Explain in the Actions column. Here you see a detailed explanation of this decision: the most important inputs to the model, along with how much each contributed to the result. Blue bars represent inputs that tended to support the model's decision, while red bars show inputs that might have led to a different decision. For example, an applicant might have enough income to otherwise be approved, but poor credit history and high debt together lead the model to reject the application. Review this explanation to confirm that you are satisfied with the basis for the model's decision.

  4. (Optional) To delve further into how the model made its decision, click the Inspect tab. Use the Inspect feature to analyze the decision and find areas of sensitivity, where small changes to a few inputs would result in a different decision. You can also test the sensitivity yourself by overriding some of the actual inputs with alternative values to see whether they change the result.
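The Inspect feature is interactive, and its internals are not described here. As a rough, hypothetical analogue of a sensitivity check, the following Python sketch changes one input at a time by ±10% and records which changes flip a toy model's decision. The model, its weights, and the feature names (`income`, `credit_score`, `debt_to_income`) are made up for illustration and are not the deployed mortgage model.

```python
def toy_decision(applicant):
    """Hypothetical stand-in for the deployed model: approve when the score is >= 0."""
    score = (
        0.00002 * applicant["income"]
        + 0.01 * (applicant["credit_score"] - 650)
        - 3.0 * applicant["debt_to_income"]
    )
    return "approved" if score >= 0 else "denied"


def find_sensitive_features(decide, applicant, rel_change=0.10):
    """Change one feature at a time by +/- rel_change and report which changes flip the decision."""
    baseline = decide(applicant)
    flips = []
    for feature, value in applicant.items():
        for factor in (1 - rel_change, 1 + rel_change):
            changed = dict(applicant)
            changed[feature] = value * factor
            if decide(changed) != baseline:
                flips.append((feature, factor))
    return baseline, flips


applicant = {"income": 60000, "credit_score": 600, "debt_to_income": 0.3}
baseline, flips = find_sensitive_features(toy_decision, applicant)
print(baseline)                           # denied
print([feature for feature, _ in flips])  # ['credit_score']
```

Changing one feature at a time keeps the result easy to read, but it misses cases where only a combination of changes flips the decision.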

Check your progress

The following image shows the explainability of a transaction in Watson OpenScale. You have determined that the model is accurate and treating all applicants fairly. Now, you can advance the model to the next phase in its lifecycle.

Explainability




Task 6: Promote the model to pre-production and approve the model

To preview this task, watch the video beginning at 09:21.

Follow these steps to change the status of the AI use case in the model inventory and approve the model:

  1. Return to Cloud Pak for Data, and from the Navigation menu, choose Catalogs > AI use cases.

  2. Open the Mortgage Approval Model Use Case.

  3. Click the Lifecycle tab to see that the model is now in the Validate stage.

  4. Click the Edit icon next to Status, select Validation complete, and click Update.

  5. Return to the Watson OpenScale Insights dashboard.

  6. Click the Mortgage Approval Model Deployment tile.

  7. From the Actions menu, select Approve for production, and then click Approve. This action conveys to the AI operations team that they can now deploy the model to a designated production space.

  8. From the Navigation menu, choose Catalogs > AI use cases.

  9. For the Mortgage Approval model use case, click View details.

  10. Click the Lifecycle tab. Under Model tracking, the AI use case now displays as evaluated and approved in the Validate stage.

  11. View the Mortgage Approval Model Deployment to see the factsheet captured by Watson OpenScale.

  12. Close the model deployment factsheet.

Check your progress

The following image shows the AI use case with the model in the Validate phase. Your model is now approved for production.

AI use case in Validate phase

Share the model

To preview this task, watch the video beginning at 10:39.

You can generate a report from a factsheet or AI use case in PDF, HTML, and DOCX format so you can share or print the details about a model being tracked.

Note:

If you don't have a Platform assets catalog, then see Creating the Platform assets catalog.

  1. From the Assets tab in the AI use case, click Export report.

  2. For the Format options, choose a format.

  3. For the Report template, select a template:

    • Full report: contains all data from the Basic report and details about the models and deployments in the AI use case.

    • Basic report: contains the set of facts visible on the Overview and Assets tabs.

  4. Click Export. The report displays in a new window. If the report doesn't display, open the downloaded file from your browser's downloads.

Check your progress

The following image shows the full report for the Mortgage Approval Model Use Case. You can now share this report with your colleagues.

AI use case report



As a data scientist at Golden Bank, you created a mortgage approval model that avoids unanticipated risk and treats all applicants fairly. You ran a Jupyter notebook to set up monitors for your machine learning model, evaluated the model, and approved it for production with confidence that it operates effectively and as intended.


Cleanup (Optional)

If you would like to retake the tutorials in the AI governance use case, delete the following artifacts.

  • Mortgage Approval Model Deployment in the Golden Bank Preproduction Space: see Delete a deployment
  • Golden Bank Preproduction Space: see Delete a deployment space
  • Mortgage Approval Model Use Case: see Delete a model use case
  • Mortgage Approval Catalog: see Delete a catalog
  • AI governance sample project: see Delete a project

Next steps

Learn more

Watch how to use IBM OpenPages to manage the model through its lifecycle.

This video provides a visual method to learn the concepts and tasks in this documentation.

