Data Science and MLOps tutorial: Orchestrate an AI pipeline with model monitoring
Take this tutorial to create an end-to-end pipeline that delivers concise, pre-processed, and up-to-date data stored in an external data source for the Data Science and MLOps use case. Your goal is to use Orchestration Pipelines to orchestrate that end-to-end workflow to generate automated, consistent, and repeatable outcomes. The pipeline uses Data Refinery and AutoAI. AutoAI automates several aspects of the model building process, such as feature engineering and hyperparameter optimization. It ranks candidate algorithms, and then selects the best model.
The story for the tutorial is that Golden Bank wants to expand its business by offering special low-rate mortgage renewals for online applications. Online applications expand the bank’s customer reach and reduce the bank’s application processing costs. To help lenders with decision making, the team will use Orchestration Pipelines to create a data pipeline that delivers up-to-date data on all mortgage applicants. The data is stored in Db2 Warehouse. You need to prepare the data because it is potentially incomplete or outdated, and might be obfuscated or entirely inaccessible due to data privacy and sovereignty policies. Then, the team builds a mortgage approval model from trusted data, and deploys and tests the model in a pre-production environment. Finally, the team uses a notebook to configure Watson OpenScale monitors, and then evaluates and observes the monitors in Watson OpenScale to make sure that the model treats all applicants fairly.
The following animated image provides a quick preview of what you will accomplish by the end of this tutorial. You will edit and run a pipeline to build and deploy a machine learning model, run a notebook to configure monitors, and validate the model.
Preview the tutorial
In this tutorial, you will complete these tasks:
- Set up the prerequisites.
- Task 1: View the assets in the sample project.
- Task 2: Explore an existing pipeline.
- Task 3: Add a node to the pipeline.
- Task 4: Run the pipeline.
- Task 5: View the assets, deployed model, and online deployment.
If you want to continue to see how to monitor the model using Watson OpenScale, then complete tasks 6-10:
- Task 6: Run the notebook to configure the Watson OpenScale monitors.
- Task 7: Evaluate the model.
- Task 8: Observe the model monitors for quality.
- Task 9: Observe the model monitors for fairness.
- Task 10: Observe the model monitors for explainability.
- Cleanup (Optional)
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Use the video picture-in-picture
The following animated image shows how to use the video picture-in-picture and table of contents features:
Get help in the community
If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.
Set up your browser windows
For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.
Set up the prerequisites
Sign up for Cloud Pak for Data as a Service
You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Data Science and MLOps use case.
- If you have an existing Cloud Pak for Data as a Service account, then you can get started with this tutorial. If you have a Lite plan account, only one user per account can run this tutorial.
- If you don't have a Cloud Pak for Data as a Service account yet, then sign up.
Verify the necessary provisioned services
To preview this task, watch the video beginning at 00:50.
Follow these steps to verify or provision the necessary services:
- From the Cloud Pak for Data navigation menu, choose Services > Service instances.
- Use the Product drop-down list to determine whether an existing Watson Studio service instance exists.
- If you need to create a Watson Studio service instance, click Add service.
  - Select Watson Studio.
  - Select the Lite plan.
  - Click Create.
- Wait while the Watson Studio service is provisioned, which might take a few minutes to complete.
- Repeat these steps to verify or provision the following additional services:
  - Watson Machine Learning
  - Cloud Object Storage
  - watsonx.governance (if you want to monitor the deployed model)
Check your progress
The following image shows the provisioned service instances:
Create the sample project
To preview this task, watch the video beginning at 01:27.
If you already have the sample project for this tutorial, then skip this task. Otherwise, follow these steps:
- Access the Data Science and MLOps sample project in the Resource hub.
- Click Create project.
- If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
- Click Create.
- Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.
- Click the Assets tab to see the assets for this tutorial.
Check your progress
The following image shows the Assets tab in the sample project. You are now ready to associate the Watson Machine Learning service with the project.
Associate the Watson Machine Learning service with the sample project
To preview this task, watch the video beginning at 02:17.
You will use Watson Machine Learning to create and deploy the model, so follow these steps to associate your Watson Machine Learning service instance with the sample project.
- In the Data Science and MLOps project, click the Manage tab.
- Click the Services & Integrations page.
- Click Associate service.
- Check the box next to your Watson Machine Learning service instance.
- Click Associate.
- Click Cancel to return to the Services & Integrations page.
Check your progress
The following image shows the Services & Integrations page with the Watson Machine Learning service listed. You are now ready to view the assets in the sample project.
Task 1: View the assets in the sample project
To preview this task, watch the video beginning at 02:37.
The sample project includes several assets including a connection, data definition, one Data Refinery flow, and a pipeline. Follow these steps to view those assets:
- Click the Assets tab in the Data Science and MLOps project, and then view All assets.
- View the list of data assets that are used in the Data Refinery flow and the pipeline. These assets are stored in a Data Fabric Trial - Db2 Warehouse connection in the AI_MORTGAGE schema. Click Import assets, and then navigate to Data Fabric Trial - Db2 Warehouse > AI_MORTGAGE. The following image shows the assets from that connection:
- The Mortgage_Data_Approvals_flow Data Refinery flow integrates data about each mortgage applicant. The integrated data includes each applicant's personally identifiable information, application details, credit scores, status as a commercial buyer, and the price of their chosen home. The flow then creates a sequential file with the name Mortgage_Data_with_Approvals_DS.csv in the project containing the joined data. The following image shows the Mortgage_Data_Approvals_flow Data Refinery flow:
Check your progress
The following image shows all of the assets in the sample project. You are now ready to explore the pipeline in the sample project.
Task 2: Explore an existing pipeline
To preview this task, watch the video beginning at 03:25.
The sample project includes an Orchestration Pipelines pipeline, which automates the following tasks:
- Run an existing Data Refinery job.
- Create an AutoAI experiment.
- Run the AutoAI experiment, which uses the output file from the Data Refinery job as the training data, and save the best performing model.
- Create a deployment space.
- Promote the saved model to the deployment space.
Follow these steps to explore the pipeline:
- From the Assets tab in the Data Science and MLOps project, view All assets.
- Click Mortgage approval pipeline - Data Science to open the pipeline.
- Double-click the Integrate mortgage approval data Data Refinery job, which combines various tables from the Db2 Warehouse on Cloud connection into a cohesive labeled data set that is used as the training data for the AutoAI experiment. Click Cancel to return to the pipeline.
- Click the Check status condition, and choose Edit. This condition is a decision point in the pipeline to confirm the completion of the Data Refinery job with a value of either Completed or Completed With Warnings. Click Cancel to return to the pipeline.
- Double-click the Create AutoAI experiment node to see the settings. This node creates an AutoAI experiment with the specified settings.
  - Review the values for the following settings:
    - AutoAI experiment name
    - Scope
    - Prediction type
    - Prediction column
    - Positive class
    - Training data split ratio
    - Algorithms to include
    - Algorithms to use
    - Optimize metric
  - Click Cancel to close the settings.
- Double-click the Run AutoAI experiment node to see the settings. This node runs the AutoAI experiment that is created by the Create AutoAI experiment node, and uses the output from the Integrate mortgage approval data Data Refinery job as the training data.
  - Review the values for the following settings:
    - AutoAI experiment
    - Training Data Assets
    - Model name prefix
  - Click Cancel to close the settings.
- Between the Run AutoAI experiment and Create Deployment Space nodes, click the Do you want to deploy model? condition, and choose Edit. The value of True for this condition is a decision point in the pipeline to continue to create the deployment space. Click Cancel to return to the pipeline.
- Double-click the Create Deployment Space node to update the settings. This node creates a new deployment space with the specified name, and requires input for your Cloud Object Storage and Watson Machine Learning services.
  - Review the value for the New space name setting.
  - For the New space COS Instance CRN field, select your Cloud Object Storage instance from the list.
  - For the New space WML Instance CRN field, select your Watson Machine Learning instance from the list.
  - Click Save.
- Double-click the Promote Model to Deployment Space node to see the settings. This node promotes the best model from the Run AutoAI experiment node to the deployment space created from the Create Deployment Space node.
  - Review the values for the following settings:
    - Source Assets
    - Target
  - Click Cancel to close the settings.
Check your progress
The following image shows the initial pipeline. You are now ready to edit the pipeline to add a node.
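The two conditions in the initial pipeline act as gates between nodes. The following Python sketch models that control flow so you can reason about which nodes run for a given outcome. It is only an illustration, not an Orchestration Pipelines API: the function name plan_pipeline_steps is hypothetical, the node names are copied from this tutorial, and the placement of the Check status condition directly after the Data Refinery job is an assumption based on its description.

```python
from typing import List

# Job statuses that the "Check status" condition accepts, as described in Task 2.
OK_STATUSES = {"Completed", "Completed With Warnings"}


def plan_pipeline_steps(refinery_status: str, deploy_model: bool) -> List[str]:
    """Return the node names that would run, in order, for the given inputs.

    Hypothetical helper that mirrors the canvas logic; it does not call
    any Cloud Pak for Data service.
    """
    steps = ["Integrate mortgage approval data"]

    # "Check status" condition: continue only if the Data Refinery job succeeded.
    if refinery_status not in OK_STATUSES:
        return steps
    steps += ["Create AutoAI experiment", "Run AutoAI experiment"]

    # "Do you want to deploy model?" condition: continue only when True.
    if not deploy_model:
        return steps
    steps += ["Create Deployment Space", "Promote Model to Deployment Space"]
    return steps


if __name__ == "__main__":
    for status, deploy in [("Completed", True), ("Completed", False), ("Failed", True)]:
        print(status, deploy, "->", plan_pipeline_steps(status, deploy))
```

In this sketch, a failed Data Refinery job stops the flow after the first node, and a deployment parameter of False stops it after the AutoAI run.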
Task 3: Add a node to the pipeline
To preview this task, watch the video beginning at 05:41.
The pipeline creates the model, creates a deployment space, and then promotes the model to the deployment space. You need to add a node to create an online deployment. Follow these steps to edit the pipeline to automate creating an online deployment:
- Add the Create Online Deployment node to the canvas:
  - Expand the Create section in the node palette.
  - Drag the Create online deployment node onto the canvas, and drop the node after the Promote Model to Deployment Space node.
- Hover over the Promote Model to Deployment Space node to see the arrow. Connect the arrow to the Create online deployment node.
  Note: The node names in your pipeline might differ from the following animated image.
- Connect the Create online deployment for promoted model comment to the Create online deployment node by connecting the circle on the comment box to the node.
  Note: The node names in your pipeline might differ from the following animated image.
- Double-click the Create online deployment node to see the settings.
- Change the node name to Create Online Deployment.
- Next to ML asset, click Select from another node from the menu.
- Select the Promote Model to Deployment Space node from the list. The node ID winning_model is selected.
- For the New deployment name, type Mortgage approval model deployment - Data Science.
- For Creation Mode, select Overwrite.
- Click Save to save the Create Online Deployment node settings.
Check your progress
The following image shows the completed pipeline. You are now ready to run the pipeline.
Task 4: Run the pipeline
To preview this task, watch the video beginning at 06:57.
Now that the pipeline is complete, follow these steps to run the pipeline:
- From the toolbar, click Run pipeline > Trial run.
- On the Define pipeline parameters page, select True for the deployment.
  - If set to True, then the pipeline verifies the deployed model and scores the model.
  - If set to False, then the pipeline verifies that the model was created in the project by the AutoAI experiment, and reviews the model information and training metrics.
- Provide an API key if this is your first time running a pipeline. Pipeline assets use your personal IBM Cloud API key to run operations securely without disruption.
  - If you have an existing API key, click Use existing API key, paste the API key, and click Save.
  - If you don't have an existing API key, click Generate new API key, provide a name, and click Save. Copy the API key, and then save the API key for future use. When you're done, click Close.
- Click Run to start running the pipeline.
- Monitor the pipeline progress.
  - Scroll through consolidated logs while the pipeline is running. The trial run might take up to 10 minutes to complete.
  - As each operation completes, select the node for that operation on the canvas.
  - On the Node Inspector tab, view the details of the operation.
  - Click the Node output tab to see a summary of the output for each node operation.
Check your progress
The following image shows the pipeline after it completed the trial run. You are now ready to review the assets that the pipeline created.
Task 5: View the assets, deployed model, and online deployment
To preview this task, watch the video beginning at 08:58.
The pipeline created several assets. Follow these steps to view the assets:
- Click the Data Science and MLOps project name in the navigation trail to return to the project.
- On the Assets tab, view All assets.
- View the data assets.
  - Click the Mortgage_Data_with_Approvals_DS.csv data asset. The Data Refinery job created this asset.
  - Click the Data Science and MLOps project name in the navigation trail to return to the Assets tab.
- View the model.
  - Click the machine learning model asset beginning with ds_mortgage_approval_best_model. The AutoAI experiment generated several model candidates, and chose this one as the best model. Save this model name to a text file. The model name is required to configure the Watson OpenScale monitors in the next task.
  - Scroll through the model information.
  - Click the Data Science and MLOps project name in the navigation trail to return to the Assets tab.
- Click the Jobs tab in the project to see information about the Data Refinery and Pipeline jobs.
- Open the deployment space that you created with the pipeline.
  - From the Cloud Pak for Data navigation menu, choose Deployments.
  - Click the Spaces tab.
  - Click the Mortgage approval - Data Science and MLOps deployment space.
- Click the Assets tab, and see the deployed model beginning with ds_mortgage_approval_best_model.
- Click the Deployments tab.
- Click Mortgage approval model deployment - Data Science to view the deployment.
  - On the API reference tab, view API endpoint and code snippets.
  - Click the Test tab.
  - Click the JSON input tab, and replace the sample text with the following JSON text.
{ "input_data": [ { "fields": [ "ID", "NAME", "STREET_ADDRESS", "CITY", "STATE", "STATE_CODE", "ZIP_CODE", "EMAIL_ADDRESS", "PHONE_NUMBER", "GENDER", "SOCIAL_SECURITY_NUMBER", "EDUCATION", "EMPLOYMENT_STATUS", "MARITAL_STATUS", "INCOME", "APPLIEDONLINE", "RESIDENCE", "YRS_AT_CURRENT_ADDRESS", "YRS_WITH_CURRENT_EMPLOYER", "NUMBER_OF_CARDS", "CREDITCARD_DEBT", "LOANS", "LOAN_AMOUNT", "CREDIT_SCORE", "CRM_ID", "COMMERCIAL_CLIENT", "COMM_FRAUD_INV", "FORM_ID", "PROPERTY_CITY", "PROPERTY_STATE", "PROPERTY_VALUE", "AVG_PRICE" ], "values": [ [ null, null, null, null, null, null, null, null, null, null, null, "Bachelor", "Employed", null, 144306, null, "Owner Occupier", 15, 19, 2, 7995, 1, 1483220, 437, null, false, false, null, null, null, 111563, null ], [ null, null, null, null, null, null, null, null, null, null, null, "High School", "Employed", null, 45283, null, "Private Renting", 11, 13, 1, 1232, 1, 7638, 706, null, false, false, null, null, null, 54262, null ] ] } ] }
  - Click Predict. The results show that the first applicant will not be approved and the second applicant will be approved.
Check your progress
The following image shows the results of the test. The confidence scores for your test might be different from the scores shown in the image.
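The JSON input used on the Test tab follows a simple shape: one list of field names and one list of value rows in the same column order. As a minimal sketch, the following Python builds that payload from a dictionary of known applicant values and fills every other column with null. The helper name build_scoring_payload is hypothetical, and the sketch only constructs the JSON text; it does not call the deployment endpoint shown on the API reference tab.

```python
import json
from typing import Any, Dict, List

# Column order copied from the JSON input in this tutorial.
FIELDS: List[str] = [
    "ID", "NAME", "STREET_ADDRESS", "CITY", "STATE", "STATE_CODE", "ZIP_CODE",
    "EMAIL_ADDRESS", "PHONE_NUMBER", "GENDER", "SOCIAL_SECURITY_NUMBER",
    "EDUCATION", "EMPLOYMENT_STATUS", "MARITAL_STATUS", "INCOME",
    "APPLIEDONLINE", "RESIDENCE", "YRS_AT_CURRENT_ADDRESS",
    "YRS_WITH_CURRENT_EMPLOYER", "NUMBER_OF_CARDS", "CREDITCARD_DEBT", "LOANS",
    "LOAN_AMOUNT", "CREDIT_SCORE", "CRM_ID", "COMMERCIAL_CLIENT",
    "COMM_FRAUD_INV", "FORM_ID", "PROPERTY_CITY", "PROPERTY_STATE",
    "PROPERTY_VALUE", "AVG_PRICE",
]


def build_scoring_payload(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Arrange each record's values in FIELDS order; missing columns become None (JSON null)."""
    values = [[record.get(name) for name in FIELDS] for record in records]
    return {"input_data": [{"fields": FIELDS, "values": values}]}


# First applicant from the tutorial's test input.
first_applicant = {
    "EDUCATION": "Bachelor", "EMPLOYMENT_STATUS": "Employed", "INCOME": 144306,
    "RESIDENCE": "Owner Occupier", "YRS_AT_CURRENT_ADDRESS": 15,
    "YRS_WITH_CURRENT_EMPLOYER": 19, "NUMBER_OF_CARDS": 2,
    "CREDITCARD_DEBT": 7995, "LOANS": 1, "LOAN_AMOUNT": 1483220,
    "CREDIT_SCORE": 437, "COMMERCIAL_CLIENT": False, "COMM_FRAUD_INV": False,
    "PROPERTY_VALUE": 111563,
}

payload = build_scoring_payload([first_applicant])
print(json.dumps(payload, indent=2))
```

Building the rows from named values rather than by position makes it harder to shift a value into the wrong column when you add your own test applicants.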
Task 6: Run the notebook to configure the Watson OpenScale monitors
To preview this task, watch the video beginning at 10:40.
Now you are ready to run the notebook included in the sample project. The notebook includes the code to:
- Fetch the model and deployments.
- Configure Watson OpenScale.
- Create the service provider and subscription for your machine learning service.
- Configure the quality monitor.
- Configure the fairness monitor.
- Configure explainability.
Follow these steps to run the notebook included in the sample project. Take some time to read through the comments in the notebook, which explain the code in each cell.
- From the Cloud Pak for Data navigation menu, choose Projects > View all projects.
- Click the Data Science and MLOps project name.
- Click the Assets tab, and then navigate to Notebooks.
- Open the monitor-wml-model-with-watson-openscale-pipeline notebook.
- Click the Edit icon to place the notebook in edit mode.
- When you import a project from the Resource hub, the first cell of the notebook contains the project access token. If this notebook does not contain a first cell with a project access token, you need to generate the token. From the More menu, select Insert project token. This action inserts a new cell as the first cell in the notebook containing the project token.
- Provide your API key in the Provide your IBM Cloud API key section. You need to pass your credentials to the Watson Machine Learning API using an API key. If you don't already have a saved API key, then follow these steps to create an API key. To preview this task, watch the video beginning at 04:55.
  - Access the IBM Cloud console API keys page.
  - Click Create an IBM Cloud API key. If you have any existing API keys, the button may be labelled Create.
  - Type a name and description.
  - Click Create.
  - Copy the API key.
  - Download the API key for future use.
  - Return to the notebook, and paste your API key in the ibmcloud_api_key field.
- In section 3. Model and Deployment, for the model_name variable, paste the model name that you saved to a text file in the previous task. The space_name and deployment_name are filled in for you using the names specified in the pipeline.
- Click Cell > Run All to run all of the cells in the notebook. Alternatively, click the Run icon to run the notebook cell by cell to explore each cell and its output.
- Monitor the progress cell by cell, noticing the asterisk "In [*]" changing to a number, for example, "In [1]". The notebook takes 1 - 3 minutes to complete.
- Try these tips if you encounter any errors while running the notebook:
  - Click Kernel > Restart & Clear Output to restart the kernel, and then run the notebook again.
  - Verify that you copied and pasted the model name exactly with no leading or trailing spaces.
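For background on the API key step above: IBM Cloud services generally do not accept an API key directly on each call. The key is first exchanged for a short-lived bearer token at the IBM Cloud IAM token endpoint. The following sketch builds that token request with the Python standard library without sending it. The function name build_iam_token_request is hypothetical, and the sample notebook may obtain its token differently (for example, through an SDK), so treat this as an illustration of the exchange rather than the notebook's code.

```python
import urllib.parse
import urllib.request

# IBM Cloud IAM token endpoint used to exchange an API key for a bearer token.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the form-encoded request for an IAM access token."""
    # Stray whitespace from copy and paste is a common cause of authentication errors.
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key is empty or has leading/trailing whitespace")

    form = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("utf-8")

    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )
```

To send the request, you could pass it to urllib.request.urlopen and read the access_token field from the JSON response. Avoid hard-coding the key in shared notebooks; reading it from an environment variable or a prompt is a safer habit.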
Check your progress
The following image shows the notebook when the run is complete. The notebook saved the model in the project, so you are now ready to evaluate the model.
Task 7: Evaluate the model
To preview this task, watch the video beginning at 13:35.
Follow these steps to evaluate the model in Watson OpenScale:
- Click the Data Science and MLOps project in the navigation trail.
- On the Assets tab, expand the Data asset type, and then click Data assets.
- Click the Overflow menu for the mortgage_sample_test_data.csv data asset, and choose Download. To validate that the model is working as required, you need a set of labeled data, which was held out from model training. This CSV file contains that holdout data.
- Launch Watson OpenScale.
  - From the Cloud Pak for Data navigation menu, choose Services > Service instances.
  - Click your Watson OpenScale instance name. If prompted, log in using the same credentials that you used to sign up for Cloud Pak for Data.
  - On the Watson OpenScale service instance page, click Launch Application.
- On the Insights dashboard, click the Mortgage approval model deployment - Data Science tile.
- From the Actions menu, select Evaluate now.
- From the list of import options, select from CSV file.
- Drag the mortgage_sample_test_data.csv data file you downloaded from the project into the side panel.
- Click Upload and evaluate. The evaluation might take several minutes to complete.
Check your progress
The following image shows the result of the evaluation for the deployed model in Watson OpenScale. Now that you evaluated the model, you are ready to observe the model quality.
Task 8: Observe the model monitors for quality
To preview this task, watch the video beginning at 14:44.
The Watson OpenScale quality monitor generates a set of metrics to evaluate the quality of your model. You can use these quality metrics to determine how well your model predicts outcomes. When the evaluation that uses the holdout data completes, follow these steps to observe the model quality or accuracy:
- In the left navigation panel, click the Insights dashboard icon.
- Locate the Mortgage approval model deployment - Data Science tile. Notice that the deployment has 0 issues, and that neither the Quality nor the Fairness tests generated any errors, meaning that the model met the thresholds that are required of it.
  Note: You might need to refresh the dashboard to see the updates after evaluation.
- Click the Mortgage approval model deployment - Data Science tile to see more detail.
- In the Quality section, click the Configure icon. Here you can see that the quality threshold that is configured for this monitor is 70% and that the measurement of quality being used is area under the ROC curve.
- Click Go to model summary to return to the model details screen.
- In the Quality section, click the Details icon to see the model quality detailed results. Here you see a number of quality metric calculations and a confusion matrix showing correct model decisions along with false positives and false negatives. The calculated area under the ROC curve is 0.9 or higher, which exceeds the 0.7 threshold, so the model is meeting its quality requirement.
- Click Mortgage approval model deployment - Data Science in the navigation trail to return to the model details screen.
Check your progress
The following image shows the quality details in Watson OpenScale. Quality scores may vary. Now that you observed the model quality, you can observe the model fairness.
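To make the two quality measures in this task concrete, here is a minimal sketch that computes a confusion matrix and the area under the ROC curve for a small set of labeled holdout rows. The labels and scores are made-up illustration values, not data from mortgage_sample_test_data.csv, and this is the textbook pairwise definition of the area under the ROC curve, not necessarily how Watson OpenScale computes it internally.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive row scores higher than a random negative row."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Illustrative values only: 1 = approved, 0 = not approved.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.8, 0.35, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
print("confusion matrix (tp, fp, fn, tn):", confusion_matrix(labels, predictions))
print("area under ROC:", round(auc, 3), "meets 0.7 threshold:", auc >= 0.7)
```

The quadratic pairwise loop is fine for a handful of rows; for a real holdout set, a library implementation would be the usual choice.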
Task 9: Observe the model monitors for fairness
To preview this task, watch the video beginning at 15:59.
The Watson OpenScale fairness monitor generates a set of metrics to evaluate the fairness of your model. You can use the fairness metrics to determine if your model produces biased outcomes. Follow these steps to observe the model fairness:
- In the Fairness section, click the Configure icon. Here you see that the model is being reviewed to ensure that applicants are being treated fairly regardless of their gender. Women are identified as the monitored group for whom fairness is being measured, and the threshold for fairness is to be at least 80%. The fairness monitor uses the disparate impact method to determine fairness. Disparate impact compares the percentage of favorable outcomes for a monitored group to the percentage of favorable outcomes for a reference group.
- Click Go to model summary to return to the model details screen.
- In the Fairness section, click the Details icon to see the model fairness detailed results. Here you see the percentage of male and female applicants who are being automatically approved, along with a fairness score of about 100%, so the model performance far exceeds the 80% fairness threshold required.
- Notice the identified data sets in the Data set list. To ensure that the fairness metrics are most accurate, Watson OpenScale uses perturbation to determine the results where only the protected attributes and related model inputs are changed while other features remain the same. The perturbation changes the values of the feature from the reference group to the monitored group, or vice-versa. These additional guardrails are used to calculate fairness when the "balanced" data set is used, but you can also view the fairness results by using only payload or model training data. Because the model is behaving fairly, you don't need to go into additional detail for this metric.
- Click Mortgage approval model deployment - Data Science in the navigation trail to return to the model details screen.
Check your progress
The following image shows the fairness details in Watson OpenScale. Now that you observed the model fairness, you can observe the model explainability.
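The two fairness ideas in this task, disparate impact and perturbation, can be sketched in a few lines of Python. The sketch below is a simplified illustration under stated assumptions: the function names, the toy scoring rule, and the sample records are hypothetical, and the perturbation here swaps only the protected attribute, whereas the tutorial notes that Watson OpenScale also changes related model inputs.

```python
from typing import Any, Callable, Dict, Sequence


def favorable_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of outcomes that are favorable (for example, approved)."""
    if not outcomes:
        raise ValueError("Need at least one outcome")
    return sum(1 for outcome in outcomes if outcome) / len(outcomes)


def disparate_impact(monitored: Sequence[bool], reference: Sequence[bool]) -> float:
    """Ratio of the monitored group's favorable rate to the reference group's."""
    reference_rate = favorable_rate(reference)
    if reference_rate == 0:
        raise ValueError("Reference group has no favorable outcomes")
    return favorable_rate(monitored) / reference_rate


def perturbation_flip_rate(
    predict: Callable[[Dict[str, Any]], bool],
    records: Sequence[Dict[str, Any]],
    attribute: str,
    swap: Dict[Any, Any],
) -> float:
    """Fraction of records whose prediction changes when only `attribute` is swapped."""
    if not records:
        raise ValueError("Need at least one record")
    flips = 0
    for record in records:
        perturbed = dict(record)
        perturbed[attribute] = swap.get(record[attribute], record[attribute])
        if predict(perturbed) != predict(record):
            flips += 1
    return flips / len(records)


# Illustrative data: 45 of 100 monitored and 50 of 100 reference applicants approved.
monitored_outcomes = [True] * 45 + [False] * 55
reference_outcomes = [True] * 50 + [False] * 50
ratio = disparate_impact(monitored_outcomes, reference_outcomes)
print("disparate impact:", round(ratio, 2), "meets 80% threshold:", ratio >= 0.8)


# Toy stand-in for a deployed model: it ignores GENDER entirely.
def toy_predict(record: Dict[str, Any]) -> bool:
    return record["CREDIT_SCORE"] >= 600


sample = [
    {"GENDER": "Female", "CREDIT_SCORE": 706},
    {"GENDER": "Male", "CREDIT_SCORE": 437},
]
swap_gender = {"Female": "Male", "Male": "Female"}
print("flip rate:", perturbation_flip_rate(toy_predict, sample, "GENDER", swap_gender))
```

A disparate impact near 1.0 means both groups receive favorable outcomes at similar rates, and a flip rate of 0.0 means that changing the protected attribute alone never changed a decision in the sample.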
Task 10: Observe the model monitors for explainability
To preview this task, watch the video beginning at 17:42.
You need to understand how the model came to its decision. This understanding is required both to explain decisions to people involved in the loan approval and to assure model owners that the decisions are valid. To understand these decisions, follow these steps to observe the model explainability:
- In the left navigation panel, click the Explain a transaction icon.
- Select Mortgage approval model deployment - Data Science to see a list of transactions.
- For any transaction, click Explain under the Actions column. Here you see the detailed explanation of this decision. You will see the most important inputs to the model along with how important each was to the end result. Blue bars represent inputs that tended to support the model's decision while red bars show inputs that might have led to another decision. For example, an applicant might have enough income to otherwise be approved but their poor credit history and high debt together lead the model to reject the application. Review this explanation to become satisfied about the basis for the model decision.
- Optional: If you want to delve further into how the model made its decision, click the Inspect tab. Use the Inspect feature to analyze the decision to find areas of sensitivity where small changes to a few inputs would result in a different decision. You can test the sensitivity yourself by overriding some of the actual inputs with alternatives to see whether these would impact the result.
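The sensitivity test described for the Inspect tab amounts to overriding one input at a time and checking whether the decision changes. The following sketch shows that idea with a toy decision rule. Everything here is hypothetical: the function name find_decision_flips, the rule in toy_predict, and the applicant values are illustration only, and the Inspect feature may search for changes in a different way.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def find_decision_flips(
    predict: Callable[[Dict[str, Any]], bool],
    record: Dict[str, Any],
    candidates: Dict[str, Sequence[Any]],
) -> List[Tuple[str, Any]]:
    """Return the (feature, value) overrides that change the decision for `record`."""
    baseline = predict(record)
    flips: List[Tuple[str, Any]] = []
    for feature, values in candidates.items():
        for value in values:
            # Override a single input and keep every other input unchanged.
            trial = {**record, feature: value}
            if predict(trial) != baseline:
                flips.append((feature, value))
    return flips


# Toy stand-in for the deployed model (assumed rule, for illustration only).
def toy_predict(record: Dict[str, Any]) -> bool:
    return record["CREDIT_SCORE"] >= 600 and record["CREDITCARD_DEBT"] <= 5000


applicant = {"CREDIT_SCORE": 580, "CREDITCARD_DEBT": 4000}
candidates = {
    "CREDIT_SCORE": [590, 610, 650],
    "CREDITCARD_DEBT": [1000],
}

flips = find_decision_flips(toy_predict, applicant, candidates)
print("baseline decision:", toy_predict(applicant))
print("overrides that change the decision:", flips)
```

In this toy case, only a higher credit score changes the outcome, which is the kind of narrow sensitivity the Inspect tab is meant to surface.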
Check your progress
The following image shows the explainability of a transaction in Watson OpenScale. You have determined that the model is accurate and treating all applicants fairly. Now, you can advance the model to the next phase in its lifecycle.
Golden Bank's team used Orchestration Pipelines to create a data pipeline that delivers up-to-date data on all mortgage applicants and a machine learning model that lenders can use for decision making. Then, the team used Watson OpenScale to ensure that the model was treating all applicants fairly.
Cleanup (Optional)
If you would like to retake this tutorial, delete the following artifacts.
Artifact | How to delete |
---|---|
Mortgage approval model deployment - Data Science in the Mortgage approval - Data Science and MLOps deployment space | Delete a deployment |
Mortgage approval - Data Science and MLOps deployment space | Delete a deployment space |
Data Science and MLOps sample project | Delete a project |