Tutorial: Using a predictive model from Watson Machine Learning with streaming patient data

Providing medical care in today’s fast-paced healthcare environment includes prescribing medications for complex conditions. If you’re a healthcare provider, you might have a good idea which drug is best for each patient. You consider the patient’s age, vital signs, and lab results. But you don’t have to rely on intuition alone. With IBM Watson Studio, you can support that decision with predictive models that are trained on real data.

Learning Objective

In this tutorial, you learn how to combine the predictive modeling capability with the streaming data capability to answer the question “Which drug is best for this patient?”

In Watson Machine Learning, you create and edit an example SPSS modeler flow. Using that modeler flow, you build a predictive model of which drug is the most effective for different health metrics. The model also reports a confidence level, which is the likelihood that the drug is prescribed.

Your streams flow ingests streaming patient metrics. Using those metrics, the predictive model tells us which drug would be most effective for each patient and the likelihood that the drug is prescribed.

Quick overview

A typical medical office maintains health metrics of its patients. These metrics include sex, age, blood pressure (BP), and blood levels of sodium (Na), potassium (K), and cholesterol.

Watson Machine Learning can help practitioners answer the question of which medication is most effective for the patient.

Before we examine this question, let’s take a bird’s eye view of how we set up and integrate machine learning and streaming data:

First, we check that we have a running instance of Watson Machine Learning, and that we have Writer access to it.

Next, we use Watson Machine Learning to train a modeler flow that tells us which drug is the most effective for each patient’s health metrics. We use data from a CSV file to train it. We then save that modeler flow as a predictive model.

Finally, we create a streams flow. The streams flow ingests sample data of health metrics that we provide, inputs it into the newly trained model, and then outputs the most effective drug to the Debug operator for viewing.

Preview

Watch this video to see how to set up and integrate Watson Machine Learning and a streams flow.

Figure 1. Video: Set up and integrate Watson Machine Learning and a streams flow

Prerequisite

A Streaming Analytics service instance must be associated with the project where the machine learning flow and the streams flow run.

Do the following steps:

  1. Go to the Projects page of the project, and then click the Settings tab.

  2. In the Associated Services section of the page, check that a Streaming Analytics service is listed. If a service is not listed, do the following steps:

    a. Click Add service, and then select Streaming Analytics in the list.

    b. In the Existing Service Instance list, select the service to associate with this project.

  3. If no service is listed in the Existing Service Instance list, then you must provision an instance in your account in IBM Cloud Dashboard. Do the following steps:

    a. Click Create resource.

    b. In the Search field, type Streaming Analytics.

    c. Click Streaming Analytics to open the Streaming Analytics page. Follow the prompts.

    d. Return to the Settings tab in your project, and then associate the newly provisioned instance with the project.

    ***  

Set up a Watson Machine Learning service in a project

You need a Watson Machine Learning service that is associated with the project. You must have Writer access to the service so that you can create and edit a machine learning model.

To set up a service, do the following steps:

  1. In IBM Watson Studio, click Projects, and then select the project to contain your modeler flow, model, and streams flow.

  2. In the Projects page of the project, click Settings > Associated services. If a Machine Learning service is not listed, click Add Service, and then select Watson. In the Machine Learning card, click Add. Follow the prompts to create a new service. Let’s call the new service predictive-modeling-demo.

  ***  

Use a Watson Machine Learning SPSS modeler flow to build a predictive model

In Watson Machine Learning, you build a predictive model by using an example modeler flow. The predictive model type is SPSS.

We supply the training data that includes health metrics such as medication, sex, age, blood pressure, and the blood levels of Na, K, and cholesterol. The training data indicates which drug is the most effective for certain health metrics. For example, when the age is X and the BP is Y and the cholesterol is Z, then drug A is the best to use.

The machine learning algorithm that you use is the C5 algorithm. In the training data, the C5 algorithm finds patterns that map the health metrics to the target (the most effective drug to use) and builds a predictive model. The new model captures those patterns in a decision tree.
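
To give a feel for how a decision tree learner picks a split, here is a minimal sketch of an entropy-based split search on a single numeric field. It is not the C5 implementation, whose actual criterion and pruning are more elaborate. The rows in TOY_ROWS are made up for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, threshold):
    """Entropy reduction from splitting (ratio, drug) rows at ratio <= threshold."""
    labels = [drug for _, drug in rows]
    low = [drug for ratio, drug in rows if ratio <= threshold]
    high = [drug for ratio, drug in rows if ratio > threshold]
    weighted = sum(
        len(part) / len(rows) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - weighted


def best_threshold(rows):
    """Try each midpoint between neighboring values and keep the highest gain."""
    values = sorted({ratio for ratio, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: information_gain(rows, t))


# Made-up (Na_to_K, drug) pairs, for illustration only.
TOY_ROWS = [
    (25.4, "drugY"), (19.2, "drugY"), (16.3, "drugY"),
    (13.1, "drugX"), (10.1, "drugX"), (8.6, "drugA"),
]

if __name__ == "__main__":
    threshold = best_threshold(TOY_ROWS)
    print(threshold, information_gain(TOY_ROWS, threshold))
```

On this toy data, the chosen threshold separates every drugY row from the rest, which is the kind of pattern that a tree learner turns into its first split.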

To build the SPSS modeler flow, do the following steps:

  1. In Watson Studio, return to the Projects page of your project (from step 1 in Set up a Watson Machine Learning service). Click Add to Project, and then select Modeler flow in the list.

  2. In the Modeler page, click From example > Drug Study Example modeler, and then Create. The canvas opens.

    Here’s how the Drug Study Example modeler flow looks in the canvas. Modeler flow in canvas

    The modeler flow nodes do the following tasks.

    Node number   What the node does
    1             Reads training data from a CSV file. For example, it teaches the model that when the age is X and the BP is Y and the cholesterol is Z, then drug A is the drug of choice.
    2             Computes the ratio of Na to K, and adds the column “Na_to_K” for the ratio.
    3             Removes the “Na” and “K” columns from the training data.
    4             Defines metadata and values for missing data. The column “Drug” is defined as the target.
    5             Reads the training data and constructs a predictive decision tree by using the C5 algorithm with each split.
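
The data preparation that nodes 1 to 4 perform can be approximated in a few lines of plain Python. This is a sketch only: the column headers and the two sample rows are assumptions for illustration, not the contents of the real training file, and the function name prepare is hypothetical.

```python
import csv
import io

# Made-up sample in the column layout that the tutorial describes.
SAMPLE_CSV = """Age,Sex,BP,Cholesterol,Na,K,Drug
23,F,HIGH,HIGH,0.793,0.031,drugY
47,M,LOW,HIGH,0.739,0.056,drugX
"""


def prepare(csv_text):
    """Mimic nodes 1-4: read rows, derive Na_to_K, drop Na and K, split off the target."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):        # node 1: read the CSV data
        row["Na_to_K"] = float(row["Na"]) / float(row["K"])  # node 2: derive the ratio
        del row["Na"], row["K"]                              # node 3: remove Na and K
        targets.append(row.pop("Drug"))                      # node 4: Drug is the target
        features.append(row)
    return features, targets


if __name__ == "__main__":
    feats, drugs = prepare(SAMPLE_CSV)
    print(feats[0], drugs[0])
```

Node 5 would then train on the remaining columns with Drug as the target.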

 


 

To build the predictive model from the modeler flow, do the following steps:

  1. Right-click Drug (node 5), and then select Open to open its Properties pane. Node 5 constructs a predictive decision tree from a set of training data by using the C5 algorithm. Check out the various parameters to see details of the decision tree, and then click Cancel.

  2. Right-click Drug again, and then select Run to start the flow. Node 6, Drug, is created. Node 6 holds the model that was built by node 5. Two new columns ($C-Drug and $CC-Drug) are added. Note the new link between node 4 and node 6. Data from node 4 now passes to the new Drug node (node 6) rather than through the Drug node (node 5).

    New modeler node in canvas

  3. Let’s check out our new model. Do the following steps:

    a. Right-click the new Drug node (node 6), and then select Open.

    In the Settings area, you see that our model calculates confidence levels. The confidence level gives us the likelihood that the drug is selected as the drug of choice by health practitioners.

    b. Click Cancel to return to the canvas.

    c. Right-click the new Drug node (node 6) again, and then select View Model to examine the C5 tree model that the modeler created.

    • Click Predictor Importance. This bar chart represents the predictors in descending order of relative importance for predicting the target, as determined by a variance-based sensitivity analysis algorithm. The values for each predictor are scaled so that their sum is one.

      You see that the Na_to_K ratio is the most important predictor of which drug is effective, followed by BP, and then patient age or cholesterol.

    • Click Tree Diagram, and then select the Display labels on branches check box. The tree diagram shows the decision tree of which drug would be most beneficial.

      Decision tree

      The first split is by the Na_to_K ratio. If the ratio is greater than 14.829, then drugY is the best choice, and no other health metric influences that decision. Otherwise, drugX is the best choice. In this case, blood pressure comes into play and the choices are now drugX and drugA.

      The second split is by BP, and this split is on data where Na_to_K ratio is less than or equal to 14.829. If the blood pressure is normal, drugX is the most effective, and no other health metric influences that decision. Otherwise, age and cholesterol affect which drug is most beneficial.

      To see details about a branch, hover your mouse pointer over a branch to display the split field and split value. For example, the split field might be BP with a split value of HIGH.

      To see details about a node, hover your mouse pointer over the node. The node ID, the score for records in that node based on the model, the confidence, and other details are displayed.

  4. In the breadcrumbs above the C5.0 Tree Model page, click Drug Study Example to return to the canvas of our modeler flow.

  5. Right-click the new node Drug (node 6), and then select Preview to see what our data looks like.

    Notice that you have two new columns: “$C-Drug” and “$CC-Drug”. These variables were created by the modeler when you selected to include confidence levels in step 3a and when you defined the column “Drug” as a target. Notice that the values in columns “Drug” and “$C-Drug” are identical.

    These variables will give us the drug name and the level of confidence in its efficacy for a set of health metrics.

    Model preview

  6. In the breadcrumbs at the top of the page, click Drug Study Example to return to the canvas of our modeler flow.

  7. In the modeler flow canvas, drag the Table node (from Outputs in the canvas palette) to the canvas, and then link it to the new Drug node (node 6).

    Model flow in canvas

  8. Right-click the Table node (node 7), and then select Run. In the Outputs tab, double-click Table (8 fields, 200 records).
    It looks like the data is now giving us the information that we need. Let’s go back to our modeler flow to save this flow as our new model.

  9. In the breadcrumbs at the top of the page, click Drug Study Example to return to the canvas of our modeler flow.

  10. Right-click the Table node (node 7), and then select Save Branch as a Model. When you save a model, you save all nodes in the branch from DRUG1n to Table.

  11. Save the model with the name drug model. If you created a Machine Learning service in the Setup part of this tutorial, then select predictive-modeling-demo as the Machine Learning service. Otherwise, select the Machine Learning service that is assigned to the project. Click Save.
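
The first two splits of the tree diagram that you examined in step 3 can be written as a small function. This is only a sketch of the rules as described in this tutorial: the split value 14.829 comes from the tree description, the label "NORMAL" for normal blood pressure is an assumption, and the deeper branches on age and cholesterol are left out because the text does not spell them out.

```python
NA_TO_K_SPLIT = 14.829  # split value quoted in the tree description


def first_two_splits(na_to_k, bp):
    """Return the drug decided by the first two splits, or None if deeper splits apply."""
    if na_to_k > NA_TO_K_SPLIT:
        return "drugY"   # first split: a high Na_to_K ratio decides on its own
    if bp == "NORMAL":   # assumed label for normal blood pressure
        return "drugX"   # second split: normal BP decides on its own
    return None          # age and cholesterol decide; not modeled in this sketch


if __name__ == "__main__":
    print(first_two_splits(20.0, "HIGH"))    # drugY
    print(first_two_splits(10.0, "NORMAL"))  # drugX
    print(first_two_splits(10.0, "HIGH"))    # None
```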

You now have a new predictive model that indicates what the best drug is for each patient. Let’s open our new model to check its input schema. You need to know what columns (or attributes) the model expects because you need to set up our streams flow to deliver those columns to the model.

  1. In the Projects page of your project, click Assets > Models > Watson Machine Learning models > drug model. Our new model opens and shows us information about the model itself and its input schema.

  2. Scroll down to the Input Schema area, and then check the column names and data types. This information tells us what the model expects as input. The schema can be viewed in table format and in JSON format.
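
Before you wire a data source to the model, it can help to check a sample record against the schema that you just inspected. The following sketch assumes six columns with the names that the Code operator later in this tutorial emits. The data types are guesses, so replace EXPECTED_SCHEMA with the names and types that the Input Schema area shows. The function name schema_problems is hypothetical.

```python
# Hypothetical schema: the types are assumptions, not values read from the model.
EXPECTED_SCHEMA = {
    "Age": (int, float),
    "Sex": (str,),
    "BP": (str,),
    "Cholesterol": (str,),
    "Na": (int, float),
    "K": (int, float),
}


def schema_problems(event, schema=EXPECTED_SCHEMA):
    """Return a list of mismatches between one event dict and the expected schema."""
    problems = []
    for name, types in schema.items():
        if name not in event:
            problems.append(f"missing column: {name}")
        elif not isinstance(event[name], types):
            problems.append(f"{name}: unexpected type {type(event[name]).__name__}")
    extra = sorted(set(event) - set(schema))
    problems.extend(f"unexpected column: {name}" for name in extra)
    return problems


if __name__ == "__main__":
    sample = {"Age": "23", "Sex": "F", "BP": "HIGH", "Na": 0.793, "K": 0.031}
    for problem in schema_problems(sample):
        print(problem)
```

An empty list means that the record has every expected column with an accepted type.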

   

Now that you’ve built a predictive model and checked its input schema, let’s leave Watson Machine Learning and turn to your streams flow.

Create a streams flow that uses the predictive model

Our last step is to create a streams flow that ingests streaming patient data in the column format that our SPSS Model operator expects. The SPSS Model operator uses the model that you created from the modeler branch. The streaming data goes through the same nodes of the modeler branch.

We supply the patient data by using a Code (in Sources) operator. From the Code operator, the data is sent to the SPSS Model operator to run the predictive model that you created in Watson Machine Learning. The SPSS Model operator applies the predictive analytics of the model to the incoming patient data to determine the best drug for each patient. The output from the SPSS Model operator is sent to the Debug operator so that you can view the data without storing it.

To create a streams flow that uses the predictive model, do the following steps:

  1. Go to the Projects page of your project, click Add to Project, and then click Streams Flow.

  2. Create a streams flow manually. Let’s call the streams flow Drug Study Example.

  3. In the canvas palette, drag the operators Code (in Sources), SPSS Model (in Processing and Analytics), and Debug (in Targets) to the canvas. Connect them.

    New streams flow

  4. Click the Code operator to open its Properties pane, and then do the following steps:

    a. In the Code field, add the following code.

    
     # YOU MUST EDIT THE SCHEMA and add all attributes that you are returning as output.
     #
     # Preinstalled Python packages can be viewed from the Settings pane.
     # In the Settings pane you can also install additional Python packages.
    
     import sys
     import time
     import pandas as pd
    
     # init() function will be called once on pipeline initialization
     # @state a Python dictionary object for keeping state. The state object is passed to the produce function
     def init(state):
         # do something once on pipeline initialization and save in the state object
         pass
    
     # produce() function will be called when the job starts to run.
     # It is called on a background thread, and it will typically invoke the 'submit()' callback
     # whenever a tuple of data is ready to be emitted from this operator.
     # This allows for using asynchronous data services as well as synchronous data generation or retrieval.
     # @submit a Python callback function that takes one argument: a dictionary representing a single tuple.
     # @state a Python dictionary object for keeping state
     # You must declare all output attributes in the Edit Schema window.
    
     def produce(submit, state):
         df = pd.read_csv('https://raw.githubusercontent.com/pmservice/drug-selection/master/data/drug_batch_data.csv', sep=',')
         while True:
             for i, AGE in enumerate(df.AGE):
                 event = { "Age" : df.AGE.iloc[i],
                        "Sex" : df.SEX.iloc[i],
                        "BP" : df.BP.iloc[i],
                        "Cholesterol" : df.CHOLESTEROL.iloc[i],
                        "Na" : df.NA.iloc[i],
                        "K" : df.K.iloc[i]}
                           
                 submit(event)
                 time.sleep(0.5) # Simulates a delay of 0.5 seconds between emitted events
    

    b. Click Edit Output Schema, and then add the column names that you saw in the input schema of our new model. Make sure that the columns and their data types match what the model expects as input.

    Note: Our incoming streaming data does not have a column for “Drug”, so you need to add it. Our model will put the predicted most effective drug in that column.

  5. Click the SPSS Model operator to open its Properties pane, and then do the following steps:

    a. In Machine Learning Instance, select the instance and list its models. If you created a Machine Learning service in the Setup part of this tutorial, then select predictive-modeling-demo. Otherwise, select the Machine Learning service that is assigned to the project.

    b. In SPSS Model, select drug model.

    c. Open the Schema section. Click Edit to open the Schema window. Note that the attributes of the output schema are mapped automatically to the attributes of the SPSS model drug model. The model attributes $C-Drug and $CC-Drug are also mapped automatically to new attribute names in the output schema.

    The $C-Drug column will hold the name of the drug that is predicted to be most effective. The $CC-Drug column will hold the likelihood of the drug being prescribed.

You don’t need to configure the Debug operator. It simply acts as a target so that you can view tuples from the SPSS Model operator.
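
If you want to try the Code operator’s callback pattern before you run the flow, you can exercise a produce() function locally. The sketch below is a hypothetical test harness, not part of the product: it replaces the CSV download with two made-up rows, omits the half-second delay, and stops the endless loop by raising an exception from the submit() callback after a fixed number of events.

```python
import itertools

# Made-up patient rows that stand in for the CSV file.
ROWS = [
    {"Age": 23, "Sex": "F", "BP": "HIGH", "Cholesterol": "HIGH", "Na": 0.793, "K": 0.031},
    {"Age": 47, "Sex": "M", "BP": "LOW", "Cholesterol": "HIGH", "Na": 0.739, "K": 0.056},
]


class StopProducing(Exception):
    """Raised by the collecting callback to end the otherwise endless loop."""


def produce(submit, state):
    """Same shape as the operator's produce(): emit one dict per patient, forever."""
    for row in itertools.cycle(state["rows"]):
        submit(dict(row))


def collect(n, rows):
    """Run produce() until n events have been submitted, and return them."""
    events = []

    def submit(event):
        events.append(event)
        if len(events) >= n:
            raise StopProducing

    try:
        produce(submit, {"rows": rows})
    except StopProducing:
        pass
    return events


if __name__ == "__main__":
    for event in collect(3, ROWS):
        print(event)
```

In the streams flow itself, the runtime supplies the submit() callback, so none of the harness code belongs in the operator.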

Here’s what our streams flow looks like now:

SPSS Streams flow in canvas

Everything looks good, so let’s run the streams flow. In the taskbar of the canvas, click the Run icon to save and start the streams flow.

Note the ingestion rate into the streams flow, and the flow of events between the Code operator and the SPSS Model operator. You can see the events in table format or in JSON format.

Important

The SPSS Model operator in the streams flow runs the predictive model that you created in Watson Machine Learning. The predictive model is not retrained as data streams from the Code operator to the SPSS Model operator. You must refresh the predictive model in Watson Machine Learning from time to time, and then rerun your streams flow with the updated model.


   

Summary

You’ve learned how to leverage the predictive modeling capability of Watson Machine Learning with a flow of streaming data.

In Watson Machine Learning, you created and configured a Watson Machine Learning service, and then used an example SPSS modeler flow to create a predictive model. You created a streams flow that ingests patient metric data, inputs the data to the predictive model, and then outputs the analytic results to the Debug operator for viewing.

   

Learn more