AutoAI tutorial: Build a binary classification model

This tutorial guides you through training a model to predict whether a customer is likely to buy a tent from an outdoor equipment store. You will create an AutoAI experiment in IBM Watson Studio that analyzes your data, selects the best model type and algorithms, and then generates, trains, and optimizes pipelines, which are model candidates. After reviewing the pipelines, you will save one as a model, deploy it, and test it to get a prediction.

Preview

Watch this video to see a preview of the steps in this tutorial.

Figure 1. AutoAI tutorial video
This video shows you how to build a binary classification model using AutoAI.

Prerequisite

  • Create a project in Watson Studio

Sample data

Download the sample training data file to your local computer: GoSales.csv

The sample data is structured in rows and columns and saved in a .csv file.

You can view the sample data file in a text editor or spreadsheet program:
Preview of training data

Feature columns

Feature columns are columns that contain the attributes on which the machine learning model will base predictions. In this historical data, there are four feature columns:

  • GENDER: Customer gender
  • AGE: Customer age
  • MARITAL_STATUS: “Married”, “Single”, or “Unspecified”
  • PROFESSION: General category of the customer’s profession, such as “Hospitality” or “Sales”, or simply “Other”

What do you want to predict?

You will be asked to choose the label column, which contains the values your model will predict.

In this tutorial, the label column is the IS_TENT column:

  • IS_TENT: Whether or not the customer bought a tent

The model built in this tutorial will predict whether a given customer is likely to purchase a tent.
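The split between feature columns and the label column can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not part of the AutoAI workflow itself. The rows in SAMPLE_CSV are hypothetical and are not taken from GoSales.csv, and the TRUE/FALSE encoding of IS_TENT is an assumption; check the downloaded file for the actual values.

```python
import csv
import io

# Hypothetical rows for illustration only; they are not taken from GoSales.csv.
# The TRUE/FALSE encoding of IS_TENT is an assumption.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,IS_TENT
M,27,Single,Professional,TRUE
F,39,Married,Other,FALSE
F,56,Unspecified,Hospitality,FALSE
"""

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
LABEL_COLUMN = "IS_TENT"


def split_features_and_label(csv_text):
    """Return (features, labels) parsed from CSV text that has a header row."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({name: row[name] for name in FEATURE_COLUMNS})
        labels.append(row[LABEL_COLUMN])
    return features, labels


def is_binary(labels):
    """A label column suits binary classification when it has exactly two distinct values."""
    return len(set(labels)) == 2


features, labels = split_features_and_label(SAMPLE_CSV)
print(features[0])
print(labels, is_binary(labels))
```

To try this on the real file, the same function could be given the contents of GoSales.csv read from disk instead of SAMPLE_CSV.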

Steps overview

This tutorial presents the basic steps for building and training a machine learning model using AutoAI in Watson Studio:

  1. Build and train the model
  2. Deploy the trained model
  3. Test the deployed model

Step 1: Build and train the model

1.1 Specify basic model details

  1. From the Assets page of your project in Watson Studio, click Add to project and choose AUTOAI EXPERIMENT.
  2. In the page that opens, fill in the basic fields:
    • Specify a name and optional description for your new model.
    • Confirm that the IBM Watson Machine Learning service instance that you associated with your project is selected in the Machine Learning Service section.
  3. Click Create.

1.2 Add training data

Upload the training data file, GoSales.csv, from your local computer by dragging the file onto the data panel or by clicking browse and then following the prompts.

1.3 Train the model

  1. Choose IS_TENT as the column to predict. AutoAI analyzes your data and determines that the IS_TENT column contains True/False information, making this data suitable for a binary classification model. The default metric for binary classification is ROC AUC (area under the receiver operating characteristic curve).
    Choosing a prediction column
  2. Click Run experiment. As the model trains, you will see an infographic that shows the process of building the pipelines.
    Building model pipelines
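To make the ROC AUC metric from step 1 more concrete, the following sketch computes it by hand. It is an illustration of the metric's definition, not AutoAI's implementation: the value equals the fraction of (positive, negative) pairs in which the positive example receives the higher predicted score, with ties counted as half. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (bought a tent or not) and predicted probabilities.
y_true = [True, False, True, False]
y_score = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y_true, y_score))  # 3 of 4 pairs are ordered correctly: 0.75
```

A value of 1.0 means every tent buyer is scored above every non-buyer, while 0.5 is what random scoring would produce on average. The pairwise loop is quadratic and is meant for clarity, not for large datasets.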

For a list of algorithms, or estimators, available with each machine learning technique in AutoAI, see: AutoAI implementation detail

1.4 Choose a pipeline

Once the pipeline creation is complete, you can view and compare the ranked pipelines in a leaderboard.

Tip: The pipelines for the sample binary classification model are very uniform because of the underlying sample data. To see pipelines in action, re-run the experiment as a regression experiment to predict purchase amount. That experiment gives you better variation in the resulting pipelines. For example:
Pipeline leaderboard

You can then click Pipeline comparison to see how the pipelines differ. For example:
Pipeline comparison

Choose Save model from the action menu for Pipeline 1. This saves the pipeline as a Machine Learning asset in your project.

Step 2: Deploy the trained model

Before you can use your trained model to make predictions on new data, you must deploy the model.

You can deploy the model from the model details page. You can access the model details page in one of these ways:

  • Click the model name in the notification that is displayed when you save the model.
  • Open the Assets page for the project containing the model and click the model name in the Machine Learning Model section.

From the model details page:

  • Click the Deployments tab.
  • Click Add Deployment.
  • In the page that opens, fill in the fields:
    • Specify a name for the deployment.
    • Select “Online” as the Deployment type.
  • Click Save.

After you save the deployment, click the deployment name to view the deployment details page.

Step 3: Test the deployed model

You can test the deployed model from the deployment details page:

  1. On the Test tab of the deployment details page, either fill out the form with test values or enter the following JSON test data:
    {"input_data": [{
        "fields": ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "PRODUCT_LINE", "PURCHASE_AMOUNT"],
        "values": [["M", 27, "Single", "Professional", "Camping Equipment", 144.78]]
    }]}

Note: The test data includes every data field used by the model except the prediction field, IS_TENT.

  2. Click Predict to predict whether a customer with the entered attributes is likely to buy a tent. The resulting prediction indicates that a customer with the attributes entered has a very high probability of purchasing a tent.
    Tent model prediction
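If you prefer to generate the JSON test data from step 1 instead of typing it, the following sketch builds the same structure in Python. The helper name build_payload is hypothetical; only the input_data, fields, and values keys and the field names come from the JSON shown in this tutorial. The sketch only constructs and prints the payload and does not call the deployment.

```python
import json

# Field names as they appear in the tutorial's JSON test data.
FIELDS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "PRODUCT_LINE", "PURCHASE_AMOUNT"]


def build_payload(rows, fields=FIELDS):
    """Build the input_data structure, checking that each row matches the field list."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


# The same customer as in the tutorial's test data.
payload = build_payload([["M", 27, "Single", "Professional", "Camping Equipment", 144.78]])
print(json.dumps(payload, indent=2))
```

Because values is a list of rows, several customers can be added to one payload by passing more rows to build_payload. The length check catches a row that is missing a field before the payload is pasted into the Test tab.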