AutoAI tutorial: Build a binary classification model

This tutorial guides you through training a model to predict whether a customer is likely to buy a tent from an outdoor equipment store. You will create an AutoAI experiment in IBM Watson Studio that analyzes your data, selects the best model type and algorithms, and then produces, trains, and optimizes pipelines, which are model candidates. After reviewing the pipelines, you will save one as a model, deploy it, and test it to get a prediction.

Watch this video to see a preview of the steps in this tutorial.

Figure 1. AutoAI tutorial video


  • Create a project in Watson Studio

Sample data

Download the sample training data file to your local computer: GoSales.csv

The sample data is structured in rows and columns and saved in a .csv file.

You can view the sample data file in a text editor or spreadsheet program:
Preview of training data
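As a quick check before uploading, the file can also be inspected programmatically. The following is a minimal sketch that uses only the Python standard library. The inline sample rows and every column name other than IS_TENT are assumptions made for illustration, so substitute the contents of your downloaded GoSales.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for GoSales.csv. Only IS_TENT is named in
# this tutorial; the other column names and all row values are assumptions.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,PRODUCT_LINE,PURCHASE_AMOUNT,IS_TENT
M,27,Single,Professional,Camping Equipment,144.78,TRUE
F,39,Married,Other,Outdoor Protection,144.83,FALSE
F,56,Unspecified,Hospitality,Personal Accessories,123.65,FALSE
"""


def summarize(csv_text, label="IS_TENT"):
    """Return the column names, the row count, and the label value counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    label_counts = Counter(row[label] for row in rows)
    return reader.fieldnames, len(rows), dict(label_counts)


columns, n_rows, label_counts = summarize(SAMPLE_CSV)
print(columns)
print(n_rows, label_counts)
```

To run the sketch against the real file, pass the text of GoSales.csv (for example, read with open("GoSales.csv", newline="")) instead of SAMPLE_CSV.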

What do you want to predict?

You will be asked to choose the column label representing the values your model will predict.

In this tutorial, the label column is the IS_TENT column:

  • IS_TENT: Whether or not the customer bought a tent

The model built in this tutorial will predict whether a given customer is likely to purchase a tent.

Steps overview

This tutorial presents the basic steps for building and training a machine learning model using AutoAI:

  1. Build and train the model
  2. Deploy the trained model
  3. Test the deployed model

Step 1: Build and train the experiment

1.1 Specify basic experiment details

  1. From the Assets page of your project, click Add to project and choose AutoAI Experiment.
  2. In the page that opens, fill in the basic fields:
    • Specify a name and optional description for your new experiment.
    • Confirm that the IBM Watson Machine Learning service instance that you associated with your project is selected in the Machine Learning Service section.
  3. Click Create.

1.2 Add training data

Upload the training data file, GoSales.csv, from your local computer by dragging the file onto the data panel or by clicking browse and then following the prompts.

1.3 Train the model

  1. Choose IS_TENT as the column to predict. AutoAI analyzes your data and determines that the IS_TENT column contains True/False information, making this data suitable for a binary classification model. The default metric for a binary classification is ROC/AUC.
    Choosing a prediction column
  2. Click Run experiment. As the model trains, you will see an infographic that shows the process of building the pipelines.
    Building model pipelines

For a list of algorithms, or estimators, available with each machine learning technique in AutoAI, see: AutoAI implementation detail
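To make the ROC/AUC metric mentioned in step 1 more concrete, the sketch below computes the area under the ROC curve by direct pairwise comparison. It is an illustration of what the metric measures, not AutoAI's implementation, and the labels and scores are made-up values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count half).
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (True = bought a tent) and model scores for illustration.
labels = [True, False, True, False, False]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
print(roc_auc(labels, scores))
```

A value of 1.0 means every buyer is ranked above every non-buyer, and 0.5 is what random scoring would achieve.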

1.4 Choose a pipeline

When pipeline creation is complete, you can view and compare the ranked pipelines in a leaderboard.

Tip: The pipelines for the sample binary classification model are very uniform because of the underlying sample data. To see pipelines in action, re-run the experiment as a regression experiment to predict purchase amount. That experiment gives you better variation in the resulting pipelines. For example:
Pipeline leaderboard

You can then click Pipeline comparison to see how they differ. For example:
Pipeline comparison

Choose Save model from the action menu for Pipeline 1. This saves the pipeline as a Machine Learning asset in your project.

Step 2: Deploy the trained model

Before you can use your trained model to make predictions on new data, you must deploy the model.

You can deploy the model from the model details page. You can access the model details page in one of these ways:

  • Click the model name in the notification that is displayed when you save the model.
  • Open the Assets page for the project that contains the model and click the model name in the Machine Learning Model section.

From the model details page:

  • Click the Deployments tab.
  • Click Add Deployment.
  • In the page that opens, fill in the fields:
    • Specify a name for the deployment.
    • Select “Online” as the Deployment type.
    • Click Save.

After you save the deployment, click on the deployment name to view the deployment details page.

Step 3: Test the deployed model

You can test the deployed model from the deployment details page:

  1. On the Test tab of the deployment details page, either fill out the form with test values, or enter the following JSON test data.
        "values": [["M",27,"Single", "Professional","Camping Equipment",144.78]]

Note that the test data replicates the data fields for the model with the exception of the prediction field.

  2. Click Predict to predict whether a customer with the entered attributes is likely to buy a tent. The resulting prediction indicates that a customer with the entered attributes has a very high probability of purchasing a tent.
    Tent model prediction
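The JSON test data in step 1 can also be assembled programmatically, which helps when testing several customers at once. The sketch below is a hedged illustration: the field names are assumed (the tutorial shows only the value order), and the "input_data" wrapper reflects a payload layout commonly used for online scoring, so confirm both against your own deployment before relying on them.

```python
import json

# Assumed column names; only the order of the values comes from the
# tutorial's test data. The prediction column (IS_TENT) is left out.
FIELDS = [
    "GENDER",
    "AGE",
    "MARITAL_STATUS",
    "PROFESSION",
    "PRODUCT_LINE",
    "PURCHASE_AMOUNT",
]


def build_payload(rows, fields=FIELDS):
    """Wrap rows of feature values in a fields/values scoring payload."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


payload = build_payload(
    [["M", 27, "Single", "Professional", "Camping Equipment", 144.78]]
)
print(json.dumps(payload, indent=2))
```

Checking the row length up front catches the most common mistake, which is accidentally including the prediction field in the test data.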

Creating a batch job to score the model

For a batch deployment, you provide input data, also known as the model payload, in a CSV file. The data should be structured like the training data, with the same column headers. The batch job will process each row of data and create a corresponding prediction.

In a real scenario, you would submit new data to the model to get a score. To show how to create and run a batch deployment, however, this tutorial reuses the training data file GoSales.csv that you downloaded as part of the tutorial setup. When you deploy a model, you can add the payload data to a project, upload it directly to a space, or link to the data in a storage repository such as a Cloud Object Storage bucket. In this case, you will upload the file directly to the deployment space.
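Because the batch payload must use the same column headers as the training data, it can be worth comparing the two headers before uploading a new file. The following is a minimal sketch with hypothetical two-column excerpts; the AGE column name is an assumption, and real files contain more columns.

```python
import csv
import io


def header_mismatch(training_csv, payload_csv):
    """Return (missing, unexpected) column names in the payload header."""
    train_header = next(csv.reader(io.StringIO(training_csv)))
    payload_header = next(csv.reader(io.StringIO(payload_csv)))
    missing = [c for c in train_header if c not in payload_header]
    unexpected = [c for c in payload_header if c not in train_header]
    return missing, unexpected


# Hypothetical excerpts used only to demonstrate the check.
training = "AGE,IS_TENT\n27,TRUE\n"
good_payload = "AGE,IS_TENT\n41,FALSE\n"
bad_payload = "Age,IS_TENT\n41,FALSE\n"  # header differs in letter case

print(header_mismatch(training, good_payload))
print(header_mismatch(training, bad_payload))
```

The comparison is case-sensitive on purpose, so a header such as "Age" is reported as not matching "AGE".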

From the Assets page of the deployment space:

  1. Click Add to space, then choose Data.
  2. Upload the GoSales.csv file that you saved locally.

Step 2: Create the batch deployment

Now you can define the batch deployment.

  1. Click the deployment icon next to the model name.
  2. In the page that opens, fill in the fields:
    • Specify a name for the deployment.
    • Select “Batch” as the Deployment type.
    • Choose the smallest hardware specification.
    • Click Create.

Step 3: Create the batch job

The batch job executes the deployment. To create the job, you specify the input data and the name for the output file. You can set up the job to run on a schedule or to run immediately.

  1. Click Create job.
  2. Specify the input file: GoSales.csv
  3. Name the output file: GoSales-output
  4. Click Create and run to run the job immediately.

Step 4: View the output

When the deployment status changes to Deployed, return to the Assets page for the deployment space. You will see that the file GoSales-output.csv was created and added to your assets list.

Click the download icon next to the output file and open the file in an editor. You can review the prediction results for the customer information submitted for batch processing.

For each case, the returned prediction includes a confidence score for whether the customer will buy a tent.
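Once the output file is downloaded, the results can be filtered with a short script. The sketch below is illustrative only: the inline data is made up, and the column names "prediction" and "probability" are assumptions, so check the header row of your own GoSales-output.csv and adjust the names accordingly.

```python
import csv
import io

# Hypothetical excerpt of GoSales-output.csv; the column names "prediction"
# and "probability" are assumptions, not confirmed by this tutorial.
OUTPUT_CSV = """prediction,probability
TRUE,0.91
FALSE,0.77
FALSE,0.64
"""


def likely_buyers(csv_text, threshold=0.8):
    """Return 1-based row numbers predicted TRUE with confidence >= threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        i
        for i, row in enumerate(reader, start=1)
        if row["prediction"].upper() == "TRUE"
        and float(row["probability"]) >= threshold
    ]


print(likely_buyers(OUTPUT_CSV))
```

Raising or lowering the hypothetical threshold parameter changes how confident a prediction must be before a customer is listed.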