Model builder tutorial: Build a binary classifier model automatically
This tutorial guides you through training a model to predict whether a customer is likely to buy a tent from an outdoor equipment store. You will use the model builder in IBM Watson Studio to build a binary classification model, let model builder automatically select the algorithm that implements that technique, and then train the model on sample historical customer data.
- Create a project in Watson Studio
Download the sample training data file to your local computer from here: GoSales.csv
The sample data is structured in rows and columns and saved in a .csv file.
You can view the sample data file in a text editor or spreadsheet program.
Feature columns are columns that contain the attributes on which the machine learning model will base predictions. In this historical data, there are four feature columns:
- GENDER: Customer gender
- AGE: Customer age
- MARITAL_STATUS: “Married”, “Single”, or “Unspecified”
- PROFESSION: General category of the customer’s profession, such as “Hospitality” or “Sales”, or simply “Other”
Label columns are columns that contain historical outcomes.
In this tutorial, the label column is the IS_TENT column:
- IS_TENT: Whether or not the customer bought a tent
The model built in this tutorial will predict whether a given customer is likely to purchase a tent.
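If you want to inspect the feature and label columns outside a spreadsheet program, the following is a minimal sketch in Python. The rows in SAMPLE_CSV are made up for illustration and are not taken from GoSales.csv, and the sketch assumes that IS_TENT is stored as TRUE or FALSE; check the real file and adjust if its encoding differs.

```python
import csv
import io

# Hypothetical rows for illustration only; they are not taken from GoSales.csv.
# Assumption: IS_TENT is stored as the text TRUE or FALSE.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,IS_TENT
M,27,Single,Professional,TRUE
F,39,Married,Other,FALSE
F,56,Unspecified,Hospitality,FALSE
M,45,Married,Sales,FALSE
"""

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
LABEL_COLUMN = "IS_TENT"


def split_features_and_label(csv_text):
    """Return (features, labels) parsed from CSV text that has a header row."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the four feature columns; any other columns are ignored.
        features.append({name: row[name] for name in FEATURE_COLUMNS})
        # Convert the label text to a boolean.
        labels.append(row[LABEL_COLUMN].strip().upper() == "TRUE")
    return features, labels


features, labels = split_features_and_label(SAMPLE_CSV)
print(features[0])
print(labels)
```

To run the same function on the downloaded file, you could pass it the file contents, for example the result of open("GoSales.csv", newline="").read().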
This tutorial presents the basic steps for building and training a machine learning model using model builder in Watson Studio:
Step 1: Build and train the model
1.1 Specify basic model details
From the Assets page of your project in Watson Studio, in the Models section, click New model.
In the page that opens, fill in the basic fields:
- Specify a name for your new model.
- Confirm that the IBM Watson Machine Learning service instance that you associated with your project is selected in the Machine Learning Service section.
Select "Model builder" as the model type.
Associate a Spark service with the model. (Follow instructions in the GUI to select an existing Spark service or create a new one.)
Click the card labeled Automatic. (Model builder then automatically selects an algorithm to implement the machine learning technique that you specify.)
1.2 Add training data
Click Add Data Assets. (This opens the data panel.)
In the data panel, click Load.
Upload the training data file, GoSales.csv, from your local computer by dragging the file onto the data panel or by clicking browse and then following the prompts.
After the upload completes, select the radio button beside the GoSales.csv entry in the data asset list.
1.3 Train the model
Specify the label column and feature columns:
- Label column: IS_TENT
- Feature columns: GENDER, AGE, MARITAL_STATUS, and PROFESSION
The label column is what the model will predict. Feature columns contain the attributes on which the model will base predictions.
Choose the machine learning technique:
- Binary classification
Click Next to begin training. (Training will take about a minute.)
After training completes, click Save.
After the model is saved, the model details page opens automatically.
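To give a feel for what training a binary classifier on these columns involves, here is a minimal sketch in plain Python. It uses logistic regression trained by stochastic gradient descent, which is a stand-in chosen for illustration; it is not necessarily the algorithm that model builder selects. The rows in ROWS are hypothetical, and the helper names (build_vocabulary, encode, train) are invented for this example.

```python
import math

# Hypothetical training rows: (GENDER, AGE, MARITAL_STATUS, PROFESSION, IS_TENT).
# IS_TENT is encoded as 1 (bought a tent) or 0 (did not).
ROWS = [
    ("M", 24, "Single", "Professional", 1),
    ("M", 29, "Single", "Sales", 1),
    ("F", 27, "Single", "Professional", 1),
    ("M", 31, "Married", "Professional", 1),
    ("F", 52, "Married", "Other", 0),
    ("M", 58, "Married", "Hospitality", 0),
    ("F", 47, "Unspecified", "Sales", 0),
    ("F", 61, "Married", "Other", 0),
]

CATEGORICAL = (0, 2, 3)  # positions of GENDER, MARITAL_STATUS, PROFESSION


def build_vocabulary(rows):
    """Map each (column, category) pair to a position in the feature vector."""
    pairs = sorted({(i, row[i]) for row in rows for i in CATEGORICAL})
    return {pair: position for position, pair in enumerate(pairs)}


def encode(record, vocabulary):
    """One-hot encode the categorical fields, then append scaled age and a bias."""
    vector = [0.0] * len(vocabulary)
    for i in CATEGORICAL:
        position = vocabulary.get((i, record[i]))
        if position is not None:  # categories unseen in training are ignored
            vector[position] = 1.0
    return vector + [record[1] / 100.0, 1.0]


def predict_probability(weights, vector):
    """Logistic function applied to the weighted sum of the features."""
    z = sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=2000, learning_rate=0.5):
    """Fit the weights with stochastic gradient descent on the log loss."""
    vocabulary = build_vocabulary(rows)
    weights = [0.0] * (len(vocabulary) + 2)
    for _ in range(epochs):
        for row in rows:
            vector = encode(row, vocabulary)
            error = predict_probability(weights, vector) - row[4]
            weights = [w - learning_rate * error * x for w, x in zip(weights, vector)]
    return vocabulary, weights


vocabulary, weights = train(ROWS)
candidate = ("M", 25, "Single", "Professional")
probability = predict_probability(weights, encode(candidate, vocabulary))
print(f"Estimated probability of buying a tent: {probability:.2f}")
```

A probability above 0.5 would be read as "likely to buy a tent". The label column supplies the 0 or 1 target, and the four feature columns supply the input vector.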
Which specific algorithm did model builder choose?
In the Summary table of the Overview information on the model details page, you can view which estimator (algorithm) the model builder chose to use by clicking View in the “Model builder details” row.
For a list of estimators available with each machine learning technique in model builder, see: Model builder overview
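This tutorial does not describe how model builder picks an estimator. One common approach to automatic selection, shown here purely as an assumption, is to fit several candidates on training rows and keep the one that scores best on held-out rows. The two toy estimators, the data, and all function names in this sketch are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (MARITAL_STATUS, PROFESSION, IS_TENT).
TRAIN = [
    ("Single", "Professional", 1),
    ("Single", "Professional", 1),
    ("Single", "Sales", 1),
    ("Married", "Professional", 0),
    ("Married", "Other", 0),
    ("Married", "Other", 0),
    ("Married", "Sales", 0),
    ("Married", "Sales", 0),
    ("Unspecified", "Hospitality", 0),
]
HOLDOUT = [
    ("Single", "Sales", 1),
    ("Married", "Professional", 0),
    ("Single", "Other", 1),
    ("Married", "Other", 0),
]


def fit_majority(rows):
    """Baseline: always predict the most common label seen in training."""
    majority = Counter(row[-1] for row in rows).most_common(1)[0][0]
    return lambda record: majority


def fit_one_rule(rows, column):
    """Predict the most common training label for the record's value in one column."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[column]][row[-1]] += 1
    table = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    fallback = fit_majority(rows)  # used for values not seen in training
    return lambda record: table.get(record[column], fallback(record))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the recorded label."""
    return sum(model(row[:-1]) == row[-1] for row in rows) / len(rows)


def select_estimator(train_rows, holdout_rows):
    """Fit every candidate, score each on the holdout rows, return the best name."""
    candidates = {
        "majority": fit_majority(train_rows),
        "one_rule_marital_status": fit_one_rule(train_rows, 0),
        "one_rule_profession": fit_one_rule(train_rows, 1),
    }
    scores = {name: accuracy(model, holdout_rows) for name, model in candidates.items()}
    return max(scores, key=scores.get), scores


best, scores = select_estimator(TRAIN, HOLDOUT)
print(best, scores)
```

Scoring on rows that were not used for fitting is what keeps the comparison fair: a candidate that only memorizes the training rows gains no advantage.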
Step 2: Deploy the trained model
Before you can use your trained model to make predictions on new data, you must deploy the model.
You can deploy the model from the model details page:
Click the Deployments tab.
Click Add Deployment.
In the page that opens, fill in the fields:
- Specify a name for the deployment.
- Select "Web service" as the Deployment type.
After you save the deployment, click on the deployment name to view the deployment details page.
Step 3: Test the deployed model
You can test the deployed model from the deployment details page:
In the Test area of the deployment details page, type in some values for the feature columns: GENDER, AGE, MARITAL_STATUS, and PROFESSION. (You can ignore the other fields in the form.)
Click Predict to predict whether a customer with the entered attributes is likely to buy a tent.
Tip: Look in the training data file, GoSales.csv, for more examples of feature combinations.
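Because the deployment type is a web service, the same prediction could in principle be requested from code rather than from the Test form. The sketch below only builds a request body and sends nothing. The fields/values layout and the function name build_scoring_payload are assumptions made for illustration; check the deployment details page for the actual endpoint, authentication, and request format.

```python
import json

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def build_scoring_payload(records):
    """Build a request body from a list of feature dictionaries.

    Assumption: the service accepts a list of column names under "fields"
    and one list of values per record under "values".
    """
    values = [[record[name] for name in FEATURE_COLUMNS] for record in records]
    return {"fields": list(FEATURE_COLUMNS), "values": values}


# Hypothetical customer; the values are examples, not rows from GoSales.csv.
payload = build_scoring_payload([
    {"GENDER": "M", "AGE": 27, "MARITAL_STATUS": "Single", "PROFESSION": "Professional"},
])
print(json.dumps(payload, indent=2))
```

Keeping the column order in one constant means that every record is serialized in the same order as the field names that accompany it.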