Model builder tutorial: Build a multiclass classifier model
This tutorial guides you through training a model to predict which product category is most likely to interest a customer in an outdoor equipment store. In this tutorial, you will use the model builder in IBM Watson Studio to build a model that implements the multiclass classification technique using two different algorithms: Naive Bayes and random forest. You will use sample historical customer data to train the model.
Download the sample training data file to your local computer from here: GoSales.csv
The sample data is structured in rows and columns and saved in a .csv file.
You can view the sample data file in a text editor or spreadsheet program.
Feature columns are columns that contain the attributes on which the machine learning model will base predictions. In this historical data, there are four feature columns:
- GENDER: Customer gender
- AGE: Customer age
- MARITAL_STATUS: “Married”, “Single”, or “Unspecified”
- PROFESSION: General category of the customer’s profession, such as “Hospitality” or “Sales”, or simply “Other”
Label columns are columns that contain historical outcomes.
In this tutorial, the label column is the PRODUCT_LINE column:
- PRODUCT_LINE: The product category in which the customer has been most interested
The model built in this tutorial will predict which product line is most likely to interest a given customer.
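If you prefer to inspect the file programmatically rather than in a spreadsheet, a minimal sketch like the following shows the idea. It uses only the Python standard library. The rows and the column order in SAMPLE_CSV are made up for illustration and are not copied from GoSales.csv; to inspect the real file, read its text from disk and pass that to load_rows instead.

```python
import csv
import io
from collections import Counter

# Made-up rows that follow the column layout described above.
# The real GoSales.csv contains many more rows, and its column order may differ.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,PRODUCT_LINE
M,27,Single,Professional,Personal Accessories
F,39,Married,Other,Personal Accessories
F,39,Married,Other,Mountaineering Equipment
M,56,Unspecified,Hospitality,Personal Accessories
M,45,Married,Retired,Golf Equipment
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
columns = list(rows[0].keys())

# Count how often each label value (product line) occurs.
label_counts = Counter(row["PRODUCT_LINE"] for row in rows)

print(columns)
print(label_counts.most_common())
```

Counting the label values is a quick way to see how many classes the multiclass model has to distinguish and whether some product lines dominate the data.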
This tutorial presents the basic steps for building and training a machine learning model using model builder in Watson Studio:
Step 1: Build and train the model
1.1 Specify basic model details
From the Assets page of your project in Watson Studio, in the Models section, click New model.
In the page that opens, fill in the basic fields:
- Specify a name for your new model.
- Confirm that the IBM Watson Machine Learning service instance that you associated with your project is selected in the Machine Learning Service section.
Select "Model builder" as the model type.
Associate a Spark service with the model. (Follow instructions in the GUI to select an existing Spark service or create a new one.)
Click the card labeled Manual. (In this tutorial, you will choose the specific algorithms the model uses.)
1.2 Add training data
Click Add Data Assets. (This opens the data panel.)
In the data panel, click Load.
Upload the training data file, GoSales.csv, from your local computer by dragging the file onto the data panel or by clicking browse and then following the prompts. (If you already uploaded the file GoSales.csv for another tutorial, and you can see the file already listed in the data asset list, you don't have to upload the file again now.)
After the upload completes, check the radio button beside the GoSales.csv entry in the data asset list.
1.3 Train the model
Specify the label column and feature columns:
- Label column: PRODUCT_LINE
- Feature columns: GENDER, AGE, MARITAL_STATUS, and PROFESSION
The label column is what the model will predict. Feature columns contain the attributes on which the model will base predictions.
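The distinction between label and feature columns can be sketched in code. The example below is only an illustration of the concept, not what model builder runs internally; the function name split_features_and_label and the two sample rows are hypothetical.

```python
FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
LABEL_COLUMN = "PRODUCT_LINE"


def split_features_and_label(rows):
    """Separate each row into its feature values and its label value."""
    features, labels = [], []
    for row in rows:
        record = {name: row[name] for name in FEATURE_COLUMNS}
        # Values read from a CSV file are strings, so convert AGE to a number.
        record["AGE"] = int(record["AGE"])
        features.append(record)
        labels.append(row[LABEL_COLUMN])
    return features, labels


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"GENDER": "M", "AGE": "27", "MARITAL_STATUS": "Single",
     "PROFESSION": "Professional", "PRODUCT_LINE": "Personal Accessories"},
    {"GENDER": "F", "AGE": "39", "MARITAL_STATUS": "Married",
     "PROFESSION": "Other", "PRODUCT_LINE": "Mountaineering Equipment"},
]

features, labels = split_features_and_label(rows)
print(features[0])
print(labels)
```

During training, the algorithm sees both the features and the labels. At prediction time, it receives only the features and must produce the label.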
Choose the machine learning technique:
- Multiclass classification
Add two estimators (algorithm choices) to compare:
- Click Add Estimators to view the estimators (algorithms) that are available to use with the multiclass classification technique in model builder.
- Click the card labeled Naive Bayes and then click Add.
- Click Add Estimators again.
- Click the card labeled Random Forest Classifier and then click Add.
Click Next to begin training two versions of the model. (Training will take a few minutes.)
Compare the training results of the two algorithms.
After training completes, you can see evaluations of both algorithm choices. (Model builder holds back some of the training data instead of using it to train the model, and then uses that reserved data to evaluate how often the model predicts the correct answer.)
Notice that the performance of the model version that uses Naive Bayes is rated as "Poor", and the performance of the version that uses random forest classification is rated as "Excellent".
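The idea of reserving data for evaluation can be sketched in a few lines of standard-library Python. This is a conceptual illustration only: the 20% holdout fraction, the fixed seed, the sample records, and the majority-class baseline predictor are all assumptions made for this example, and model builder's actual split and metrics may differ.

```python
import random
from collections import Counter


def holdout_split(examples, holdout_fraction=0.2, seed=0):
    """Shuffle the examples and reserve a fraction of them for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predict, examples):
    """Return the fraction of examples for which predict() matches the label."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


# Hypothetical (features, label) pairs; features are
# (GENDER, AGE, MARITAL_STATUS, PROFESSION).
examples = [
    (("M", 27, "Single", "Professional"), "Personal Accessories"),
    (("F", 39, "Married", "Other"), "Personal Accessories"),
    (("F", 41, "Married", "Other"), "Mountaineering Equipment"),
    (("M", 56, "Unspecified", "Hospitality"), "Personal Accessories"),
    (("M", 45, "Married", "Retired"), "Golf Equipment"),
    (("F", 22, "Single", "Student"), "Camping Equipment"),
    (("M", 33, "Married", "Sales"), "Camping Equipment"),
    (("F", 60, "Married", "Retired"), "Golf Equipment"),
    (("M", 19, "Single", "Student"), "Camping Equipment"),
    (("F", 48, "Unspecified", "Other"), "Personal Accessories"),
]

train, holdout = holdout_split(examples)

# Baseline "model": always predict the most common label in the training part.
most_common_label = Counter(label for _, label in train).most_common(1)[0][0]


def baseline_predict(features):
    return most_common_label


holdout_accuracy = accuracy(baseline_predict, holdout)
print(f"train={len(train)} holdout={len(holdout)} accuracy={holdout_accuracy:.2f}")
```

Because the reserved examples were never used for training, accuracy on them gives a fairer estimate of how the model will behave on new customers than accuracy on the training rows would.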
To find the best solution for a given machine learning problem, you sometimes have to experiment with your training data, the model design, or the algorithms used.
In the model builder, you can easily compare the results of different algorithms.
Check the radio button beside the model version that uses the random forest classification algorithm and then click Save.
After the model is saved, the model details page opens automatically.
Step 2: Deploy the trained model
Before you can use your trained model to make predictions on new data, you must deploy the model.
You can deploy the model from the model details page:
Click the Deployments tab.
Click Add Deployment.
In the page that opens, fill in the fields:
- Specify a name for the deployment.
- Select "Web service" as the Deployment type.
After you save the deployment, click the deployment name to view the deployment details page.
Step 3: Test the deployed model
You can test the deployed model from the deployment details page:
In the Test area of the deployment details page, type some values for the feature columns.
Click Predict to predict which product category is most likely to interest the customer with the entered attributes.
Tip: Look in the training data file, GoSales.csv, for more examples of feature combinations.
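If you later want to send the same test values to the deployed web service from code instead of typing them into the Test area, you first need to arrange them as a request body. The sketch below only builds and prints such a body; it does not call any service. The "fields"/"values" layout, the function name build_scoring_payload, and the sample customer are assumptions made for illustration, so check the details of your own deployment for the exact request format and endpoint.

```python
import json

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def build_scoring_payload(records):
    """Arrange feature records as a column list plus one value row per record.

    The "fields"/"values" layout is an assumption for this sketch.
    """
    values = [[record[name] for name in FEATURE_COLUMNS] for record in records]
    return {"fields": list(FEATURE_COLUMNS), "values": values}


# Hypothetical customer attributes, like those you would type in the Test area.
customer = {
    "GENDER": "M",
    "AGE": 27,
    "MARITAL_STATUS": "Single",
    "PROFESSION": "Professional",
}

payload = build_scoring_payload([customer])
print(json.dumps(payload, indent=2))
```

Listing the column names once and keeping every value row in the same order makes it easy to score several customers in one request.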