Tutorials demonstrating the model builder in Watson Studio

This topic lists tutorials that demonstrate features of the model builder in IBM Watson Studio. These tutorials all use the same sample data set to train different types of machine learning models: a binary classifier, a multiclass classifier, and a regression model. No coding is required to complete any of these tutorials. You can complete any one of these tutorials in less than 20 minutes.


About model builder

The model builder in IBM Watson Studio is a graphical tool that guides you, step by step, through building a machine learning model: uploading training data, choosing a machine learning technique and algorithms, and training and evaluating the model.

In model builder, you can create three kinds of machine learning model:

  • Binary classifier: Classifies data into two categories
  • Multiclass classifier: Classifies data into multiple categories
  • Regression: Predicts a value from a continuous set of values
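
As a rough illustration of how the three kinds of model differ, the following Python sketch guesses which technique fits a label column from the values that column contains. The function name suggest_technique and its rules are assumptions made for this illustration only; they do not describe how the model builder works, and the heuristic would misread a categorical label stored as integer codes as a regression target.

```python
def suggest_technique(label_values):
    """Guess a technique from label values (illustrative heuristic only)."""
    distinct = set(label_values)
    if len(distinct) < 2:
        raise ValueError("a label column needs at least two distinct values")

    def is_number(value):
        # True if the value can be read as a number, e.g. "12.5" or 3
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    if len(distinct) == 2:
        # Exactly two categories
        return "binary classification"
    if all(is_number(value) for value in distinct):
        # More than two numeric values: treat as a continuous target
        return "regression"
    # More than two non-numeric categories
    return "multiclass classification"
```

For example, suggest_technique(["TRUE", "FALSE", "TRUE"]) returns "binary classification".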

Algorithms (estimators)
For each kind of model, you can choose from multiple algorithms to implement the technique.

Automatic or manual algorithm selection
Algorithms can be chosen in two ways in model builder:

  • Automatic: An algorithm is chosen automatically, based on characteristics of the training data
  • Manual: You choose one or more algorithms

For an overview of model builder, including a description of the available techniques and algorithms, see: Model builder overview


About the sample data

These tutorials all use the same sample data to train the machine learning models: historical customer data for a fictional outdoor equipment store.

You can download the sample training data here: GoSales.csv

The sample data is structured in rows and columns and saved in a .csv file.

You can view the sample data file in a text editor or spreadsheet program:

Preview of training data

Feature columns
Feature columns are columns that contain the attributes on which the machine learning model will base predictions. In this historical data, there are four feature columns:

  • GENDER: Customer gender
  • AGE: Customer age
  • MARITAL_STATUS: “Married”, “Single”, or “Unspecified”
  • PROFESSION: General category of the customer’s profession, such as “Hospitality” or “Sales”, or simply “Other”

All of these tutorials use all four of these feature columns.

Label columns
Label columns are columns that contain historical outcomes that the models will be trained to predict. In this historical data, there are three label columns:

  • IS_TENT: Whether or not the customer bought a tent
  • PRODUCT_LINE: The product category in which the customer has been most interested
  • PURCHASE_AMOUNT: The average amount of money the customer has spent on each visit to the store

Each tutorial uses a different label column.
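
The tutorials themselves require no coding, but the split between feature columns and label columns can be sketched in a few lines of Python for readers who want to inspect the data programmatically. This is a minimal sketch using only the standard library. The helper name split_features_and_label is hypothetical, and the two data rows are invented for illustration; they are not taken from GoSales.csv.

```python
import csv
import io

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
LABEL_COLUMNS = ["IS_TENT", "PRODUCT_LINE", "PURCHASE_AMOUNT"]

# Hypothetical rows for illustration only; not copied from GoSales.csv.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,IS_TENT,PRODUCT_LINE,PURCHASE_AMOUNT
M,27,Single,Sales,TRUE,Example Line A,100.00
F,39,Married,Other,FALSE,Example Line B,250.50
"""


def split_features_and_label(csv_text, label_column):
    """Return (features, labels) for one of the three label columns."""
    if label_column not in LABEL_COLUMNS:
        raise ValueError(f"unknown label column: {label_column}")
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # All four feature columns are used, whichever label is chosen
        features.append([row[name] for name in FEATURE_COLUMNS])
        labels.append(row[label_column])
    return features, labels


# The binary classification tutorial predicts the IS_TENT column
features, labels = split_features_and_label(SAMPLE_CSV, "IS_TENT")
```

Passing "PRODUCT_LINE" or "PURCHASE_AMOUNT" instead selects the label for the multiclass or regression case. Note that csv.DictReader returns every value as a string, so numeric columns such as AGE would need converting before numeric use.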


About the tutorials

All of these tutorials demonstrate the same basic steps for building, training, deploying, and testing a model using model builder in Watson Studio:

  1. Upload training data
  2. Choose the machine learning technique and algorithms
  3. Train and evaluate the model
  4. Deploy the model to IBM Cloud
  5. Serve API calls (use the deployed model to make predictions)
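
To give a sense of what step 5 involves outside the graphical tool, the following Python sketch builds a JSON request body for a prediction on one customer. It assumes a payload with a "fields" list and a "values" list of rows; that shape, the helper name build_scoring_payload, and the customer values are assumptions for illustration. The actual request format and endpoint depend on your deployment, so check them there. The sketch sends no request.

```python
import json

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def build_scoring_payload(rows):
    """Build a request body from rows of feature values (assumed shape)."""
    for row in rows:
        if len(row) != len(FEATURE_COLUMNS):
            raise ValueError(
                f"expected {len(FEATURE_COLUMNS)} values per row, got {len(row)}"
            )
    return {
        "fields": list(FEATURE_COLUMNS),
        "values": [list(row) for row in rows],
    }


# One hypothetical customer: gender, age, marital status, profession
payload = build_scoring_payload([["M", 27, "Single", "Sales"]])
body = json.dumps(payload)
```

The resulting body string could then be sent to the deployed model's scoring endpoint with any HTTP client.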

No coding is required to complete any of these tutorials. You can complete any one of these tutorials in less than 20 minutes.

Table 1 lists the tutorials.

Table 1. Model builder tutorials
Tutorial | Algorithm selection | Technique | Algorithms
Model builder binary classification tutorial | Automatic | Binary classification: Predict whether or not a customer is likely to purchase a tent. | The model builder chooses the algorithm based on characteristics of the training data.
Model builder regression tutorial | Manual | Regression: Predict how much money a customer might spend on a trip to the store. | Gradient boosted tree regression
Model builder multiclass classification tutorial | Manual | Multiclass classification: Predict which product category is most likely to interest a customer. | Compare results for two algorithms: Naive Bayes and Random forest classification