Model builder tutorial: Build a regression model
This tutorial guides you through training a model to predict the amount of money a customer is likely to spend on a trip to an outdoor equipment store. You will use the model builder in IBM Watson Studio to build a model that uses the regression technique with the gradient boosted tree regression algorithm, and then train the model on sample historical customer data.
Download the sample training data file to your local computer from here: GoSales.csv
The sample data is structured in rows and columns and saved in a .csv file.
You can view the sample data file in a text editor or spreadsheet program.
Feature columns are columns that contain the attributes on which the machine learning model will base predictions. In this historical data, there are four feature columns:
- GENDER: Customer gender
- AGE: Customer age
- MARITAL_STATUS: "Married", "Single", or "Unspecified"
- PROFESSION: General category of the customer's profession, such as "Hospitality" or "Sales", or simply "Other"
Label columns are columns that contain historical outcomes.
In this tutorial, the label column is the PURCHASE_AMOUNT column:
- PURCHASE_AMOUNT: The average amount of money the customer has spent on each visit to the store
The model built in this tutorial will predict how much money a given customer is likely to spend on a visit to the store.
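To make the distinction between feature columns and the label column concrete, the following is a minimal Python sketch that reads data in this column layout and separates the two. The rows in SAMPLE_CSV are made up for illustration and are not taken from GoSales.csv, and the helper name split_features_and_label is hypothetical. To try it on the real file, you could pass open("GoSales.csv", newline="") instead of the in-memory sample.

```python
import csv
import io

# Hypothetical rows in the tutorial's column layout. The values are made up
# for illustration and are NOT taken from GoSales.csv.
SAMPLE_CSV = """GENDER,AGE,MARITAL_STATUS,PROFESSION,PURCHASE_AMOUNT
M,27,Single,Sales,120.50
F,39,Married,Hospitality,98.25
F,56,Unspecified,Other,143.00
"""

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
LABEL_COLUMN = "PURCHASE_AMOUNT"


def split_features_and_label(csv_file):
    """Read an open CSV file object and return (features, labels).

    Columns other than the four feature columns and the label are ignored.
    """
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        features.append({name: row[name] for name in FEATURE_COLUMNS})
        labels.append(float(row[LABEL_COLUMN]))
    return features, labels


features, labels = split_features_and_label(io.StringIO(SAMPLE_CSV))
print(features[0])
print(labels)
```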
This tutorial presents the basic steps for building and training a machine learning model using model builder in Watson Studio:
Step 1: Build and train the model
1.1 Specify basic model details
From the Assets page of your project in Watson Studio, in the Models section, click New model.
In the page that opens, fill in the basic fields:
- Specify a name for your new model.
- Confirm that the IBM Watson Machine Learning service instance that you associated with your project is selected in the Machine Learning Service section.
Select "Model builder" as the model type.
Associate a Spark service with the model. (Follow instructions in the GUI to select an existing Spark service or create a new one.)
Click the card labeled Manual.
In manual mode, you choose the specific algorithms the model uses.
1.2 Add training data
Click Add Data Assets. (This causes the data panel to open.)
In the data panel, click Load.
Upload the training data file, GoSales.csv, from your local computer by dragging the file onto the data panel or by clicking browse and then following the prompts. (If you already uploaded the file GoSales.csv for another tutorial, and you can see the file already listed in the data asset list, you don't have to upload the file again now.)
After the upload completes, select the radio button beside the GoSales.csv entry in the data asset list.
1.3 Train the model
Specify the label column and feature columns:
- Label column: PURCHASE_AMOUNT
- Feature columns: GENDER, AGE, MARITAL_STATUS, and PROFESSION
The label column is what the model will predict. Feature columns contain the attributes on which the model will base predictions.
Choose the machine learning technique: regression.
Specify an estimator (algorithm):
- Click Add Estimators to view the estimators (algorithms) that are available to use with the regression technique in model builder.
- Click the card labeled Gradient Boosted Tree Regressor and then click Add.
Click Next to begin training. (Training will take about a minute.)
After training completes, click Save.
After the model is saved, the model details page opens automatically.
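For readers curious about what a gradient boosted tree regressor does during training, the following is a simplified Python sketch of the general idea: start from the mean of the label, then repeatedly fit a small tree to the remaining errors and add a fraction of its output to the prediction. It is not the implementation that model builder uses. It boosts one-split trees (stumps) on a single numeric feature, and the ages, amounts, function names, and parameter values are all made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split on xs that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def fit_boosted_stumps(xs, ys, rounds=20, learning_rate=0.5):
    """Fit an additive model of stumps, each trained on current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean label
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the scaled output of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


# Made-up (AGE, PURCHASE_AMOUNT) pairs, for illustration only.
ages = [22, 27, 35, 41, 48, 56, 63]
amounts = [80.0, 85.0, 110.0, 120.0, 150.0, 155.0, 160.0]

model = fit_boosted_stumps(ages, amounts)
print(round(predict(model, 30), 2))
```

A real gradient boosted tree regressor uses deeper trees over all feature columns, including encoded categorical columns such as GENDER and PROFESSION, but the residual-fitting loop follows the same pattern.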
Step 2: Deploy the trained model
Before you can use your trained model to make predictions on new data, you must deploy the model.
You can deploy the model from the model details page:
Click the Deployments tab.
Click Add Deployment.
In the page that opens, fill in the fields:
- Specify a name for the deployment.
- Select "Web service" as the Deployment type.
After you save the deployment, click the deployment name to view the deployment details page.
Step 3: Test the deployed model
You can test the deployed model from the deployment details page:
In the Test area of the deployment details page, type in some values for the feature columns: GENDER, AGE, MARITAL_STATUS, and PROFESSION. (You can ignore the other fields in the form.)
Click Predict to predict how much money a customer with the entered attributes is likely to spend on a trip to the store.
Tip: Look in the training data file, GoSales.csv, for more examples of feature combinations.
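Because the deployment type is a web service, the same test can in principle be run from code by sending the feature values in a request body. The following Python sketch only builds such a body and does not send it. The "fields"/"values" JSON layout is an assumption about the request format and should be checked against the deployment details page, the helper name build_scoring_payload is hypothetical, and the customer values are made up. The endpoint URL and credentials are deliberately left out.

```python
import json

FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def build_scoring_payload(customers):
    """Build a request body with one row of feature values per customer.

    The {"fields": [...], "values": [[...]]} layout is an assumption;
    confirm the expected format on the deployment details page.
    """
    rows = [[customer[name] for name in FEATURE_COLUMNS]
            for customer in customers]
    return {"fields": FEATURE_COLUMNS, "values": rows}


# Made-up customer attributes, for illustration only.
payload = build_scoring_payload([
    {"GENDER": "F", "AGE": 39, "MARITAL_STATUS": "Married",
     "PROFESSION": "Hospitality"},
])
print(json.dumps(payload, indent=2))
```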