This tutorial builds two models to predict the effects of future sales promotions, and
then compares the models.
Similar to the Condition monitoring tutorial, the data
mining process consists of the exploration, data preparation, training, and test phases. Not all of
the data in the telco.csv data file are useful in predicting churn. You can use the
filter to select only data that is considered to be important for use as a predictor (the fields
marked as Important in the model).
Preview the tutorial
Watch this video to preview the steps in this tutorial. There might
be slight differences in the user interface that is shown in the video. The video is intended to be
a companion to the written tutorial. This video provides a visual method to learn the concepts and
tasks in this documentation.
This tutorial uses the Retail Sales Promotion flow in the sample project. The flow reads
goods1n.csv to train the models, and you later switch to goods2n.csv to analyze them. The following image shows the sample modeler flow.
Figure 1. Sample modeler flow
The following image shows the sample data set.
Figure 2. Sample data set
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have
the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample
project:
In watsonx, from the Navigation menu, choose
Projects > View all Projects.
Click SPSS Modeler Project.
Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample
modeler flow associated with this tutorial.
Task 2: Examine the Data Asset, Derive, and Type nodes
Retail Sales Promotion includes several nodes. Follow these steps to examine the Data
Asset, Derive, and Type nodes:
Data Asset node
From the Assets tab, open the Retail Sales Promotion modeler flow,
and wait for the canvas to load.
Double-click the goods1n.csv node. This node is a Data Asset node that points to
the goods1n.csv file in the project.
Review the File format properties.
Click Preview data to see the full data set.
Notice that each record contains:
Class. Product type.
Cost. Unit price.
Promotion. Index of amount spent on a particular promotion.
Before. Revenue before promotion.
After. Revenue after promotion.
The two revenue fields (Before and After) are expressed in
absolute terms. However, it seems likely that the increase in revenue after the promotion (and
presumably as a result of it) might be a more useful figure.
Close the data preview and the properties side pane.
Derive node
Double-click the Increase (Derive) node. This node derives the value of the increase in
revenue.
Review the settings, in particular the Expression field, which contains a formula to
derive the increase as a percentage of the revenue before the promotion: (After - Before) /
Before * 100.0.
Click Preview data to see the data set with the derived values.
Notice the Increase column.
For each class of product, an almost linear relationship
exists between the increase in revenue and the cost of the promotion. Therefore, it seems likely
that a decision tree or neural network could predict, with reasonable accuracy, the increase in
revenue from the other available fields.
Close the data preview and the properties side pane.
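If you want to check the Derive expression outside the flow, the following Python sketch applies the same formula, (After - Before) / Before * 100.0, and then measures how close to linear the relationship is within each product class. The rows, the class names, and the choice of the Promotion field as the measure of promotion cost are assumptions made for illustration; they are not values from goods1n.csv.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows: field names follow the tutorial, values are made up.
records = [
    {"Class": "A", "Promotion": 1000, "Before": 100000, "After": 105000},
    {"Class": "A", "Promotion": 2000, "Before": 100000, "After": 110000},
    {"Class": "A", "Promotion": 3000, "Before": 200000, "After": 230000},
    {"Class": "B", "Promotion": 1000, "Before": 100000, "After": 102000},
    {"Class": "B", "Promotion": 2000, "Before": 100000, "After": 104500},
    {"Class": "B", "Promotion": 3000, "Before": 100000, "After": 106000},
]


def percent_increase(before, after):
    """Same formula as the Derive node: (After - Before) / Before * 100.0."""
    return (after - before) / before * 100.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Add the derived field to every row.
for row in records:
    row["Increase"] = percent_increase(row["Before"], row["After"])

# Group rows by product class, then correlate Promotion with Increase.
by_class = defaultdict(list)
for row in records:
    by_class[row["Class"]].append(row)

linearity = {
    name: pearson([r["Promotion"] for r in rows], [r["Increase"] for r in rows])
    for name, rows in by_class.items()
}

for name, r_value in linearity.items():
    print(f"Class {name}: correlation between Promotion and Increase = {r_value:.3f}")
```

A correlation close to 1 within a class is one way to confirm the near-linear pattern that you see in the data preview.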
Type node
Double-click the Define Types (Type) node. This node specifies field properties, such as
measurement level (the type of data that the field contains), and the role of each field as a target
or input in modeling. The measurement level is a category that indicates the type of data in the
field. The source data file uses three different measurement levels:
A Continuous field (such as the Age
field) contains continuous numeric values.
A Nominal field (such as the Education field) has two or more distinct
values—in this case College or High school.
An Ordinal field (such as the Income level field) describes data with
multiple distinct values that have an inherent order—in this case Low,
Medium, and High.
For each
field, the Type node also specifies a role to indicate the part that each field plays in
modeling. The role is set to Target for the field Increase, which is the
field that was derived. The target is the field for which you want to predict the
value.
Role is set to Input for most other
fields. Input fields are sometimes known as predictors, or fields whose values are
used by the modeling algorithm to predict the value of the target field.
The role for the
After field is set to None, so this field is not used by the modeling
algorithm.
Optional: Click Preview data to see the data set with the derived
values.
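The role assignments can be pictured as a simple mapping from field name to role. The following Python sketch is only an illustration of that idea, not how SPSS Modeler stores the settings. The tutorial states the roles for Increase and After; the Input role for each of the remaining fields is an assumption based on "most other fields".

```python
# Roles as described for the Type node. Increase and After come from the
# tutorial text; the Input entries are assumed for illustration.
roles = {
    "Class": "Input",
    "Cost": "Input",
    "Promotion": "Input",
    "Before": "Input",
    "After": "None",
    "Increase": "Target",
}


def split_fields(field_roles):
    """Return (input field names, target field names) from a role mapping."""
    inputs = [name for name, role in field_roles.items() if role == "Input"]
    targets = [name for name, role in field_roles.items() if role == "Target"]
    return inputs, targets


def to_example(record, field_roles):
    """Split one record into predictor values and the target value."""
    inputs, targets = split_fields(field_roles)
    predictors = {name: record[name] for name in inputs}
    target = record[targets[0]]
    return predictors, target


input_fields, target_fields = split_fields(roles)
print("Inputs:", input_fields)
print("Target:", target_fields)
```

Because Increase is calculated from Before and After, leaving After out of the inputs keeps the models from simply recomputing the target from its own ingredients.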
Check your progress
The following image shows the Type node. You are now ready to generate and compare the
models.
Task 3: Generate and compare the models
The flow trains a neural network and a decision tree to make this prediction of revenue increase.
Follow these steps to generate the two models:
Generate the models
Double-click the Increase (Neural net) node to review its properties.
Expand the Basics section to see that the Multilayer Perceptron is the model type.
This property determines how the network connects the predictors to the targets through the hidden
layers. Multilayer perceptron allows for more complex relationships at the possible cost of
increasing the training and scoring time.
Expand the Model Options section to see the evaluation and scoring properties.
Double-click the Increase (C&R Tree) node to see its properties.
Click Run all, and
wait for the model nuggets to generate.
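To make the idea of hidden layers more concrete, the following Python sketch computes the output of a very small multilayer perceptron with one hidden layer. It is a minimal illustration under stated assumptions, not the SPSS Modeler implementation: the tanh activation, the layer sizes, and every weight value here are hypothetical, and no training takes place.

```python
from math import tanh


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j][i] links input i to unit j."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


def mlp_predict(inputs, hidden_w, hidden_b, out_w, out_b):
    """Predictors -> hidden layer (tanh) -> single numeric output."""
    hidden = [tanh(z) for z in dense(inputs, hidden_w, hidden_b)]
    return dense(hidden, [out_w], [out_b])[0]


# Hypothetical weights for 2 predictors and 3 hidden units.
hidden_weights = [[0.4, -0.2], [0.1, 0.3], [-0.5, 0.2]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [1.2, -0.7, 0.5]
output_bias = 0.05

prediction = mlp_predict([0.8, 0.3], hidden_weights, hidden_biases,
                         output_weights, output_bias)
print(f"Predicted value: {prediction:.4f}")
```

Each hidden unit combines all of the predictors, and the output combines all of the hidden units, which is what lets the network represent relationships that are not strictly linear.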
Compare the models
Connect the Increase (C&R Tree) model nugget to the Increase (Neural
net) model nugget.
Add an Analysis node:
From the palette, expand the Outputs section.
Drag the Analysis node on to the canvas.
Connect the Increase (Neural net) model nugget to the Analysis node.
Change the data set to use different data for the analysis:
Double-click the goods1n.csv node to view its properties.
Click Change data set.
Navigate to Data asset > GOODS2n.csv.
Click Select.
Click Save.
Hover over the Analysis node, and click the Run icon.
In the Outputs and models pane, click the output with the name Analysis to view
the results.
From the Analysis output, in particular from the linear correlation between
the predicted increase and the correct answer, you see that the trained systems predict the increase
in revenue with a high degree of success.
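The following Python sketch shows how a linear correlation and a mean absolute error between predicted and actual values can be computed. It is a rough illustration only: the two lists are made-up numbers, not output from the flow, and the Analysis node might calculate its statistics differently.

```python
from math import sqrt


def linear_correlation(predicted, actual):
    """Pearson correlation between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    spread_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_a = sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (spread_p * spread_a)


def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical percentage increases, for illustration only.
actual_increase = [5.0, 10.0, 15.0, 20.0]
predicted_increase = [5.5, 9.0, 15.5, 21.0]

correlation = linear_correlation(predicted_increase, actual_increase)
error = mean_absolute_error(predicted_increase, actual_increase)
print(f"Linear correlation: {correlation:.3f}")
print(f"Mean absolute error: {error:.2f}")
```

A correlation near 1 together with a small mean absolute error indicates that the predictions follow the actual increases closely.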
Further exploration might focus on the cases where
the trained systems make relatively large errors. You might identify these errors by plotting the
predicted increase in revenue against the actual increase. You might then select outliers on a graph
by using the interactive graphics within SPSS Modeler, and from their
properties, it might be possible to tune the data description or learning process to improve
accuracy.
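One simple way to find the cases with relatively large errors, outside the interactive graphics, is to rank records by the size of the prediction error. The following Python sketch assumes that each record carries hypothetical predicted and actual keys and that a threshold of 2.0 percentage points counts as a large error; both are illustrative choices, not settings from the flow.

```python
def largest_errors(rows, threshold):
    """Return rows whose absolute error exceeds threshold, largest first."""
    flagged = [
        row for row in rows
        if abs(row["predicted"] - row["actual"]) > threshold
    ]
    return sorted(
        flagged,
        key=lambda row: abs(row["predicted"] - row["actual"]),
        reverse=True,
    )


# Hypothetical scored records, for illustration only.
scored = [
    {"id": 1, "actual": 10.0, "predicted": 10.4},
    {"id": 2, "actual": 12.0, "predicted": 17.5},
    {"id": 3, "actual": 8.0, "predicted": 5.0},
    {"id": 4, "actual": 15.0, "predicted": 14.1},
]

outliers = largest_errors(scored, threshold=2.0)
for row in outliers:
    print(row["id"], row["actual"], row["predicted"])
```

Looking at what the flagged records have in common, such as product class or promotion level, can suggest where the data description or the learning process needs adjustment.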
Check your progress
The following image shows the output from the Analysis node.
This example showed you how to predict the effects of future sales promotions. Similar to the
condition monitoring example, the data mining process consists
of the exploration, data preparation, training, and test phases.