This tutorial builds two models to predict the effects of future sales promotions, and
then compares the models.
Similar to the Condition monitoring tutorial, the data
mining process consists of the exploration, data preparation, training, and test phases.
Preview the tutorial
Watch this video to preview the steps in this tutorial. There might
be slight differences in the user interface that is shown in the video. The video is intended to be
a companion to the written tutorial. This video provides a visual method to learn the concepts and
tasks in this documentation.
This tutorial uses the Retail Sales Promotion flow in the sample project. The data files
used are goods1n.csv and goods2n.csv. The following image shows the sample modeler flow.
Figure 1. Sample modeler flow
The following image shows the sample data set.
Figure 2. Sample data set
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have
the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample
project:
In Cloud Pak for Data, from the Navigation menu, choose
Projects > View all Projects.
Click SPSS Modeler Project.
Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample
modeler flow associated with this tutorial.
Task 2: Examine the Data Asset, Derive, and Type nodes
Retail Sales Promotion includes several nodes. Follow these steps to examine the Data
Asset, Derive, and Type nodes:
Data Asset node
From the Assets tab, open the Retail Sales Promotion modeler flow,
and wait for the canvas to load.
Double-click the goods1n.csv node. This node is a Data Asset node that points to
the goods1n.csv file in the project.
Review the File format properties.
Click Preview data to see the full data set.
Notice that each record contains:
Class. Product type.
Cost. Unit price.
Promotion. Index of amount spent on a particular promotion.
Before. Revenue before promotion.
After. Revenue after promotion.
The two revenue fields (Before and After) are expressed in
absolute terms. However, it seems likely that the increase in revenue after the promotion (and
presumably as a result of it) might be a more useful figure.
Close the data preview and the properties side pane.
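The field layout described above can also be checked outside of the flow. As a minimal sketch (assuming Python with pandas and a local copy of the CSV file; the real data asset lives in the project), you could load the file and confirm the columns:

import pandas as pd

# Hypothetical local copy of the goods1n.csv data asset
df = pd.read_csv("goods1n.csv")

# Expect the five fields described above: Class, Cost, Promotion, Before, After
print(df.dtypes)
print(df.head())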
Derive node
Double-click the Increase (Derive) node. This node derives the value of the increase in
revenue.
Review the settings, in particular the Expression field, which contains a formula to
derive the increase as a percentage of the revenue before the promotion: (After - Before) /
Before * 100.0 (a Python sketch of this calculation follows these steps).
Click Preview data to see the data set with the derived values.
Notice the Increase column.
For each class of product, an almost linear relationship
exists between the increase in revenue and the cost of the promotion. Therefore, it seems likely
that a decision tree or neural network could predict, with reasonable accuracy, the increase in
revenue from the other available fields.
Close the data preview and the properties side pane.
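The calculation that the Derive node performs can be sketched in Python as well. This assumes the pandas DataFrame df from the previous sketch; the expression mirrors the one shown in the node settings:

# Percentage increase in revenue: (After - Before) / Before * 100.0
df["Increase"] = (df["After"] - df["Before"]) / df["Before"] * 100.0

print(df[["Before", "After", "Increase"]].head())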
Type node
Double-click the Define Types (Type) node. This node specifies field properties, such as
the measurement level (the category that indicates the type of data in the field) and the role
that each field plays as a target or input in modeling. Measurement levels include:
A Continuous field (such as the Cost field) contains continuous numeric values.
A Nominal field (such as the Class field) has two or more distinct
values, in this case the product types.
An Ordinal field describes data with multiple distinct values that have an inherent
order, for example Low, Medium, and High.
For each
field, the Type node also specifies a role to indicate the part that each field plays in
modeling. The role is set to Target for the field Increase, which is the
field that was derived. The target is the field for which you want to predict the
value.
The role is set to Input for most other
fields. Input fields are sometimes known as predictors; their values are
used by the modeling algorithm to predict the value of the target field (a Python sketch of this
target and input split follows these steps).
The role for the
After field is set to None, so this field is not used by the modeling
algorithm.
Optional: Click Preview data to see the data set with the derived
values.
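To make the roles concrete, here is a hedged sketch of the equivalent target and input split in Python, assuming the DataFrame from the previous sketches. The assignments mirror the Type node settings described above:

import pandas as pd

# Target: the derived Increase field
y = df["Increase"]

# Inputs (predictors): Class, Cost, Promotion, and Before; Class is nominal, so encode it
X = pd.get_dummies(df[["Class", "Cost", "Promotion", "Before"]], columns=["Class"])

# The After field has the role None, so it is deliberately left out of X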
Check your progress
The following image shows the Type node. You are now ready to generate and compare the
models.
Task 3: Generate and compare the models
The flow trains a neural network and a decision tree to predict the increase in revenue.
Follow these steps to generate and then compare the two models:
Generate the models
Double-click the Increase (Neural net) node to review its properties.
Expand the Basics section to see that the Multilayer Perceptron is the model type.
This property determines how the network connects the predictors to the targets through the hidden
layers. Multilayer perceptron allows for more complex relationships at the possible cost of
increasing the training and scoring time.
Expand the Model Options section to see the evaluation and scoring properties.
Double-click the Increase (C&R Tree) node to see its properties.
Click Run all, and wait for the model nuggets to generate.
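Outside of SPSS Modeler, the two model types can be approximated with scikit-learn. This is only an illustrative sketch under the assumptions of the earlier sketches (X as inputs and y as the target); it is not the exact algorithm that the Neural Net and C&R Tree nodes run:

from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Roughly analogous to the Increase (Neural net) node: a multilayer perceptron
nnet = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
nnet.fit(X, y)

# Roughly analogous to the Increase (C&R Tree) node: a regression tree
tree = DecisionTreeRegressor(max_depth=5, random_state=0)
tree.fit(X, y)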
Compare the models
Connect the Increase (C&R Tree) model nugget to the Increase (Neural
net) model nugget.
Add an Analysis node:
From the palette, expand the Outputs section.
Drag the Analysis node onto the canvas.
Connect the Increase (Neural net) model nugget to the Analysis node.
Change the data set to use different data for the analysis:
Double-click the goods1n.csv node to view its properties.
Click Change data set.
Navigate to Data asset > GOODS2n.csv.
Click Select.
Click Save.
Hover over the Analysis node, and click the Run icon.
In the Outputs and models pane, click the output with the name Analysis to view
the results.
From the Analysis output, in particular from the linear correlation between
the predicted increase and the correct answer, you see that the trained systems predict the increase
in revenue with a high degree of success.
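The linear correlation that the Analysis node reports can be sketched as a Pearson correlation between predicted and actual values. This assumes a second DataFrame prepared from goods2n.csv in the same way as the earlier sketches and split into inputs Xt and target yt; it mirrors, but does not reproduce, the Analysis output:

import numpy as np

# Correlation between each model's predictions and the actual increase
for name, model in [("Neural net", nnet), ("C&R Tree", tree)]:
    pred = model.predict(Xt)
    corr = np.corrcoef(pred, yt)[0, 1]
    print(f"{name}: linear correlation = {corr:.3f}")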
Further exploration might focus on the cases where
the trained systems make relatively large errors. You might identify these errors by plotting the
predicted increase in revenue against the actual increase. You might then select outliers on a graph
by using the interactive graphics within SPSS Modeler, and from their
properties, it might be possible to tune the data description or learning process to improve
accuracy.
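One way to explore those large errors outside of the interactive graphics is a quick scatter plot of predicted against actual increase, flagging the records with the largest absolute errors. This sketch assumes matplotlib and the variables from the previous sketches:

import matplotlib.pyplot as plt

pred = nnet.predict(Xt)
errors = np.abs(pred - yt)

plt.scatter(yt, pred, c=errors, cmap="viridis")
plt.xlabel("Actual increase (%)")
plt.ylabel("Predicted increase (%)")
plt.colorbar(label="Absolute error")
plt.show()

# Records with the largest errors are candidates for further inspection
print(errors.sort_values(ascending=False).head(10))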
Check your progress
The following image shows the output from the Analysis node.
This example showed you how to predict the effects of future sales promotions. Similar to the
condition monitoring example, the data mining process consists
of the exploration, data preparation, training, and test phases.