This tutorial builds a logistic regression model, which is a statistical technique for
classifying records based on values of input fields. It is analogous to linear regression, but takes
a categorical target field instead of a numeric one.
For example, suppose that a telecommunications provider has segmented its customer base by
service usage patterns, categorizing the customers into four groups. If demographic data can be used
to predict group membership, you can customize offers for individual prospective customers.
Preview the tutorial
Watch this video to preview the steps in this tutorial. The user interface
that is shown in the video might differ slightly from the current one. The
video is intended as a visual companion to the written tutorial.
This tutorial uses the Classifying Telecommunications Customer flow in the sample project.
The data file used is telco.csv. The following image shows the sample modeler flow.
Figure 1. Sample modeler flow
The following image shows the data set used with this modeler flow.
Figure 2. Sample data set
The example focuses on using demographic data to predict usage patterns. The target field
custcat has four possible values that correspond to the four customer groups, as
follows:
Table 1. Possible values for the target field
Value   Label
1       Basic Service
2       E-Service
3       Plus Service
4       Total Service
Because the target has multiple categories, a multinomial model is used. If the target has two
distinct categories, such as yes/no, true/false, or churn/don't churn, a binomial model might be
created instead.
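The difference between the two model types can be sketched in a few lines of Python. This is only an illustration of a common textbook formulation, not the computation that SPSS Modeler performs internally: a multinomial model turns one linear score per category into probabilities with a softmax, while a binomial model needs a single score and a sigmoid. The scores below are made-up values, and the function names are hypothetical.

```python
import math

# Labels from Table 1. The scores further down are made-up illustration values.
LABELS = {1: "Basic Service", 2: "E-Service", 3: "Plus Service", 4: "Total Service"}

def softmax(scores):
    """Multinomial case: turn one score per category into probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Binomial case: one score gives the probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical linear scores for one customer, one per custcat value 1 to 4.
scores = [0.2, -0.5, 1.1, 0.4]
probs = softmax(scores)

# The predicted group is the category with the highest probability.
predicted = max(LABELS, key=lambda value: probs[value - 1])
print(predicted, LABELS[predicted])
```

With two categories, the softmax over the scores `[s, 0]` gives the same probability as `sigmoid(s)`, which is why the binomial model can be seen as a special case of the multinomial one.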
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have
the sample project, refer to the Tutorials topic to create it. Then follow these steps to open the
sample project:
In watsonx, from the Navigation menu, choose
Projects > View all Projects.
Click SPSS Modeler Project.
Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample
modeler flow associated with this tutorial.
Task 2: Examine the Data Asset, Type and Filter nodes
The Classifying Telecommunication Customers modeler flow includes several nodes. Follow these
steps to examine three of the nodes:
From the Assets tab, open the Classifying Telecommunication
Customers modeler flow, and wait for the canvas to load.
Double-click the telco.csv node. This node is a Data Asset node that points to the
telco.csv file in the project.
Review the File format properties.
Optional: Click Preview data to see the full data set.
Double-click the Type node and click Read Values. This node specifies field
properties, such as measurement level (the type of data that the field contains), and the role of
each field as a target or input in modeling. Make sure that all measurement levels are set
correctly. For example, most fields with values of 0.0 and 1.0 can
be regarded as flags.
Figure 3. Measurement levels
Notice that gender is more correctly treated as a field with a
set of two values than as a flag, so leave its measurement level as Nominal.
Set the role for the custcat field to Target. Leave the role for all
other fields set to Input.
Double-click the Filter node to see its properties.
Notice that this node keeps only the relevant fields (region,
age, marital, address, income,
ed, employ, retire, gender,
reside, and custcat). Other fields are excluded from this analysis.
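For readers who think in code, the effect of the Type and Filter nodes can be approximated with the Python standard library. This is a rough sketch, not how the nodes are implemented: the two data rows are made-up stand-ins for telco.csv, the field named extra is hypothetical, and the 0/1 check is only a heuristic for spotting candidate flag fields.

```python
import csv
import io

# Fields that the Filter node keeps, as listed in the tutorial.
KEEP = ["region", "age", "marital", "address", "income", "ed",
        "employ", "retire", "gender", "reside", "custcat"]
TARGET = "custcat"

# Made-up rows standing in for telco.csv. "extra" is a hypothetical
# field that the filter should drop.
SAMPLE = """region,extra,age,marital,address,income,ed,employ,retire,gender,reside,custcat
2,13,44,1,9,64,4,5,0.0,0,2,1
3,11,33,0,7,136,5,5,1.0,1,6,4
"""

def filter_fields(rows, keep):
    """Return records that contain only the named fields, in that order."""
    return [{name: row[name] for name in keep} for row in rows]

def looks_like_flag(values):
    """True when every value is 0 or 1, the tutorial's hint for a flag field."""
    return {float(v) for v in values} <= {0.0, 1.0}

rows = filter_fields(list(csv.DictReader(io.StringIO(SAMPLE))), KEEP)

# Role per field: custcat is the target, every other field is an input.
roles = {name: ("Target" if name == TARGET else "Input") for name in KEEP}

# Candidate flags by value. gender is then kept as Nominal, as the tutorial advises.
flags = [name for name in KEEP if looks_like_flag(row[name] for row in rows)]
measurement = {name: "Flag" for name in flags}
measurement["gender"] = "Nominal"
```

The heuristic reports gender as a candidate flag because its values are 0 and 1, which is exactly the case where the tutorial asks you to override the measurement level by hand.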
Check your progress
The following image shows the Filter node. You are now ready to view the Logistic
node.
Hover over the custcat (Logistic) node, and click the Run icon.
In the Outputs and models pane, click the custcat model to view the results.
Figure 6. Model Feature Importance chart
You can then explore the model information, feature (predictor) importance, and parameter
estimates information.
These results are based on the training data only. To assess how well the model generalizes to
other data in the real world, you can use a Partition node to hold out a subset of records
for testing and validation.
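The idea behind holding out records can be sketched in plain Python. This is an illustration of a simple random holdout split, not the Partition node's implementation: the 70/30 ratio, the seed, and the function names are assumptions chosen for the example.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Shuffle records reproducibly, then split them into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(actual, predicted):
    """Fraction of records whose predicted category matches the actual one."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

# Ten stand-in record IDs: seven go to training, three are held out for testing.
train, test = partition(range(10))
print(len(train), len(test))
```

The model is fitted on the training subset only, and a measure such as `accuracy` is then computed on the held-out subset, so the score reflects records that the model has not seen.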
This example showed you how to use demographic data to predict usage patterns by building a
logistic regression model for classifying records based on values of input fields.