Classify telecommunications customers

Last updated: Feb 11, 2025

This tutorial builds a logistic regression model. Logistic regression is a statistical technique for classifying records based on the values of input fields. It is analogous to linear regression, but takes a categorical target field instead of a numeric one.

For example, suppose that a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, you can customize offers for individual prospective customers.

Preview the tutorial

Watch this video to preview the steps in this tutorial. The user interface that is shown in the video might differ slightly from the current one. The video is intended to be a companion to the written tutorial.

Try the tutorial

In this tutorial, you will complete these tasks:

Task 1: Open the sample project
Task 2: Examine the Data Asset, Type and Filter nodes
Task 3: View the Logistic node
Task 4: Browse the model

Sample modeler flow and data set

This tutorial uses the Classifying Telecommunications Customer flow in the sample project. The data file used is telco.csv. The following image shows the sample modeler flow.

The following image shows the data set used with this modeler flow.

Figure 2. Sample data set

The example focuses on using demographic data to predict usage patterns. The target field custcat has four possible values that correspond to the four customer groups, as follows:

Table 1. Possible values for the target field
Value	Label
1	Basic Service
2	E-Service
3	Plus Service
4	Total Service

Because the target has multiple categories, a multinomial model is used. If the target has two distinct categories, such as yes/no, true/false, or churn/don't churn, a binomial model might be created instead.
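To make the distinction concrete, the following minimal sketch shows how a multinomial logistic model can turn one linear score per category into predicted probabilities. It assumes a common parameterization in which one category is the reference with its score fixed at 0. The coefficients, the choice of age and income as the only inputs, and the function name are made up for illustration; they are not values that SPSS Modeler estimates from telco.csv.

```python
import math

# Hypothetical coefficients for illustration only: (intercept, age, income)
# for three of the four custcat groups. The fourth group is treated as the
# reference category, whose score is fixed at 0.
COEFFICIENTS = {
    1: (0.5, -0.02, 0.001),
    2: (0.1, 0.01, 0.002),
    3: (-0.3, 0.03, 0.000),
}
REFERENCE_CATEGORY = 4


def category_probabilities(age, income):
    """Return a dict that maps each custcat value to its predicted probability."""
    scores = {REFERENCE_CATEGORY: 0.0}
    for category, (intercept, b_age, b_income) in COEFFICIENTS.items():
        scores[category] = intercept + b_age * age + b_income * income
    # Softmax: exponentiate each score and normalize so the results sum to 1.
    highest = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - highest) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


probs = category_probabilities(age=40, income=60)
predicted = max(probs, key=probs.get)  # category with the highest probability
print(probs, predicted)
```

With only two categories, the same calculation reduces to the familiar logistic function of a single score, which is the binomial case.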

Task 1: Open the sample project

The sample project contains several data sets and sample modeler flows. If you don't already have the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample project:

In Cloud Pak for Data, from the Navigation menu, choose Projects > View all Projects.
Click SPSS Modeler Project.
Click the Assets tab to see the data sets and modeler flows.

Check your progress

The following image shows the project Assets tab. You are now ready to work with the sample modeler flow associated with this tutorial.

Sample project

Task 2: Examine the Data Asset, Type and Filter nodes

The Classifying Telecommunication Customers modeler flow includes several nodes. Follow these steps to examine three of the nodes:

From the Assets tab, open the Classifying Telecommunication Customers modeler flow, and wait for the canvas to load.
Double-click the telco.csv node. This node is a Data Asset node that points to the telco.csv file in the project.
Review the File format properties.
Optional: Click Preview data to see the full data set.
Double-click the Type node and click Read Values. This node specifies field properties, such as measurement level (the type of data that the field contains), and the role of each field as a target or input in modeling. Make sure that all measurement levels are set correctly. For example, most fields with values of 0.0 and 1.0 can be regarded as flags.

Figure 3. Measurement levels

Notice that gender is more correctly considered as a field with a set of two values, instead of a flag, so leave its measurement level as Nominal.
Set the role for the custcat field to Target. Leave the role for all other fields set to Input.
Double-click the Filter node to see its properties.
Notice that this node keeps only the relevant fields (region, age, marital, address, income, ed, employ, retire, gender, reside, and custcat). Other fields are excluded from this analysis.
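Outside the modeler canvas, the effect of the Filter node can be approximated in a few lines of code. The following sketch assumes a CSV source with a header row and keeps only the fields that are listed in this tutorial. The sample rows and the tenure column are made up for illustration; tenure stands in for any field that is excluded from the analysis.

```python
import csv
import io

# Fields that the Filter node keeps, as listed in this tutorial.
KEPT_FIELDS = ["region", "age", "marital", "address", "income", "ed",
               "employ", "retire", "gender", "reside", "custcat"]

# Made-up sample rows for illustration. The "tenure" column stands in for
# any field that is excluded from the analysis.
SAMPLE_CSV = """region,tenure,age,marital,address,income,ed,employ,retire,gender,reside,custcat
2,13,44,1,9,64,4,5,0,0,2,1
3,11,33,1,7,136,5,5,0,0,6,4
"""


def filter_fields(rows, kept):
    """Yield each row reduced to the kept fields, in the given order."""
    for row in rows:
        yield {name: row[name] for name in kept}


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
filtered = list(filter_fields(reader, KEPT_FIELDS))
print(filtered[0])
```

Note that csv.DictReader returns every value as a string, so a real pipeline would still need to convert the fields to their intended types, which is the job that the Type node performs in the flow.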

Check your progress

The following image shows the Filter node. You are now ready to view the Logistic node.

Task 3: View the Logistic node

Follow these steps to classify customers by using multinomial logistic regression:

Double-click the custcat (Logistic) node to see its properties.
In the Model Settings section, select the Multinomial procedure.
- A Binomial model is used when the target field is a flag or nominal field with two discrete values.
- A Multinomial model is used when the target field is a nominal field with more than two values.
Next, select the Stepwise method and Main Effects model type. Also, select the Include constant in equation checkbox.

Figure 4. Logistic node Model Settings
In the Expert Options section, select Expert mode.
Click Output. Select Classification table, and click OK.

Figure 5. Logistic node Output options
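A classification table cross-tabulates the observed target values against the values that the model predicts, so that you can see which groups are confused with each other. The following sketch shows one way to build such a table and an overall percent-correct figure. The observed and predicted lists are made up for illustration, and the function names are hypothetical; this is not the output format that the Logistic node produces.

```python
from collections import Counter

# Made-up observed and predicted custcat values, for illustration only.
observed = [1, 1, 2, 3, 3, 3, 4, 4]
predicted = [1, 3, 2, 3, 3, 1, 4, 3]


def classification_table(observed, predicted):
    """Return a nested dict: table[observed_value][predicted_value] = count."""
    counts = Counter(zip(observed, predicted))
    categories = sorted(set(observed) | set(predicted))
    return {o: {p: counts[(o, p)] for p in categories} for o in categories}


def percent_correct(table):
    """Return the share of records on the diagonal, as a percentage."""
    correct = sum(table[c][c] for c in table)
    total = sum(sum(row.values()) for row in table.values())
    return 100.0 * correct / total


table = classification_table(observed, predicted)
for observed_value, row in table.items():
    print(observed_value, row)
print(percent_correct(table))
```

Each row corresponds to an observed group and each column to a predicted group, so the counts on the diagonal are the correctly classified records.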

Check your progress

The following image shows the Logistic node. You are now ready to browse the model.

Task 4: Browse the model

Follow these steps to browse the model:

Hover over the custcat (Logistic) node, and click the Run icon.
In the Outputs and models pane, click the custcat model to view the results.

Figure 6. Model Feature Importance chart

You can then explore the model information, feature (predictor) importance, and parameter estimates information.

These results are based on the training data only. To assess how well the model generalizes to other data in the real world, you can use a Partition node to hold out a subset of records for purposes of testing and validation.
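The idea behind holding out records can be sketched as a random split into a training subset and a testing subset. The following example is a minimal illustration under stated assumptions: the 70/30 split, the seed, and the function name are arbitrary choices, and the code does not reproduce how the Partition node itself assigns records.

```python
import random


def partition(records, training_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into (training, testing)."""
    rng = random.Random(seed)  # fixed seed so that the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Record IDs stand in for rows of the data set, for illustration only.
records = list(range(1000))
training, testing = partition(records)
print(len(training), len(testing))
```

A model that is built on the training subset can then be scored against the testing subset, which it has not seen, to estimate how well it generalizes.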

Check your progress

Summary

This example showed you how to use demographic data to predict usage patterns by building a logistic regression model for classifying records based on values of input fields.