Building the flow (SPSS Modeler) | IBM Data Product Exchange

Last updated: Oct 09, 2024

To build a flow that will create a model, we need at least three elements:

A Data Asset node that reads in data from an external source, in this case a .csv data file
An Import or Type node that specifies field properties, such as measurement level (the type of data that the field contains), and the role of each field as a target or input in modeling
A modeling node that generates a model nugget when the flow runs

In this example, we're using a CHAID modeling node. CHAID, or Chi-squared Automatic Interaction Detection, is a classification method that builds decision trees by using chi-square statistics to identify the best places to split the tree.
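To make the role of the chi-square statistic concrete, here is a minimal sketch in plain Python that computes the Pearson chi-square statistic for a contingency table of predictor categories against target categories. The counts are hypothetical and are not taken from tree_credit.csv, and the function name `chi_square` is ours. The CHAID node itself does more than this (for example, it compares significance levels and can merge categories), so treat this only as an illustration of why a strongly related predictor makes a better split candidate.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if predictor and target were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are three predictor categories,
# columns are two target categories.
strong_predictor = [[80, 20], [50, 50], [20, 80]]
weak_predictor = [[52, 48], [50, 50], [48, 52]]

print(chi_square(strong_predictor))
print(chi_square(weak_predictor))
```

The first table, where the target distribution changes sharply across categories, yields a much larger statistic than the second, so it would be the more promising place to split.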

If measurement levels are specified in the source node, the separate Type node can be eliminated. Functionally, the result is the same.

This flow also has Table and Analysis nodes that will be used to view the scoring results after the model nugget has been created and added to the flow.

The Data Asset import node reads data in from the sample tree_credit.csv data file.

The Type node specifies the measurement level for each field. The measurement level is a category that indicates the type of data in the field. Our source data file uses three different measurement levels:

A Continuous field (such as the Age field) contains continuous numeric values.
A Nominal field (such as the Credit rating field) has two or more distinct values with no inherent order, for example Bad, Good, or No credit history.
An Ordinal field (such as the Income level field) has multiple distinct values with an inherent order, in this case Low, Medium, and High.

Figure 2. Setting the target and input fields with the Type node

For each field, the Type node also specifies a role that indicates the part the field plays in modeling. The role is set to Target for the Credit rating field, which indicates whether or not a given customer defaulted on the loan. This is the target: the field whose value we want to predict.

Role is set to Input for the other fields. Input fields are sometimes known as predictors, or fields whose values are used by the modeling algorithm to predict the value of the target field.
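As a rough mental model, the information held by the Type node can be pictured as a small table of field name, measurement level, and role. The sketch below is plain Python, not the SPSS Modeler API; the `FieldSpec` class and `split_roles` function are hypothetical names, and only the three fields named in this section are listed (any other fields in the file are omitted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    measurement: str  # "continuous", "nominal", or "ordinal"
    role: str         # "input" or "target"


# Only the fields mentioned in the text above.
FIELDS = [
    FieldSpec("Credit rating", "nominal", "target"),
    FieldSpec("Age", "continuous", "input"),
    FieldSpec("Income level", "ordinal", "input"),
]


def split_roles(fields):
    """Return (target_name, input_names) from a list of field specs."""
    targets = [f.name for f in fields if f.role == "target"]
    inputs = [f.name for f in fields if f.role == "input"]
    if len(targets) != 1:
        raise ValueError("expected exactly one target field")
    return targets[0], inputs


target, inputs = split_roles(FIELDS)
print(target)
print(inputs)
```

A modeling step that uses the default roles would then predict `target` from the fields in `inputs`.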

The CHAID modeling node generates the model. In the node's properties, under FIELDS, the option Use custom field roles is available. We could select this option and change the field roles, but for this example we'll use the default targets and inputs as specified in the Type node.

Double-click the CHAID node (named Creditrating). The node properties are displayed.

Figure 3. CHAID modeling node properties

Here we can set several options that determine the kind of model we want to build.

We want a brand-new model, so under OBJECTIVES we'll use the default option Build new model.

We want a single, standard decision tree model without any enhancements, so we'll also use the default objective option Create a standard model.

Figure 4. CHAID modeling node objectives

For this example, we want to keep the tree fairly simple, so we'll limit tree growth by raising the minimum number of records required in parent and child branches.

Under STOPPING RULES, select Use absolute value.
Set Minimum records in parent branch to 400.
Set Minimum records in child branch to 200.

Figure 5. Setting the stopping criteria for decision tree building
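The effect of these two thresholds can be sketched as a simple check applied before a node is split. This is an illustration in plain Python under our own assumptions, not SPSS Modeler's implementation: the function name `split_allowed` is hypothetical, and we assume a split is rejected when the parent has fewer records than the parent minimum or when any resulting child would have fewer records than the child minimum.

```python
def split_allowed(parent_count, child_counts, min_parent=400, min_child=200):
    """Return True if a candidate split satisfies both absolute stopping rules."""
    # Assumption: a parent with fewer than min_parent records is never split.
    if parent_count < min_parent:
        return False
    # Assumption: every resulting child must hold at least min_child records.
    return all(n >= min_child for n in child_counts)


print(split_allowed(1000, [600, 400]))  # both rules satisfied
print(split_allowed(350, [200, 150]))   # parent too small
print(split_allowed(500, [350, 150]))   # one child too small
```

Raising either minimum rejects more candidate splits, which is why larger values produce a smaller, simpler tree.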

We can use all the other default options for this example, so click Save and then click the Run button on the toolbar to create the model. (Alternatively, right-click the CHAID node and choose Run from the context menu.)