chaidnode properties
Last updated: May 23, 2024

The CHAID node generates decision trees by using chi-square statistics to identify optimal splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that some splits have more than two branches. Target and input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is a modification of CHAID that examines all possible splits more thoroughly but takes longer to compute.

Example

# Get the current stream and locate the upstream source node by its ID
stream = modeler.script.stream()
sourcenode = stream.findByID("id46WRP1285C")

# Create a CHAID node and link it to the source node
node = stream.createAt("chaid", "My node", 200, 100)
stream.link(sourcenode, node)

# Field roles and model name
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "Na", "K", "Cholesterol", "BP"])
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "CHAID")
# Tree-growing method and output type
node.setPropertyValue("method", "Chaid")
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", "Test")
# Significance levels and chi-square method
node.setPropertyValue("split_alpha", 0.03)
node.setPropertyValue("merge_alpha", 0.04)
node.setPropertyValue("chi_square", "Pearson")
# Minimum record counts, given as absolute values rather than percentages
node.setPropertyValue("use_percentage", False)
node.setPropertyValue("min_parent_records_abs", 40)
node.setPropertyValue("min_child_records_abs", 30)
# Convergence settings
node.setPropertyValue("epsilon", 0.003)
node.setPropertyValue("max_iterations", 75)
# Category resplitting and Bonferroni adjustment
node.setPropertyValue("split_merged_categories", True)
node.setPropertyValue("bonferroni_adjustment", True)
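A variation on the example, offered as a minimal sketch rather than a recommended configuration: it selects Exhaustive CHAID, a custom maximum depth, and percentage-based minimum record counts, using only properties listed in Table 1. The values are illustrative. The `apply_properties` helper and the `RecordingNode` stand-in are hypothetical names introduced here so that the sketch can run outside SPSS Modeler; they are not part of the scripting API.

```python
# Property names come from Table 1; the values are illustrative only.
EXHAUSTIVE_CHAID_SETTINGS = [
    ("method", "ExhaustiveChaid"),
    ("use_max_depth", "Custom"),
    ("max_depth", 5),              # used only because use_max_depth is Custom
    ("use_percentage", True),      # minimum record counts given as percentages
    ("min_parent_records_pc", 2),
    ("min_child_records_pc", 1),
]


def apply_properties(node, settings):
    """Hypothetical helper: set each (name, value) pair on the node, in order."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records what was set."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


# Outside SPSS Modeler, the stand-in records the settings.
demo_node = apply_properties(RecordingNode(), EXHAUSTIVE_CHAID_SETTINGS)
```

In a Modeler script, the node returned by `stream.createAt("chaid", ...)` would be passed to `apply_properties` in place of `RecordingNode()`.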
Table 1. chaidnode properties
chaidnode Properties Datatype or values Property description
target field CHAID models require a single target and one or more input fields. You can also specify a frequency. For more information, see Common modeling node properties.
continue_training_existing_model flag  
objective
  • Standard
  • Boosting
  • Bagging
  • psm
psm is used for large datasets and requires a server connection.
model_output_type
  • Single
  • InteractiveBuilder
 
use_tree_directives flag  
tree_directives string  
method
  • Chaid
  • ExhaustiveChaid
 
use_max_depth
  • Default
  • Custom
 
max_depth integer Maximum tree depth, from 0 to 1000. Used only if use_max_depth = Custom.
use_percentage flag  
min_parent_records_pc number  
min_child_records_pc number  
min_parent_records_abs number  
min_child_records_abs number  
use_costs flag  
costs structured Structured property.
trails number Number of component models for boosting or bagging.
set_ensemble_method
  • Voting
  • HighestProbability
  • HighestMeanProbability
Default combining rule for categorical targets.
range_ensemble_method
  • Mean
  • Median
Default combining rule for continuous targets.
large_boost flag Applies boosting for large datasets.
split_alpha number Significance level for splitting.
merge_alpha number Significance level for merging.
bonferroni_adjustment flag Adjust significance values by using the Bonferroni method.
split_merged_categories flag Allow resplitting of merged categories.
chi_square
  • Pearson
  • LR
The method used to calculate the chi-square statistic: Pearson or Likelihood Ratio.
epsilon number Minimum change in expected cell frequencies.
max_iterations number Maximum iterations for convergence.
set_random_seed integer  
seed number  
calculate_variable_importance flag  
calculate_raw_propensities flag  
calculate_adjusted_propensities flag  
adjusted_propensity_partition
  • Test
  • Validation
 
maximum_number_of_models integer  
train_pct double The algorithm internally separates records into a model building set and an overfit prevention set. The overfit prevention set is an independent set of data records used to track errors during training, which prevents the method from modeling chance variation in the data. Specify a percentage of records. The default is 30.
use_customize_layer Boolean The default value is false. Set this property to true if you want to designate specific fields for the decision tree to split on.
customize_layer list This property is used only when use_customize_layer is set to true.
This property is a list of objects. Each of the objects has two attributes:
  • Layer is an integer that indicates the specific n-th layer in the decision tree that you want to customize. In SPSS Modeler, layers start from 0 (root).
  • Fields is a list of names. Each name is one of the fields that you want the decision tree to potentially split on for that Layer. These fields are evaluated by SPSS Modeler in the order that they are listed.
When the SPSS Modeler flow runs, the CHAID algorithm evaluates and returns a candidate list of fields to split at based on the p value for each layer. For a custom layer, each field that you specified for the layer is compared to the full candidate list of fields. The first field to match a field from the candidate list is used for the split. The rest of the specified fields are ignored. If none of the fields match, a warning message appears and the tree splits as normal.
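The matching rule described above can be sketched in plain Python. This is an illustration of the described behavior, not the SPSS Modeler implementation. It makes several assumptions: each `customize_layer` entry is represented as a dictionary with the keys `Layer` and `Fields` (the exact key names and casing that the scripting API expects are not confirmed here), the candidate list is ordered best first, and "splits as normal" is modeled as taking the first candidate. The function name `choose_split_field` is hypothetical.

```python
import warnings


def choose_split_field(layer, customize_layer, candidate_fields):
    """Pick the split field for one layer, following the rule described above.

    customize_layer: list of {"Layer": int, "Fields": [names]} (assumed shape).
    candidate_fields: fields proposed by the algorithm for this layer,
    assumed to be ordered best first.
    """
    if not candidate_fields:
        return None  # nothing to split on
    candidates = set(candidate_fields)
    for entry in customize_layer:
        if entry["Layer"] != layer:
            continue
        # Specified fields are checked in the order in which they are listed;
        # the first one found in the candidate list wins, the rest are ignored.
        for name in entry["Fields"]:
            if name in candidates:
                return name
        warnings.warn("No specified field matched for layer %d" % layer)
        break
    # No custom entry for this layer, or no match: fall back to the
    # algorithm's own choice (modeled here as the first candidate).
    return candidate_fields[0]


custom = [{"Layer": 1, "Fields": ["Cholesterol", "BP", "Age"]}]
print(choose_split_field(1, custom, ["Na", "BP", "Age"]))
```

In this example, layer 1 specifies Cholesterol, BP, and Age. Cholesterol is not among the candidates, so BP, the first specified field that is, would be used, and Age would be ignored.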