The Tree-AS node is similar to the CHAID node; however, the Tree-AS node is
designed to process big data to create a single tree, and it displays the resulting model in the
output viewer. The node generates a decision tree by using chi-square statistics (CHAID) to identify
optimal splits. This use of CHAID can generate nonbinary trees, meaning that some splits have more
than two branches. Target and input fields can be numeric range (continuous) or categorical.
Exhaustive CHAID is a modification of CHAID that does a more thorough job of examining all possible
splits but takes longer to compute.
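As a rough illustration of the chi-square statistics mentioned above, the following sketch computes the Pearson and likelihood-ratio statistics for a small contingency table, where each row stands for one branch of a candidate split and each column for a target category. It is plain Python, not the Tree-AS implementation; the function names and the sample counts are made up for this example, and it assumes that no row or column of the table is entirely zero.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = float(sum(row_totals))
    return [[r * c / total for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_chi_square(table):
    """2 * sum of observed * ln(observed / expected); empty cells add 0."""
    expected = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row)
                     if o > 0)

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a two-way table."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts: three branches (rows) by two target categories (columns).
counts = [[30, 10], [20, 20], [10, 30]]
pearson = pearson_chi_square(counts)
likelihood_ratio = likelihood_ratio_chi_square(counts)
df = degrees_of_freedom(counts)
```

A larger statistic for the same degrees of freedom corresponds to a smaller p-value, which is the quantity compared against a significance level such as split_alpha.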
Table 1. treeas properties
treeas Properties
Values
Property description
target
field
In the Tree-AS node, CHAID models require a single target and one or more input fields. A
frequency field can also be specified. See Common modeling node properties for more information.
method
chaid, exhaustive_chaid
max_depth
integer
Maximum tree depth, from 0 to 20. The default value is 5.
num_bins
integer
Only used if the data is made up of continuous inputs. Set the number of equal frequency bins
to be used for the inputs; options are: 2, 4, 5, 10, 20, 25, 50, or 100.
record_threshold
integer
The number of records at which the model will switch from using p-values to Effect sizes
while building the tree. The default is 1,000,000; increase or decrease this in increments of
10,000.
split_alpha
number
Significance level for splitting. The value must be between 0.01 and 0.99.
merge_alpha
number
Significance level for merging. The value must be between 0.01 and 0.99.
bonferroni_adjustment
flag
Adjust significance values using Bonferroni method.
effect_size_threshold_cont
number
Set the Effect size threshold when splitting nodes and merging categories when using a
continuous target. The value must be between 0.01 and 0.99.
effect_size_threshold_cat
number
Set the Effect size threshold when splitting nodes and merging categories when using a
categorical target. The value must be between 0.01 and 0.99.
split_merged_categories
flag
Allow resplitting of merged categories.
grouping_sig_level
number
Used to determine how groups of nodes are formed or how unusual nodes are identified.
chi_square
pearson, likelihood_ratio
Method used to calculate the chi-square statistic: Pearson or Likelihood Ratio
minimum_record_use
use_percentage, use_absolute
min_parent_records_pc
number
Default value is 2. Minimum 1, maximum 100, in increments of 1. Parent branch value must be
higher than child branch.
min_child_records_pc
number
Default value is 1. Minimum 1, maximum 100, in increments of 1.
min_parent_records_abs
number
Default value is 100. Minimum 1, maximum 100, in increments of 1. Parent branch value must be
higher than child branch.
min_child_records_abs
number
Default value is 50. Minimum 1, maximum 100, in increments of 1.
epsilon
number
Minimum change in expected cell frequencies.
max_iterations
number
Maximum iterations for convergence.
use_costs
flag
costs
structured
Structured property. The format is a list of 3 values: the actual value, the predicted value,
and the cost if that prediction is wrong. For example:
tree.setPropertyValue("costs", [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]])
default_cost_increase
none, linear, square, custom
Only enabled for ordinal targets.
Set default values in the costs matrix.
calculate_conf
flag
display_rule_id
flag
Adds a field in the scoring output that indicates the ID for the terminal node to which each
record is assigned.
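The properties in Table 1 are set with setPropertyValue, as in the costs example. The sketch below shows one possible way to check a batch of settings against the ranges documented in the table before applying them. The helper names (check_treeas_settings, apply_settings), the RecordingNode stand-in, and the field name "Drug" are assumptions made for this illustration; inside Modeler, the node object would be a real Tree-AS node from the scripting environment rather than the stand-in.

```python
ALLOWED_BINS = (2, 4, 5, 10, 20, 25, 50, 100)
ALPHA_LIKE = ("split_alpha", "merge_alpha",
              "effect_size_threshold_cont", "effect_size_threshold_cat")

def check_treeas_settings(settings):
    """Raise ValueError if a setting falls outside a range listed in Table 1."""
    if "method" in settings and settings["method"] not in ("chaid", "exhaustive_chaid"):
        raise ValueError("method must be chaid or exhaustive_chaid")
    if "max_depth" in settings and not 0 <= settings["max_depth"] <= 20:
        raise ValueError("max_depth must be between 0 and 20")
    if "num_bins" in settings and settings["num_bins"] not in ALLOWED_BINS:
        raise ValueError("num_bins must be one of %s" % (ALLOWED_BINS,))
    if "chi_square" in settings and settings["chi_square"] not in ("pearson", "likelihood_ratio"):
        raise ValueError("chi_square must be pearson or likelihood_ratio")
    for name in ALPHA_LIKE:
        if name in settings and not 0.01 <= settings[name] <= 0.99:
            raise ValueError("%s must be between 0.01 and 0.99" % name)

def apply_settings(node, settings):
    """Validate the settings, then set each one on the node."""
    check_treeas_settings(settings)
    for name in sorted(settings):
        node.setPropertyValue(name, settings[name])

class RecordingNode(object):
    """Hypothetical stand-in that only records what would be set on a node."""
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

settings = {
    "target": "Drug",  # hypothetical field name
    "method": "exhaustive_chaid",
    "max_depth": 5,
    "num_bins": 10,
    "split_alpha": 0.05,
    "merge_alpha": 0.05,
    "bonferroni_adjustment": True,
    "chi_square": "pearson",
    "use_costs": True,
    "costs": [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]],
}

node = RecordingNode()
apply_settings(node, settings)
```

Validating before applying means that an out-of-range value, such as a max_depth of 25, is rejected before any property is changed on the node.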