Last updated: Jan 18, 2024
The Classification and Regression (C&R) Tree node generates a decision tree that allows you to predict or classify future observations. The method uses recursive partitioning to split the training records into segments, choosing at each step the split that minimizes impurity. A node in the tree is considered "pure" if 100% of its cases fall into a single category of the target field. Target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).
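To make the splitting criterion concrete, here is a minimal plain-Python sketch (not a Modeler script) of a single recursive-partitioning step. It scores every binary split of one numeric input by the decrease in Gini impurity and keeps the best one. It assumes a categorical target and uses Gini only. The helper names (`gini`, `best_binary_split`) and the sample data are hypothetical. The node's own implementation also handles categorical inputs, surrogates, and the other impurity measures.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Try every threshold on one numeric input and return the binary split
    (x <= threshold vs. x > threshold) that most reduces weighted Gini impurity,
    as (threshold, impurity_decrease). Returns (None, 0.0) if no split helps."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Impurity of the two children, weighted by their share of the records
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Hypothetical sample: one numeric input and a two-category target
ages = [23, 28, 35, 47, 52, 61]
drugs = ["drugY", "drugY", "drugY", "drugX", "drugX", "drugX"]
print(gini(drugs))                     # 0.5 (two equally frequent categories)
print(best_binary_split(ages, drugs))  # (35, 0.5): both children are pure
```

A full tree repeats this search over all inputs, applies the best split, and recurses into each child until a stopping rule is met.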
Example
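# Get the current stream (standard entry point in Modeler Python scripts)
stream = modeler.script.stream()
node = stream.createAt("cart", "My node", 200, 100)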
# "Fields" tab
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "BP", "Cholesterol"])
# "Build Options" tab, "Objective" panel
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", """Grow Node Index 0 Children 1 2
Grow Node Index 2 Children 3 4""")
# "Build Options" tab, "Basics" panel
node.setPropertyValue("prune_tree", False)
node.setPropertyValue("use_std_err_rule", True)
node.setPropertyValue("std_err_multiplier", 3.0)
node.setPropertyValue("max_surrogates", 7)
# "Build Options" tab, "Stopping Rules" panel
node.setPropertyValue("use_percentage", True)
node.setPropertyValue("min_parent_records_pc", 5)
node.setPropertyValue("min_child_records_pc", 3)
# "Build Options" tab, "Advanced" panel
node.setPropertyValue("min_impurity", 0.0003)
node.setPropertyValue("impurity_measure", "Twoing")
# "Model Options" tab
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "Cart_Drug")
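When `use_percentage` is True, the stopping rules in the example are expressed as percentages of the overall training data. The following plain-Python sketch shows how those percentages translate into minimum record counts and how they would gate a split. The training size of 2000 records, the rounding-up behavior, and the helper names (`min_record_counts`, `split_allowed`) are assumptions for illustration, not documented Modeler internals.

```python
import math


def min_record_counts(n_training, parent_pc, child_pc):
    """Convert percentage stopping rules into minimum record counts.

    Rounding up to a whole record is an assumption of this sketch.
    """
    min_parent = int(math.ceil(n_training * parent_pc / 100.0))
    min_child = int(math.ceil(n_training * child_pc / 100.0))
    return min_parent, min_child


def split_allowed(parent_size, left_size, right_size, min_parent, min_child):
    """A node is split only if it, and both resulting children, are large enough."""
    return (parent_size >= min_parent
            and left_size >= min_child
            and right_size >= min_child)


# Percentages from the example script, applied to a hypothetical 2000-record training set
min_parent, min_child = min_record_counts(2000, 5, 3)
print(min_parent, min_child)                               # 100 60
print(split_allowed(150, 100, 50, min_parent, min_child))  # False: right child has < 60 records
print(split_allowed(150, 80, 70, min_parent, min_child))   # True
```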
cartnode Properties | Values | Property description |
---|---|---|
target | field | C&R Tree models require a single target and one or more input fields. A frequency field can also be specified. See the topic Common modeling node properties for more information. |
continue_training_existing_model | flag | |
objective | Standard, Boosting, Bagging, psm | psm is used for very large datasets and requires a Server connection. |
model_output_type | Single, InteractiveBuilder | |
use_tree_directives | flag | |
tree_directives | string | Specify directives for growing the tree. Directives can be wrapped in triple quotes to avoid escaping newlines or quotes. Note that directives may be highly sensitive to minor changes in data or modeling options and may not generalize to other datasets. |
use_max_depth | Default, Custom | |
max_depth | integer | Maximum tree depth, from 0 to 1000. Used only if use_max_depth = Custom. |
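prune_tree | flag | Prune the tree to avoid overfitting. |
use_std_err | flag | Use the standard error rule when pruning: select the simplest subtree whose risk estimate is within a maximum difference (in standard errors) of the subtree with the smallest risk. |
std_err_multiplier | number | Maximum allowed difference in risk, expressed as a multiple of the standard error. |
max_surrogates | number | Maximum number of surrogates for each split. Surrogates are used to handle records with missing values. |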
use_percentage | flag | |
min_parent_records_pc | number | |
min_child_records_pc | number | |
min_parent_records_abs | number | |
min_child_records_abs | number | |
use_costs | flag | |
costs | structured | Structured property. |
priors | Data, Equal, Custom | |
custom_priors | structured | Structured property. |
adjust_priors | flag | |
trails | number | Number of component models for boosting or bagging. |
set_ensemble_method | Voting, HighestProbability, HighestMeanProbability | Default combining rule for categorical targets. |
range_ensemble_method | Mean, Median | Default combining rule for continuous targets. |
large_boost | flag | Apply boosting to very large data sets. |
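min_impurity | number | Minimum change in impurity required to split a node. |
impurity_measure | Gini, Twoing, Ordered | Impurity measure for categorical targets. |
train_pct | number | Size of the overfit prevention set, as a percentage of the training records. |
set_random_seed | flag | Replicate results option: use a fixed random seed so that results can be reproduced. |
seed | number | Random seed, used when set_random_seed is True. |
calculate_variable_importance | flag | |
calculate_raw_propensities | flag | |
calculate_adjusted_propensities | flag | |
adjusted_propensity_partition | Test, Validation | |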
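The `use_std_err` and `std_err_multiplier` properties follow the idea of the classic CART standard error rule. Rather than keeping the subtree with the lowest risk estimate, the rule picks the simplest subtree whose risk is no more than `multiplier` standard errors above that minimum. The plain-Python sketch below assumes a precomputed pruning sequence of (terminal nodes, risk, standard error) tuples with hypothetical values. It illustrates only the selection rule, which presumably matters only when `prune_tree` is True. The helper name `select_subtree` is hypothetical.

```python
def select_subtree(candidates, multiplier):
    """candidates: list of (n_terminal_nodes, risk, std_err) for the nested
    subtrees produced by pruning (hypothetical input format).
    Return the smallest subtree whose risk is within `multiplier`
    standard errors of the minimum-risk subtree."""
    _, best_risk, best_se = min(candidates, key=lambda c: c[1])
    limit = best_risk + multiplier * best_se
    eligible = [c for c in candidates if c[1] <= limit]
    # Among the acceptable subtrees, prefer the one with the fewest terminal nodes
    return min(eligible, key=lambda c: c[0])


# Hypothetical pruning sequence: (terminal nodes, risk estimate, standard error)
sequence = [(12, 0.180, 0.012), (8, 0.185, 0.012), (5, 0.200, 0.013), (2, 0.260, 0.014)]
print(select_subtree(sequence, 0.0))  # (12, 0.18, 0.012): plain minimum-risk tree
print(select_subtree(sequence, 1.0))  # (8, 0.185, 0.012)
print(select_subtree(sequence, 3.0))  # (5, 0.2, 0.013)
```

Larger multipliers tolerate more risk in exchange for a simpler tree.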
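The `set_ensemble_method` and `range_ensemble_method` values name the rules used to combine component models when `objective` is Boosting or Bagging. The plain-Python sketch below shows one reasonable reading of each rule. The input format (one category-to-probability dict per model), the alphabetical tie-breaking, the helper names, and the example probabilities are all assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median


def combine_categorical(predictions, method):
    """predictions: one dict per component model mapping category -> probability
    (hypothetical format). Ties are broken alphabetically in this sketch."""
    categories = sorted({c for p in predictions for c in p})
    if method == "Voting":
        # Each model votes for its most probable category; the most votes wins.
        votes = Counter(max(sorted(p), key=p.get) for p in predictions)
        return max(categories, key=lambda c: votes[c])
    if method == "HighestProbability":
        # The category with the single highest probability from any model wins.
        return max(categories, key=lambda c: max(p.get(c, 0.0) for p in predictions))
    if method == "HighestMeanProbability":
        # The category with the highest probability averaged over all models wins.
        return max(categories, key=lambda c: mean(p.get(c, 0.0) for p in predictions))
    raise ValueError("unknown method: " + method)


def combine_continuous(predictions, method):
    """Combine numeric predictions from the component models with Mean or Median."""
    return {"Mean": mean, "Median": median}[method](predictions)


# Hypothetical outputs from three component models
models = [
    {"drugA": 0.6, "drugB": 0.4},
    {"drugA": 0.55, "drugB": 0.45},
    {"drugA": 0.1, "drugB": 0.9},
]
for rule in ("Voting", "HighestProbability", "HighestMeanProbability"):
    print(rule, combine_categorical(models, rule))
print(combine_continuous([10.0, 12.0, 29.0], "Mean"))    # 17.0
print(combine_continuous([10.0, 12.0, 29.0], "Median"))  # 12.0
```

With these three models, Voting returns drugA because two models favor it. Both probability-based rules return drugB because the third model is very confident in that category.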