The Random Trees node is similar to the C&RT Tree node; however, the Random
Trees node is designed to process big data to create a single tree. The Random Trees node
generates a decision tree that you use to predict or classify future observations. The method uses
recursive partitioning to split the training records into segments by minimizing the impurity at
each step, where a node in the tree is considered pure if 100% of cases in the node
fall into a specific category of the target field. Target and input fields can be numeric ranges or
categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).
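
To make the impurity criterion concrete, the following is a minimal sketch of how a single binary split on a numeric input could be scored. It assumes Gini impurity, which is an illustrative choice only: the description above does not say which impurity measure the node uses. The function names (gini, split_impurity, best_threshold) are hypothetical and are not part of the product's scripting API.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of target values; 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((float(c) / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Size-weighted impurity of the two child nodes produced by a binary split."""
    n = len(left) + len(right)
    return (float(len(left)) / n) * gini(left) + (float(len(right)) / n) * gini(right)

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values.

    Returns (threshold, impurity) for the split with the lowest weighted
    impurity, or None if the input has fewer than two distinct values.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = split_impurity(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Recursive partitioning repeats this search on each resulting segment until a stopping condition is met, such as the depth and node-size limits listed in the table below.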
Table 1. randomtrees properties

| randomtrees Properties | Values | Property description |
| --- | --- | --- |
| `target` | field | In the Random Trees node, models require a single target and one or more input fields. A frequency field can also be specified. See Common modeling node properties for more information. |
| `number_of_models` | integer | Determines the number of models to build as part of the ensemble modeling. |
| `use_number_of_predictors` | flag | Determines whether `number_of_predictors` is used. |
| `number_of_predictors` | integer | Specifies the number of predictors to be used when building split models. |
| `use_stop_rule_for_accuracy` | flag | Determines whether model building stops when accuracy can't be improved. |
| `sample_size` | number | Reduce this value to improve performance when processing very large datasets. |
| `handle_imbalanced_data` | flag | If the target of the model is a particular flag outcome, and the ratio of the desired outcome to a non-desired outcome is very small, then the data is imbalanced and the bootstrap sampling that's conducted by the model may affect the model's accuracy. Enable imbalanced data handling so that the model will capture a larger proportion of the desired outcome and generate a stronger model. |
| `use_weighted_sampling` | flag | When False, variables for each node are randomly selected with the same probability. When True, variables are weighted and selected accordingly. |
| `max_node_number` | integer | Maximum number of nodes allowed in individual trees. If the number would be exceeded on the next split, tree growth halts. |
| `max_depth` | integer | Maximum tree depth before growth halts. |
| `min_child_node_size` | integer | Determines the minimum number of records allowed in a child node after the parent node is split. If a child node would contain fewer records than specified here, the parent node won't be split. |
| `use_costs` | flag | |
| `costs` | structured | Structured property. The format is a list of 3 values: the actual value, the predicted value, and the cost if that prediction is wrong. For example: `tree.setPropertyValue("costs", [["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]])` |
| `default_cost_increase` | `none`, `linear`, `square`, `custom` | Note this is only enabled for ordinal targets. Set default values in the costs matrix. |
| `max_pct_missing` | integer | If the percentage of missing values in any input is greater than the value specified here, the input is excluded. Minimum 0, maximum 100. |
| `exclude_single_cat_pct` | integer | If one category value represents a higher percentage of the records than specified here, the entire field is excluded from model building. Minimum 1, maximum 99. |
| `max_category_number` | integer | If the number of categories in a field exceeds this value, the field is excluded from model building. Minimum 2. |
| `min_field_variation` | number | If the coefficient of variation of a continuous field is smaller than this value, the field is excluded from model building. |
| `num_bins` | integer | Only used if the data is made up of continuous inputs. Set the number of equal frequency bins to be used for the inputs; options are: 2, 4, 5, 10, 20, 25, 50, or 100. |
| `topN` | integer | Specifies the number of rules to report. Default value is 50, with a minimum of 1 and a maximum of 1000. |
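
As a rough illustration of how the properties in Table 1 might be set from a script, the sketch below checks a dictionary of settings against the limits documented in the table and then applies each one with setPropertyValue, the same call used in the costs example. Everything else here is an assumption made for illustration: check_settings, apply_settings, and RecordingNode are hypothetical helpers rather than part of the scripting API, and the field name "Drug" and the numeric values are made up.

```python
# Limits copied from the property descriptions in Table 1.
ALLOWED_BINS = (2, 4, 5, 10, 20, 25, 50, 100)
RANGES = {
    "max_pct_missing": (0, 100),
    "exclude_single_cat_pct": (1, 99),
    "max_category_number": (2, None),  # only a minimum is documented
    "topN": (1, 1000),
}

def check_settings(settings):
    """Return a list of problems; an empty list means the documented limits hold."""
    problems = []
    for name, (lo, hi) in sorted(RANGES.items()):
        if name not in settings:
            continue
        value = settings[name]
        if value < lo or (hi is not None and value > hi):
            problems.append("%s=%r is outside the documented range" % (name, value))
    if "num_bins" in settings and settings["num_bins"] not in ALLOWED_BINS:
        problems.append("num_bins=%r is not one of %r" % (settings["num_bins"], ALLOWED_BINS))
    return problems

def apply_settings(node, settings):
    """Validate the settings, then set each one on the node."""
    problems = check_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))
    for name, value in settings.items():
        node.setPropertyValue(name, value)

class RecordingNode(object):
    """Hypothetical stand-in for a Random Trees node so the sketch runs anywhere."""
    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value

# Hypothetical settings; "Drug" is an invented target field name.
settings = {
    "target": "Drug",
    "number_of_models": 10,
    "use_number_of_predictors": True,
    "number_of_predictors": 5,
    "max_depth": 10,
    "min_child_node_size": 5,
    "max_pct_missing": 70,
    "num_bins": 10,
    "topN": 50,
}

node = RecordingNode()
apply_settings(node, settings)
```

In a real script, the object passed to apply_settings would be the Random Trees node obtained from the stream (called tree in the costs example) instead of the RecordingNode stand-in. Validating before applying means an out-of-range value is reported before any property is changed.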