Last updated: Jan 18, 2024
The Auto Classifier node creates and compares a number of different models for binary outcomes (yes or no, churn or do not churn, and so on), allowing you to choose the best approach for a given analysis. A number of modeling algorithms are supported, making it possible to select the methods you want to use, the specific options for each, and the criteria for comparing the results. The node generates a set of models based on the specified options and ranks the best candidates according to the criteria you specify.
Example
# Get the current stream, then create the Auto Classifier node
stream = modeler.script.stream()
node = stream.create("autoclassifier", "My node")
node.setPropertyValue("ranking_measure", "Accuracy")
node.setPropertyValue("ranking_dataset", "Training")
node.setPropertyValue("enable_accuracy_limit", True)
# accuracy_limit is documented below as an integer between 0 and 100
node.setPropertyValue("accuracy_limit", 90)
node.setPropertyValue("calculate_variable_importance", True)
node.setPropertyValue("use_costs", True)
node.setPropertyValue("svm", False)
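The same kind of settings can be kept in one dictionary and applied in a loop, which is convenient when several streams share a configuration. The sketch below is an illustration only: `apply_properties` and `RecordingNode` are hypothetical names, not part of the Modeler scripting API. `RecordingNode` is a stand-in that lets the snippet run outside Modeler; inside a Modeler script you would pass the node returned by `stream.create` instead.

```python
class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records what is set."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, settings):
    """Call node.setPropertyValue for each (name, value) pair, in sorted name order."""
    for name in sorted(settings):
        node.setPropertyValue(name, settings[name])
    return node


settings = {
    "ranking_measure": "Accuracy",
    "ranking_dataset": "Training",
    "calculate_variable_importance": True,
    "svm": False,  # an <algorithm> flag: disables the SVM algorithm
}

node = apply_properties(RecordingNode(), settings)
print(node.properties["ranking_measure"])  # Accuracy
```

Keeping the settings in plain data also makes them easy to review or reuse without touching the calls that apply them.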
autoclassifiernode Properties | Values | Property description |
---|---|---|
target | field | For flag targets, the Auto Classifier node requires a single target and one or more input fields. Weight and frequency fields can also be specified. See Common modeling node properties for more information. |
ranking_measure | Accuracy, Area_under_curve, Profit, Lift, Num_variables | |
ranking_dataset | Training, Test | |
number_of_models | integer | Number of models to include in the model nugget. Specify an integer between 1 and 100. |
calculate_variable_importance | flag | |
enable_accuracy_limit | flag | |
accuracy_limit | integer | Integer between 0 and 100. |
enable_area_under_curve_limit | flag | |
area_under_curve_limit | number | Real number between 0.0 and 1.0. |
enable_profit_limit | flag | |
profit_limit | number | Integer greater than 0. |
enable_lift_limit | flag | |
lift_limit | number | Real number greater than 1.0. |
enable_number_of_variables_limit | flag | |
number_of_variables_limit | number | Integer greater than 0. |
use_fixed_cost | flag | |
fixed_cost | number | Real number greater than 0.0. |
variable_cost | field | |
use_fixed_revenue | flag | |
fixed_revenue | number | Real number greater than 0.0. |
variable_revenue | field | |
use_fixed_weight | flag | |
fixed_weight | number | Real number greater than 0.0. |
variable_weight | field | |
lift_percentile | number | Integer between 0 and 100. |
enable_model_build_time_limit | flag | |
model_build_time_limit | number | Integer set to the number of minutes to limit the time taken to build each individual model. |
enable_stop_after_time_limit | flag | |
stop_after_time_limit | number | Real number set to the number of hours to limit the overall elapsed time for an auto classifier run. |
enable_stop_after_valid_model_produced | flag | |
use_costs | flag | |
<algorithm> | flag | Enables or disables the use of a specific algorithm. |
<algorithm>.<property> | string | Sets a property value for a specific algorithm. See Setting algorithm properties for more information. |
use_cross_validation | boolean | Enables cross validation when models are built. The number of folds is set by number_of_folds. |
number_of_folds | integer | N fold parameter for cross validation, with range from 3 to 10. |
set_random_seed | boolean | Setting a random seed allows you to replicate analyses. Specify an integer or click Generate, which will create a pseudo-random integer between 1 and 2147483647, inclusive. By default, analyses are replicated with seed 229176228. |
random_seed | integer | Random seed. |
stop_if_valid_model | boolean | |
filter_individual_model_output | boolean | Removes from the output all of the additional fields generated by the individual models that feed into the Ensemble node. Select this option if you're interested only in the combined score from all of the input models. Ensure that this option is deselected if, for example, you want to use an Analysis node or Evaluation node to compare the accuracy of the combined score with that of each of the individual input models. |
set_ensemble_method | "Voting" | Ensemble method for set targets. |
set_voting_tie_selection | "Random" | If voting is tied, select the value randomly or by using highest confidence. |
flag_ensemble_method | "Voting" | Ensemble method for flag targets. |
flag_voting_tie_selection | "Random" | If voting is tied, select the value randomly, with highest confidence, or with raw propensity. |
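
Several properties in the table have documented ranges, so it can help to check values before handing them to the node. The following is a minimal sketch under assumptions: `RANGE_CHECKS` and `check_settings` are hypothetical helpers written for illustration, they cover only a few of the ranges listed above, and Modeler itself may validate values differently.

```python
# Hypothetical range checks copied from the property table above.
RANGE_CHECKS = {
    "number_of_models": lambda v: isinstance(v, int) and 1 <= v <= 100,
    "accuracy_limit": lambda v: isinstance(v, int) and 0 <= v <= 100,
    "area_under_curve_limit": lambda v: 0.0 <= v <= 1.0,
    "lift_limit": lambda v: v > 1.0,
    "lift_percentile": lambda v: isinstance(v, int) and 0 <= v <= 100,
    "number_of_folds": lambda v: isinstance(v, int) and 3 <= v <= 10,
}


def check_settings(settings):
    """Return the sorted names of settings whose values fall outside the documented range."""
    return sorted(
        name for name, value in settings.items()
        if name in RANGE_CHECKS and not RANGE_CHECKS[name](value)
    )


bad = check_settings({
    "number_of_models": 5,       # within 1..100
    "accuracy_limit": 150,       # outside 0..100
    "number_of_folds": 12,       # outside 3..10
    "ranking_measure": "Lift",   # no range documented here, so not checked
})
print(bad)  # ['accuracy_limit', 'number_of_folds']
```

Properties without an entry in `RANGE_CHECKS` pass through unchecked, so the helper never rejects a setting it does not know about.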