Auto Classifier node
The Auto Classifier node estimates and compares models for either nominal (set) or binary (yes/no) targets, using a number of different methods, enabling you to try out a variety of approaches in a single modeling run. You can select the algorithms to use, and experiment with multiple combinations of options. For example, rather than choose between Radial Basis Function, polynomial, sigmoid, or linear methods for an SVM, you can try them all. The node explores every possible combination of options, ranks each candidate model based on the measure you specify, and saves the best models for use in scoring or further analysis.
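The explore-and-rank behavior described above can be pictured with a minimal Python sketch. It is an illustration of the idea only, not the node's actual implementation: the function names, the `regularization` option and its values, and the `measure` callback are all hypothetical. Only the four kernel names come from the text.

```python
from itertools import product


def enumerate_candidates(option_grid):
    """Yield one settings dict per combination of option values."""
    names = list(option_grid)
    for values in product(*(option_grid[name] for name in names)):
        yield dict(zip(names, values))


def rank_candidates(candidates, measure, keep=3):
    """Score each candidate with `measure` and return the top `keep`, best first."""
    scored = [(measure(settings), settings) for settings in candidates]
    # Sort on the score only, so the settings dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:keep]


# Hypothetical option grid for an SVM: the kernel names come from the text;
# the "regularization" option and its values are made up for illustration.
svm_grid = {
    "kernel": ["rbf", "polynomial", "sigmoid", "linear"],
    "regularization": [1, 10],
}
candidates = list(enumerate_candidates(svm_grid))  # 4 kernels x 2 values
```

In a real run, `measure` would stand for building the candidate model and returning the ranking measure you specified; `rank_candidates(candidates, measure, keep=3)` would then correspond to keeping the best three models.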
- A retail company has historical data tracking the offers made to specific customers in past campaigns. The company now wants to achieve more profitable results by matching the right offer to each customer.
- A target field with a measurement level of either Nominal or Flag (with the role set to Target), and at least one input field (with the role set to Input). For a flag field, the True value defined for the target is assumed to represent a hit when calculating profits, lift, and related statistics. Input fields can have a measurement level of Continuous or Categorical, with the limitation that some inputs may not be appropriate for some model types. For example, ordinal fields used as inputs in C&R Tree, CHAID, and QUEST models must have numeric storage (not string), and will be ignored by these models if specified otherwise. Similarly, continuous input fields can be binned in some cases. The requirements are the same as when using the individual modeling nodes; for example, a Bayes Net model works the same whether generated from the Bayes Net node or the Auto Classifier node.
- Frequency and weight fields
- Frequency and weight fields give some records more importance than others, for example because the build dataset is known to under-represent a section of the parent population (weight) or because one record represents a number of identical cases (frequency). If specified, a frequency field can be used by C&R Tree, CHAID, QUEST, Decision List, and Bayes Net models. A weight field can be used by C&R Tree, CHAID, and C5.0 models. Other model types ignore these fields and build the models anyway. Frequency and weight fields are used only for model building, and are not considered when evaluating or scoring models.
- If you attach a Table node to the nugget for the Auto Classifier node, the table contains several new variables whose names begin with a $ prefix.
- The names of the fields that are generated during scoring are based on the target field, but with a standard prefix. Different model types use different sets of prefixes.
- For example, $G, $R, and $C are the prefixes for predictions generated by the Generalized Linear model, CHAID model, and C5.0 model, respectively. The $X prefix typically indicates a field generated by an ensemble; $XR, $XS, and $XF are used when the target field is a Continuous, Categorical, or Flag field, respectively.
- $..C prefixes are used for the prediction confidence of a Categorical or Flag target; for example, $XFC is the prefix for the ensemble Flag prediction confidence. $RC and $CC are the prefixes for the confidence of a single prediction from a CHAID model and a C5.0 model, respectively.
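The prefix conventions above can be summarized in a short Python sketch. The prefixes themselves are taken from the text. The helper name `ensemble_field`, the example target name `response`, and the hyphen used to join prefix and target name are assumptions for illustration, and `$XSC` is inferred from the `$..C` pattern rather than stated explicitly.

```python
# Prediction prefixes for individual models, as listed in the text.
MODEL_PREFIXES = {
    "Generalized Linear": "$G",
    "CHAID": "$R",
    "C5.0": "$C",
}

# Ensemble prediction prefixes, keyed by the target's measurement level.
ENSEMBLE_PREFIXES = {
    "Continuous": "$XR",
    "Categorical": "$XS",
    "Flag": "$XF",
}


def ensemble_field(target, measurement, confidence=False, sep="-"):
    """Build the name of an ensemble output field for `target`.

    Assumption: prefix and target name are joined with `sep` (a hyphen here).
    Confidence fields append "C" to the prefix and, per the text, apply
    only to Categorical or Flag targets.
    """
    prefix = ENSEMBLE_PREFIXES[measurement]
    if confidence:
        if measurement == "Continuous":
            raise ValueError("confidence prefixes apply to Categorical or Flag targets")
        prefix += "C"
    return f"{prefix}{sep}{target}"


prediction_name = ensemble_field("response", "Flag")
confidence_name = ensemble_field("response", "Flag", confidence=True)
```

Under these assumptions, a Flag target named `response` would yield `$XF-response` for the ensemble prediction and `$XFC-response` for its confidence.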
Supported Model Types
Supported model types include Neural Net, C&R Tree, QUEST, CHAID, C5.0, Logistic Regression, Decision List, Bayes Net, Discriminant, Nearest Neighbor, SVM, XGBoost Tree, and XGBoost-AS.