The Binning node automatically creates new nominal fields based on the values of one or more existing continuous (numeric range) fields. For example, you can transform a continuous income field into a new categorical field containing income groups of equal width, or groups defined as deviations from the mean. Alternatively, you can select a categorical "supervisor" field so that the binning preserves the strength of the original association between the two fields.
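To make the two unsupervised approaches mentioned above concrete, the following is a minimal Python sketch of equal-width binning and of binning by deviation from the mean. It is an illustration only, not the node's actual implementation; the function names `equal_width_bins` and `mean_sd_bins`, the 1-based bin numbering, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def equal_width_bins(values, n_bins):
    """Assign each value a 1-based bin index over n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything lands in a single bin.
        return [1 for _ in values]
    # The maximum sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) // width) + 1, n_bins) for v in values]


def mean_sd_bins(values):
    """Assign bin 1, 2, or 3: below, within, or above mean +/- 1 std deviation."""
    m, sd = mean(values), pstdev(values)
    bins = []
    for v in values:
        if v < m - sd:
            bins.append(1)
        elif v > m + sd:
            bins.append(3)
        else:
            bins.append(2)
    return bins


incomes = [18000, 25000, 32000, 47000, 58000, 98000]
print(equal_width_bins(incomes, 4))
print(mean_sd_bins([10, 20, 30, 40, 100]))
```

Note how a single large income stretches the equal-width intervals so that most records fall into the first bin; this is one reason to compare binning methods before settling on one.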
Binning can be useful for a number of reasons, including:
- Algorithm requirements. Certain algorithms, such as Naive Bayes and Logistic Regression, require categorical inputs.
- Performance. Algorithms such as multinomial logistic regression may perform better if the number of distinct values of input fields is reduced. For example, you can use the median or mean value of each bin in place of the original values.
- Data privacy. Sensitive personal information, such as salaries, may be reported in ranges rather than actual salary figures in order to protect privacy.
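The performance and privacy points in the list above can be sketched in a few lines of Python. This is a hedged illustration rather than anything the Binning node itself runs: the cut points, the label format, and the helper names `range_labels` and `bin_means` are all hypothetical, and intervals are assumed to include their lower edge and exclude their upper edge.

```python
from bisect import bisect_right
from statistics import mean


def range_labels(values, cut_points):
    """Replace each value with a range label defined by the sorted cut points."""
    edges = sorted(cut_points)
    labels = []
    for v in values:
        i = bisect_right(edges, v)  # index of the interval that contains v
        if i == 0:
            labels.append(f"< {edges[0]}")
        elif i == len(edges):
            labels.append(f">= {edges[-1]}")
        else:
            labels.append(f"{edges[i - 1]} to < {edges[i]}")
    return labels


def bin_means(values, cut_points):
    """Replace each value with the mean of all values that share its bin."""
    edges = sorted(cut_points)
    groups = {}
    for v in values:
        groups.setdefault(bisect_right(edges, v), []).append(v)
    means = {k: mean(vs) for k, vs in groups.items()}
    return [means[bisect_right(edges, v)] for v in values]


salaries = [28000, 31000, 52000, 58000, 91000]
cuts = [30000, 60000]  # hypothetical cut points
print(range_labels(salaries, cuts))
print(bin_means(salaries, cuts))
```

The first function reports only ranges, which hides exact salaries; the second reduces five distinct values to three, which is the kind of reduction that can help algorithms sensitive to the number of distinct input values.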
A number of binning methods are available. Once you have created bins for the new field, you can generate a Derive node based on the cut points.
When to use a Binning node
Before using a Binning node, consider whether another technique is more appropriate for the task at hand:
Missing value handling
The Binning node handles missing values in the following ways:
- User-specified blanks. Missing values specified as blanks are included during the transformation. For example, if you designated –99 to indicate a blank value using the Type node, this value will be included in the binning process. To ignore blanks during binning, you should use a Filler node to replace the blank values with the system null value.
- System-missing values ($null$). Null values are ignored during the binning transformation and remain nulls after the transformation.
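The difference between the two kinds of missing value can be illustrated with a small Python sketch. It assumes a simple equal-width scheme, uses `None` to stand in for `$null$`, and uses a hypothetical helper `blank_to_null` to play the role of the Filler node; none of this is the node's actual code.

```python
def bin_skipping_nulls(values, n_bins):
    """Equal-width binning that ignores None and leaves it as None in the output."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins
    result = []
    for v in values:
        if v is None:
            result.append(None)  # nulls stay null after the transformation
        elif width == 0:
            result.append(1)
        else:
            result.append(min(int((v - lo) // width) + 1, n_bins))
    return result


def blank_to_null(values, blank=-99):
    """Stand-in for a Filler node: replace the blank code with None."""
    return [None if v == blank else v for v in values]


ages = [25, -99, 40, None, 55]

# The blank code -99 is treated as real data and stretches the bin range.
print(bin_skipping_nulls(ages, 3))

# After converting blanks to nulls, the bins reflect only genuine ages.
print(bin_skipping_nulls(blank_to_null(ages), 3))
```

In the first call, the sentinel value occupies a bin of its own and pushes all genuine ages into the top bin; in the second, the genuine ages spread across all three bins and both missing entries remain null.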
The Settings tab provides options for available techniques. The View tab displays cut points established for data previously run through the node.