The Binning node automatically creates new
nominal fields based on the values of one or more existing continuous (numeric range) fields. For
example, you can transform a continuous income field into a new categorical field containing income
groups of equal width, or groups defined by deviations from the mean. Alternatively, you can select a
categorical "supervisor" field in order to preserve the strength of the original association between
the continuous field and the supervisor field.
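The two unsupervised approaches mentioned above can be sketched in plain Python. This is only an illustration of the ideas, not the Binning node's implementation: the function names, the 1-based bin numbering, the use of the population standard deviation, and the one-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, pstdev


def equal_width_bins(values, n_bins):
    """Return a 1-based bin index per value, using n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in one bin.
        return [1 for _ in values]
    # The maximum value would land one past the last bin, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def deviation_bins(values, threshold=1.0):
    """Label each value by its distance from the mean in standard deviations.

    Assumption: population standard deviation and a single +/- threshold.
    """
    mu, sigma = mean(values), pstdev(values)
    labels = []
    for v in values:
        z = 0.0 if sigma == 0 else (v - mu) / sigma
        if z < -threshold:
            labels.append("low")
        elif z > threshold:
            labels.append("high")
        else:
            labels.append("mid")
    return labels


# Hypothetical income values, for illustration only.
incomes = [20000, 35000, 50000, 65000, 120000]
print(equal_width_bins(incomes, 4))  # [1, 1, 2, 2, 4]
print(deviation_bins(incomes))       # ['low', 'mid', 'mid', 'mid', 'high']
```

Note that with equal-width bins a single large value stretches the range, so some bins (bin 3 here) can end up empty.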
Binning can be useful for a number of reasons, including:
Algorithm requirements. Certain algorithms, such as
Naive Bayes and Logistic Regression, require categorical inputs.
Performance. Algorithms such as multinomial logistic regression
may perform better if the number of distinct values of input fields is reduced. For example, use the
median or mean value for each bin rather than using the original values.
Data privacy. Sensitive personal information, such as
salaries, may be reported in ranges rather than actual salary figures in order to protect
privacy.
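As a rough sketch of the performance and privacy cases, the following Python replaces each value with the median of its equal-width bin and reports a salary as a range rather than an exact figure. The function names, the choice of equal-width bins, and the 10,000-wide salary bands are assumptions for the example, not behavior documented for the node.

```python
from collections import defaultdict
from statistics import median


def smooth_by_bin_median(values, n_bins):
    """Replace each value with the median of the equal-width bin it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return list(values)
    # 0-based bin index per value; the maximum is clamped into the last bin.
    idx = [min(int((v - lo) / width), n_bins - 1) for v in values]
    members = defaultdict(list)
    for i, v in zip(idx, values):
        members[i].append(v)
    medians = {i: median(vs) for i, vs in members.items()}
    return [medians[i] for i in idx]


def salary_range(salary, band_width=10000):
    """Report an integer salary as a range label instead of the exact figure."""
    start = (salary // band_width) * band_width
    return f"{start}-{start + band_width - 1}"


print(smooth_by_bin_median([10, 12, 14, 30, 32, 50], 2))  # [12, 12, 12, 32, 32, 32]
print(salary_range(47250))                                # 40000-49999
```

Six distinct input values become two after smoothing, which is the reduction in distinct values that the performance point describes.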
A number of binning methods are available. After you create bins for the new
field, you can generate a Derive node based on the cut points.
When to use a Binning node
Before using a Binning node, consider whether another technique is more
appropriate for the task at hand:
To manually specify cut points for categories, such as specific predefined
salary ranges, use a Derive node. See Derive node for more
information.
To create new categories for existing sets, use a Reclassify node. See Reclassify node for more information.
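To make the distinction concrete, here is a minimal Python sketch of the two alternatives: assigning categories from hand-picked cut points, and mapping existing categories onto new ones. It does not use the Derive or Reclassify node's own syntax; the cut points, labels, and job-grade mapping are hypothetical values chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical predefined salary cut points and the labels between them.
CUT_POINTS = [30000, 60000, 90000]
LABELS = ["under 30k", "30k-60k", "60k-90k", "90k and over"]


def salary_band(salary):
    """Return the label of the manually defined band containing salary.

    A salary equal to a cut point falls into the higher band.
    """
    return LABELS[bisect_right(CUT_POINTS, salary)]


# Hypothetical mapping from existing categories to new, coarser ones.
GRADE_GROUPS = {"A1": "junior", "A2": "junior", "B1": "senior", "B2": "senior"}


def reclassify(grade, default="other"):
    """Map an existing category to its new category, with a fallback value."""
    return GRADE_GROUPS.get(grade, default)


print(salary_band(29999))  # under 30k
print(salary_band(30000))  # 30k-60k
print(reclassify("B1"))    # senior
print(reclassify("C9"))    # other
```

In both cases the boundaries or mappings are supplied by you, whereas binning derives the cut points from the data.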
Missing value handling
The Binning node handles missing values in the following ways:
User-specified blanks. Missing values specified as
blanks are included during the transformation. For example, if you designated -99 to indicate a
blank value using the Type node, this value will be included in the binning process. To ignore
blanks during binning, you should use a Filler node to replace the blank values with the system null
value.
System-missing values ($null$). Null values are
ignored during the binning transformation and remain nulls after the transformation.
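The effect of these two rules can be illustrated with a small Python sketch. It assumes equal-width binning, uses None to stand in for $null$, and uses -99 as the user-specified blank from the example above. The function names are hypothetical, and blanks_to_null only imitates the kind of replacement you would configure in a Filler node.

```python
BLANK = -99  # user-specified blank marker, as in the example above


def bin_with_nulls(values, n_bins):
    """Equal-width binning that ignores None and leaves it as None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins

    def to_bin(v):
        if v is None:
            return None  # nulls stay null after the transformation
        if width == 0:
            return 1
        return min(int((v - lo) / width) + 1, n_bins)

    return [to_bin(v) for v in values]


def blanks_to_null(values, blank=BLANK):
    """Replace the blank marker with None so that binning ignores it."""
    return [None if v == blank else v for v in values]


ages = [25, 40, BLANK, None, 55]

# The blank is treated as a real value and stretches the range to -99..55.
print(bin_with_nulls(ages, 3))                  # [3, 3, 1, None, 3]

# After replacing blanks with nulls, the bins cover only the real ages.
print(bin_with_nulls(blanks_to_null(ages), 3))  # [1, 2, None, None, 3]
```

In the first call the blank marker occupies a bin of its own and pushes every real age into the top bin, which is why replacing blanks with nulls before binning is usually preferable.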
The Settings tab provides options for available techniques. The View tab
displays cut points established for data previously run through the node.