Last updated: Feb 11, 2025
The Binning node automatically creates new nominal (set) fields based on the
values of one or more existing continuous (numeric range) fields. For example, you can transform a
continuous income field into a new categorical field containing groups of income as deviations from
the mean. After you create bins for the new field, you can generate a Derive node based on the cut
points.
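The income example above groups values by how far they lie from the mean. That idea can be sketched in plain Python outside of SPSS Modeler. This is a minimal illustration under stated assumptions, not the Binning node's actual algorithm:

- Cut points sit at the mean plus or minus whole multiples of the population standard deviation.
- A value equal to a cut point falls into the higher bin.
- The function names `sdev_cut_points` and `assign_bin` are hypothetical.

The final lookup step is the kind of rule that a Derive node generated from the cut points would apply.

```python
import bisect
import statistics


def sdev_cut_points(values, n_sd):
    """Return ascending cut points at mean + k*sd for k = -n_sd..n_sd, skipping k = 0.

    Assumption for illustration: the population standard deviation is used,
    with one cut point per whole standard deviation on each side of the mean.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * sd for k in range(-n_sd, n_sd + 1) if k != 0]


def assign_bin(value, cuts):
    """Return a 1-based bin number; a value equal to a cut point goes to the higher bin."""
    return bisect.bisect_right(cuts, value) + 1


incomes = [20, 30, 40, 50, 60]
cuts = sdev_cut_points(incomes, 1)  # two cut points, so three bins
bins = [assign_bin(v, cuts) for v in incomes]
print(bins)  # [1, 2, 2, 2, 3]
```

With `n_sd=2`, the same data gets four cut points and therefore five bins.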
Example
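```python
# Get the current stream (SPSS Modeler stream script)
stream = modeler.script.stream()
# Create a Binning node that bins the Na and K fields
node = stream.create("binning", "My node")
node.setPropertyValue("fields", ["Na", "K"])
node.setPropertyValue("method", "Rank")
# Fixed-width options: name extension, its placement, and bin count or width
node.setPropertyValue("fixed_width_name_extension", "_binned")
node.setPropertyValue("fixed_width_add_as", "Suffix")
node.setPropertyValue("fixed_bin_method", "Count")
node.setPropertyValue("fixed_bin_count", 10)
node.setPropertyValue("fixed_bin_width", 3.5)
# Equal-count option: 10 decile bins
node.setPropertyValue("tile10", True)
```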
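The example supplies both a bin count (10) and a bin width (3.5), and it sets `fixed_bin_method` to "Count". Assuming that this setting chooses which of the two values defines the fixed-width bins, the difference can be sketched in plain Python.

The sketch is not Modeler API, and it rests on these assumptions:

- The function names are hypothetical.
- Bins start at the field's minimum value.
- A value on a cut point falls into the higher bin.

```python
import bisect
import math


def cuts_from_count(lo, hi, count):
    """Interior cut points that split [lo, hi] into `count` equal-width bins."""
    width = (hi - lo) / count
    return [lo + i * width for i in range(1, count)]


def cuts_from_width(lo, hi, width):
    """Interior cut points for bins of a fixed `width`, starting at lo."""
    n_bins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(1, n_bins)]


def assign_bin(value, cuts):
    """1-based bin number; a value equal to a cut point goes to the higher bin."""
    return bisect.bisect_right(cuts, value) + 1


values = [0, 3.4, 3.5, 9.9, 10]
lo, hi = min(values), max(values)

by_count = cuts_from_count(lo, hi, 10)   # 9 cut points, 10 bins
by_width = cuts_from_width(lo, hi, 3.5)  # [3.5, 7.0], 3 bins
print([assign_bin(v, by_width) for v in values])  # [1, 1, 2, 3, 3]
```

With a count, the width is derived from the data range; with a width, the number of bins is derived instead. In both cases the maximum value lands in the last bin, because no cut point is placed at the maximum.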
| Property | Data type | Property description |
|---|---|---|
| `fields` | [field1 field2 ... fieldn] | Continuous (numeric range) fields pending transformation. You can bin multiple fields simultaneously. |
| `method` | | Method used to determine the cut points for the new field's bins (categories). |
| | | Specifies whether the bins are recalculated, and the data placed in the relevant bin, every time the node is executed, or whether data is added only to existing bins and to any new bins that have been added. |
| `fixed_width_name_extension` | string | The default extension is _BIN. |
| `fixed_width_add_as` | | Specifies whether the extension is added to the end (suffix) or to the start (prefix) of the field name. For example, with the default extension added as a suffix, a field named income becomes income_BIN. |
| `fixed_bin_method` | | |
| `fixed_bin_count` | integer | Number of fixed-width bins (categories) to create for the new field(s). |
| `fixed_bin_width` | real | Width of each fixed-width bin, as an integer or real value. |
| | string | The default extension is _TILE. |
| | | Specifies whether the extension is added as a suffix or a prefix to the field names generated by standard p-tiles. The default extension is _TILE plus N, where N is the tile number. |
| | flag | Generates four quartile bins, each containing 25% of cases. |
| | flag | Generates five quintile bins. |
| `tile10` | flag | Generates 10 decile bins. |
| | flag | Generates 20 vingtile bins. |
| | flag | Generates 100 percentile bins. |
| | flag | |
| | string | The default extension is _TILEN. |
| | integer | |
| | | Sets how records are distributed among equal-count bins: one option seeks to assign an equal number of records to each bin, and the other assigns records so that the sum of the values in each bin is equal. |
| | | Specifies the bin in which records with tied values are placed. |
| | | Sets the rank order: ascending (lowest value is marked 1) or descending (highest value is marked 1). |
| | | This option applies to rank, fractional rank, and percentage rank. |
| | flag | |
| | string | The default extension is _RANK. |
| | flag | Generates fractional ranks: the value of the new field equals the rank divided by the sum of the weights of the nonmissing cases. Fractional ranks fall in the range of 0–1. |
| | string | The default extension is _F_RANK. |
| | flag | Generates percentage ranks: each rank is divided by the number of records with valid values and multiplied by 100. Percentage fractional ranks fall in the range of 1–100. |
| | string | The default extension is _P_RANK. |
| | string | |
| | string | The default extension is _OPTIMAL. |
| | field | Field chosen as the supervisory field to which the fields selected for binning are related. |
| | flag | Specifies that any bins with small case counts are merged into a larger, neighboring bin. |
| | integer | |
| | flag | Indicates that prebinning of the dataset is to take place. |
| | integer | Specifies an upper limit to avoid creating an inordinately large number of bins. |
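The table describes two equal-count options: an equal number of records per bin, or an equal sum of values per bin. A small plain-Python sketch can contrast them.

It is an illustration only, not the node's algorithm:

- The function names are hypothetical.
- Tied values are simply split in input order; the node's own tied-value option is not modeled.
- The value-sum version assumes non-negative values with a positive total.

```python
def equal_count_tiles(values, n_tiles):
    """Assign 1-based tiles so each tile holds about the same number of records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for pos, i in enumerate(order):
        tiles[i] = pos * n_tiles // len(values) + 1
    return tiles


def equal_sum_tiles(values, n_tiles):
    """Assign 1-based tiles so the sum of values in each tile is about equal."""
    total = sum(values)
    if total <= 0:
        raise ValueError("sketch assumes non-negative values with a positive sum")
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    running = 0.0
    for i in order:
        # Place each record by the cumulative sum at its midpoint.
        midpoint = running + values[i] / 2
        tiles[i] = min(int(midpoint * n_tiles // total) + 1, n_tiles)
        running += values[i]
    return tiles


values = [1, 2, 3, 4, 5, 5]
print(equal_count_tiles(values, 2))  # [1, 1, 1, 2, 2, 2]: 3 records per tile
print(equal_sum_tiles(values, 2))    # [1, 1, 1, 1, 2, 2]: each tile sums to 10
```

On skewed data the two options diverge. Equal record counts leave the upper tile with a much larger total, while equal sums put fewer, larger records in the upper tile.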
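The table defines rank, fractional rank, and percentage rank. A short plain-Python sketch can check those definitions.

It rests on these assumptions:

- Weights are all 1, so the sum of the weights of the nonmissing cases is just their count.
- `None` stands for a missing value.
- Tied values receive consecutive ranks in input order. This is not necessarily what the node does.
- The function name is hypothetical.

```python
def rank_fields(values, descending=False):
    """Return (rank, fractional_rank, percentage_rank) per record, or None if missing.

    Ascending order marks the lowest value 1; descending marks the highest value 1.
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    # sorted() is stable, so tied values keep their input order (an assumption).
    order = sorted(valid, key=lambda i: values[i], reverse=descending)
    n = len(valid)
    result = [None] * len(values)
    for rank, i in enumerate(order, start=1):
        fractional = rank / n  # rank / sum of unit weights of nonmissing cases
        result[i] = (rank, fractional, fractional * 100)
    return result


incomes = [30, None, 10, 20]
asc = rank_fields(incomes)
desc = rank_fields(incomes, descending=True)
print([r[0] if r else None for r in asc])   # [3, None, 1, 2]
print([r[0] if r else None for r in desc])  # [1, None, 3, 2]
```

Because the denominator here is the count of valid records, the top-ranked record always receives a fractional rank of 1.0 and a percentage rank of 100.0.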