Last updated: Jan 18, 2024
The Binning node automatically creates new nominal (set) fields based on the values of one or more existing continuous (numeric range) fields. For example, you can transform a continuous income field into a new categorical field containing groups of income as deviations from the mean. After you create bins for the new field, you can generate a Derive node based on the cut points.
Example
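# Get the current stream; modeler.script is provided by the Modeler scripting environment.
stream = modeler.script.stream()
node = stream.create("binning", "My node")
# Fields to bin and the binning method.
node.setPropertyValue("fields", ["Na", "K"])
node.setPropertyValue("method", "Rank")
# Fixed-width options (the fixed_* properties).
node.setPropertyValue("fixed_width_name_extension", "_binned")
node.setPropertyValue("fixed_width_add_as", "Suffix")
node.setPropertyValue("fixed_bin_method", "Count")
node.setPropertyValue("fixed_bin_count", 10)
node.setPropertyValue("fixed_bin_width", 3.5)
# Equal-count option: generate 10 decile bins.
node.setPropertyValue("tile10", True)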
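When many properties are set at once, it can help to keep them in a dictionary and check enumerated values before applying them. The following is a minimal sketch under stated assumptions, not part of the product API: `apply_properties`, `ALLOWED`, and `RecordingNode` are hypothetical names introduced here, and the allowed values are copied from the binningnode properties table. Only `setPropertyValue` comes from the example above. In a Modeler script you would pass the node returned by `stream.create` instead of the stand-in.

```python
# Allowed values copied from the binningnode properties table (subset).
ALLOWED = {
    "method": {"FixedWidth", "EqualCount", "Rank", "SDev", "Optimal"},
    "equal_count_method": {"RecordCount", "ValueSum"},
    "tied_values_method": {"Next", "Current", "Random"},
    "equal_count_add_as": {"Suffix", "Prefix"},
}


def apply_properties(node, props):
    """Check enumerated values, then call node.setPropertyValue for each item."""
    for name, value in props.items():
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError("%s: %r is not one of %s" % (name, value, sorted(allowed)))
    for name, value in props.items():
        node.setPropertyValue(name, value)
    return node


class RecordingNode(object):
    """Hypothetical stand-in that records calls so the sketch runs outside Modeler."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


# Equal-count binning into four tiles, using property names from the table.
quartile_settings = {
    "fields": ["Na", "K"],
    "method": "EqualCount",
    "equal_count_method": "RecordCount",
    "tied_values_method": "Next",
    "tile4": True,
}

demo_node = apply_properties(RecordingNode(), quartile_settings)
```

Validating first means a misspelled option raises an error before any property is changed on the node.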
binningnode properties | Data type | Property description |
---|---|---|
fields | [field1 field2 ... fieldn] | Continuous (numeric range) fields pending transformation. You can bin multiple fields simultaneously. |
method | FixedWidth EqualCount Rank SDev Optimal | Method used for determining cut points for new field bins (categories). |
recalculate_bins | Always IfNecessary | Specifies whether the bins are recalculated and the data placed in the relevant bin every time the node is executed, or whether data is added only to existing bins and any new bins that have been added. |
fixed_width_name_extension | string | The default extension is _BIN. |
fixed_width_add_as | Suffix Prefix | Specifies whether the extension is added to the end (suffix) of the field name or to the start (prefix). For example, with the default extension added as a suffix, a field named income becomes income_BIN. |
fixed_bin_method | Width Count | |
fixed_bin_count | integer | Specifies an integer used to determine the number of fixed-width bins (categories) for the new field(s). |
fixed_bin_width | real | Value (integer or real) for calculating the width of the bin. |
equal_count_name_extension | string | The default extension is _TILE. |
equal_count_add_as | Suffix Prefix | Specifies an extension, either suffix or prefix, used for the field name generated by using standard p-tiles. The default extension is _TILE plus N, where N is the tile number. |
tile4 | flag | Generates four quartile bins, each containing 25% of cases. |
tile5 | flag | Generates five quintile bins. |
tile10 | flag | Generates 10 decile bins. |
tile20 | flag | Generates 20 vingtile bins. |
tile100 | flag | Generates 100 percentile bins. |
use_custom_tile | flag | |
custom_tile_name_extension | string | The default extension is _TILEN. |
custom_tile_add_as | Suffix Prefix | |
custom_tile | integer | |
equal_count_method | RecordCount ValueSum | The RecordCount method seeks to assign an equal number of records to each bin, while ValueSum assigns records so that the sum of the values in each bin is equal. |
tied_values_method | Next Current Random | Specifies the bin in which tied values are placed. |
rank_order | Ascending Descending | Ranking order: Ascending (lowest value is marked 1) or Descending (highest value is marked 1). |
rank_add_as | Suffix Prefix | This option applies to rank, fractional rank, and percentage rank. |
rank | flag | |
rank_name_extension | string | The default extension is _RANK. |
rank_fractional | flag | Ranks cases where the value of the new field equals rank divided by the sum of the weights of the nonmissing cases. Fractional ranks fall in the range of 0–1. |
rank_fractional_name_extension | string | The default extension is _F_RANK. |
rank_pct | flag | Each rank is divided by the number of records with valid values and multiplied by 100. Percentage fractional ranks fall in the range of 1–100. |
rank_pct_name_extension | string | The default extension is _P_RANK. |
sdev_name_extension | string | |
sdev_add_as | Suffix Prefix | |
sdev_count | One Two Three | |
optimal_name_extension | string | The default extension is _OPTIMAL. |
optimal_add_as | Suffix Prefix | |
optimal_supervisor_field | field | Field chosen as the supervisory field to which the fields selected for binning are related. |
optimal_merge_bins | flag | Specifies that any bins with small case counts will be added to a larger, neighboring bin. |
optimal_small_bin_threshold | integer | |
optimal_pre_bin | flag | Indicates that prebinning of the dataset is to take place. |
optimal_max_bins | integer | Specifies an upper limit to avoid creating an inordinately large number of bins. |
optimal_lower_end_point | Inclusive Exclusive | |
optimal_first_bin | Unbounded Bounded | |
optimal_last_bin | Unbounded Bounded | |
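
The arithmetic behind the `rank_fractional` and `rank_pct` descriptions can be illustrated in plain Python. This is a sketch of the formulas as worded in the table, not the node's implementation. It assumes unweighted records (so the sum of the weights equals the number of nonmissing records) and distinct values (tie handling is not modeled). The function name `rank_fields` and the sample values are hypothetical.

```python
def rank_fields(values, descending=False):
    """Return (rank, fractional rank, percentage rank) per value; None stays None.

    Assumes unweighted records with distinct values (no tie handling).
    """
    # Only nonmissing values take part in the ranking.
    valid = sorted((v for v in values if v is not None), reverse=descending)
    n = len(valid)
    # Rank 1 goes to the first value in the chosen order.
    position = {v: i + 1 for i, v in enumerate(valid)}
    result = []
    for v in values:
        if v is None:
            result.append(None)
        else:
            r = position[v]
            # Fractional rank: rank / number of nonmissing records.
            # Percentage rank: the same ratio multiplied by 100.
            result.append((r, r / float(n), 100.0 * r / n))
    return result


sodium = [0.79, None, 0.56, 0.89, 0.62]
ranked = rank_fields(sodium)
```

With four nonmissing values, the ascending ranks are 1 to 4, so the value 0.79 (third smallest) gets rank 3, fractional rank 0.75, and percentage rank 75.0. Passing `descending=True` mirrors the Descending setting of `rank_order`, where the highest value is marked 1.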