Last updated: Feb 11, 2025
You can use Balance nodes to correct imbalances in datasets so they conform to specified test criteria.
For example, suppose that a dataset has only two values, low and high, and that 90% of the cases are low while only 10% of the cases are high. Many modeling techniques have trouble with such biased data because they tend to learn only the low outcome and ignore the high one, since it is rarer. If the data is well balanced, with approximately equal numbers of low and high outcomes, models have a better chance of finding patterns that distinguish the two groups. In this case, a Balance node is useful for creating a balancing directive that reduces cases with a low outcome.
Balancing is carried out by duplicating and then discarding records based on the conditions you specify. Records for which no condition holds are always passed through. Because this process works by duplicating and/or discarding records, the original sequence of your data is lost in downstream operations. Be sure to derive any sequence-related values before adding a Balance node to the data stream.
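The following Python sketch illustrates the general idea of balancing by duplicating and discarding records. It is not the Balance node's actual implementation. The function name balance, the (factor, condition) directive format, and the first-match rule are assumptions made for this illustration. A factor below 1.0 keeps a matching record with that probability, a factor above 1.0 duplicates it, and records that match no condition pass through unchanged.

```python
import random


def balance(records, directives, seed=None):
    """Duplicate or discard records according to (factor, condition) pairs.

    Hypothetical helper for illustration only. The first directive whose
    condition holds for a record decides that record's factor.
    """
    rng = random.Random(seed)
    balanced = []
    for record in records:
        factor = 1.0  # no condition holds: the record is passed through once
        for directive_factor, condition in directives:
            if condition(record):
                factor = directive_factor
                break
        # Emit the whole-number part of the factor as guaranteed copies,
        # then one extra copy with probability equal to the fractional part.
        whole_copies = int(factor)
        fractional_part = factor - whole_copies
        extra_copy = 1 if rng.random() < fractional_part else 0
        balanced.extend([record] * (whole_copies + extra_copy))
    return balanced


# 90% low and 10% high, as in the example above.
records = [{"outcome": "low"}] * 900 + [{"outcome": "high"}] * 100

# Reduce low cases to roughly the number of high cases (100 / 900 is about 0.11).
directives = [(100 / 900, lambda r: r["outcome"] == "low")]
balanced = balance(records, directives, seed=42)

low_count = sum(1 for r in balanced if r["outcome"] == "low")
high_count = sum(1 for r in balanced if r["outcome"] == "high")
print(low_count, high_count)
```

Because records are kept or dropped at random, the number of low cases that survive varies from run to run, while the high cases, which match no condition, are all retained.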