Last updated: Feb 11, 2025
The Feature Selection node screens
input fields for removal based on a set of criteria (such as the percentage of missing values); it
then ranks the importance of remaining inputs relative to a specified target. For example, given a
data set with hundreds of potential inputs, which are most likely to be useful in modeling patient
outcomes?
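To make the screening step concrete, here is a minimal plain-Python sketch of two of the screens (too many missing values, and too many records in a single category). It is an illustration only, not the node's implementation: the function names and the sample data are hypothetical, and missing values are assumed to be represented as `None`.

```python
# Illustrative only: approximates two screening rules in plain Python.
# All names here are hypothetical and not part of the Modeler API.

def percent_missing(values):
    """Percentage of records whose value is None."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def percent_in_largest_category(values):
    """Percentage of all records that fall into the most common category."""
    counts = {}
    for v in values:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    if not counts:
        return 0.0
    return 100.0 * max(counts.values()) / len(values)


def screen_fields(data, max_missing=80, max_single_category=95):
    """Return the names of fields that pass both screens."""
    kept = []
    for name, values in data.items():
        if percent_missing(values) > max_missing:
            continue  # too many missing values
        if percent_in_largest_category(values) > max_single_category:
            continue  # nearly every record is in one category
        kept.append(name)
    return kept


data = {
    "age_band": ["a", "b", "a", "c", "b"],
    "mostly_missing": [None, None, None, None, "x"],
    "constant": ["y", "y", "y", "y", "y"],
}
kept = screen_fields(data, max_missing=70)
```

With these sample values, only `age_band` survives: `mostly_missing` is 80% missing, which exceeds the 70% limit, and every record of `constant` falls into one category.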
Example
    # Assumes `stream` already refers to the stream being edited.
    node = stream.create("featureselection", "My node")
    # Screening criteria
    node.setPropertyValue("screen_single_category", True)
    node.setPropertyValue("max_single_category", 95)
    node.setPropertyValue("screen_missing_values", True)
    node.setPropertyValue("max_missing_values", 80)
    # Importance ranking
    node.setPropertyValue("criteria", "Likelihood")
    node.setPropertyValue("unimportant_below", 0.8)
    node.setPropertyValue("important_above", 0.9)
    node.setPropertyValue("important_label", "Check Me Out!")
    # Field selection
    node.setPropertyValue("selection_mode", "TopN")
    node.setPropertyValue("top_n", 15)
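When many properties are set at once, it can be tidier to keep them in one list and apply them in a loop. The sketch below shows that pattern under stated assumptions: `apply_properties` is a hypothetical helper, and `StubNode` is a stand-in that only mimics `setPropertyValue` so the code can run outside Modeler. In a real script the node would come from `stream.create("featureselection", "My node")`.

```python
# Hypothetical stand-in for a Modeler node, used only so this sketch runs
# outside Modeler. It records whatever setPropertyValue is given.
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, settings):
    """Call node.setPropertyValue once per (name, value) pair, in order."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


# A list of pairs keeps the order explicit (selection_mode before top_n).
settings = [
    ("screen_missing_values", True),
    ("max_missing_values", 80),
    ("selection_mode", "TopN"),
    ("top_n", 15),
]
node = apply_properties(StubNode(), settings)
```

The helper only relies on the node having a `setPropertyValue(name, value)` method, which is the call used in the example above.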
Properties | Values | Property description |
---|---|---|
`target` | field | Feature Selection models rank predictors relative to the specified target. Weight and frequency fields are not used. See Common modeling node properties for more information. |
`screen_single_category` | flag | If `True`, screens fields that have too many records falling into the same category relative to the total number of records. |
`max_single_category` | number | Specifies the threshold used when `screen_single_category` is `True`. |
`screen_missing_values` | flag | If `True`, screens fields with too many missing values, expressed as a percentage of the total number of records. |
`max_missing_values` | number | |
`screen_num_categories` | flag | If `True`, screens fields with too many categories relative to the total number of records. |
`max_num_categories` | number | |
`screen_std_dev` | flag | If `True`, screens fields with a standard deviation of less than or equal to the specified minimum. |
`min_std_dev` | number | |
`screen_coeff_of_var` | flag | If `True`, screens fields with a coefficient of variation less than or equal to the specified minimum. |
`min_coeff_of_var` | number | |
`criteria` | `Pearson`, `Likelihood`, `CramersV`, `Lambda` | When ranking categorical predictors against a categorical target, specifies the measure on which the importance value is based. |
`unimportant_below` | number | Specifies the threshold p values used to rank variables as important, marginal, or unimportant. Accepts values from 0.0 to 1.0. |
`important_above` | number | Accepts values from 0.0 to 1.0. |
`unimportant_label` | string | Specifies the label for the unimportant ranking. |
`marginal_label` | string | |
`important_label` | string | |
`selection_mode` | `ImportanceLevel`, `ImportanceValue`, `TopN` | |
`select_important` | flag | When `selection_mode` is set to `ImportanceLevel`, specifies whether to select important fields. |
`select_marginal` | flag | When `selection_mode` is set to `ImportanceLevel`, specifies whether to select marginal fields. |
`select_unimportant` | flag | When `selection_mode` is set to `ImportanceLevel`, specifies whether to select unimportant fields. |
`importance_value` | number | When `selection_mode` is set to `ImportanceValue`, specifies the cutoff value to use. Accepts values from 0 to 100. |
`top_n` | integer | When `selection_mode` is set to `TopN`, specifies the cutoff value to use. Accepts values from 0 to 1000. |
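The interplay between the ranking thresholds and the selection properties can be sketched in plain Python. This is a rough illustration under several assumptions, not the node's actual algorithm: importance is assumed to be a score between 0.0 and 1.0 where higher is better (for example 1 minus the p value), the comparison operators at the thresholds are a guess, and the function names, default flag values, and sample scores are hypothetical. Only the level-based and top-N modes are shown.

```python
# Illustrative only: hypothetical helpers, not part of the Modeler API.

def label_for(importance, important_above=0.9, unimportant_below=0.8):
    """Assumed rule: above the upper threshold is important, below the
    lower threshold is unimportant, anything in between is marginal."""
    if importance > important_above:
        return "important"
    if importance < unimportant_below:
        return "unimportant"
    return "marginal"


def select_fields(importances, selection_mode, top_n=15,
                  select_important=True, select_marginal=False,
                  select_unimportant=False,
                  important_above=0.9, unimportant_below=0.8):
    """Return field names chosen under the given selection mode."""
    # Rank fields from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    if selection_mode == "TopN":
        return [name for name, _ in ranked[:top_n]]
    if selection_mode == "ImportanceLevel":
        wanted = {
            "important": select_important,
            "marginal": select_marginal,
            "unimportant": select_unimportant,
        }
        return [name for name, score in ranked
                if wanted[label_for(score, important_above, unimportant_below)]]
    raise ValueError("mode not covered by this sketch: %s" % selection_mode)


# Hypothetical importance scores for four fields.
importances = {"bp": 0.99, "age": 0.85, "zip": 0.40, "bmi": 0.95}
top_two = select_fields(importances, "TopN", top_n=2)
by_level = select_fields(importances, "ImportanceLevel", select_marginal=True)
```

Here `top_two` contains `bp` and `bmi`, while `by_level` also includes `age`, which falls between the two thresholds and is therefore labeled marginal.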