Last updated: Feb 11, 2025
If the majority of missing values are concentrated in a small number of fields, you can address them at the field level rather than at the record level. This approach also allows you to experiment with the relative importance of particular fields before deciding on an approach for handling missing values. If a field is unimportant in modeling, it probably isn't worth keeping, regardless of how many missing values it has.
For example, a market research company may collect data from a general questionnaire containing 50 questions. Two of the questions address age and political persuasion, information that many people are reluctant to give. In this case, Age and Political_persuasion have many missing values.
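To check whether missing values really are concentrated in a few fields, it can help to compute the fraction of missing values per field. The following is a minimal Python sketch of that idea, not a SPSS Modeler feature: the sample records, the Q1 field, and the helper name missing_fraction_by_field are hypothetical, and None is assumed to mark a missing answer.

```python
# Hypothetical survey records; None is assumed to mark a missing answer.
records = [
    {"Age": 34, "Political_persuasion": None, "Q1": "yes"},
    {"Age": None, "Political_persuasion": None, "Q1": "no"},
    {"Age": None, "Political_persuasion": "centrist", "Q1": "yes"},
    {"Age": 51, "Political_persuasion": None, "Q1": "no"},
]


def missing_fraction_by_field(rows):
    """Return {field: fraction of rows where the field is None or absent}."""
    total = len(rows)
    if total == 0:
        return {}
    fields = sorted({name for row in rows for name in row})
    return {
        name: sum(1 for row in rows if row.get(name) is None) / total
        for name in fields
    }


fractions = missing_fraction_by_field(records)

# List the fields with the most missing values first.
for name, fraction in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {fraction:.0%} missing")
```

Fields that stand out at the top of this listing are the candidates for field-level handling.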
Field measurement level
In determining which method to use, you should also consider the measurement level of fields with missing values.
Numeric fields. For numeric field types, such as Continuous, you should always eliminate any non-numeric values before building a model, because many models won't function if blanks are included in numeric fields.
Categorical fields. For categorical fields, such as Nominal and Flag, altering missing values isn't necessary but will increase the accuracy of the model. For example, a model that uses the field Sex will still function with meaningless values, such as Y and Z, but removing all values other than M and F will increase the accuracy of the model.
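The two cases above can be sketched in plain Python. This is an illustration under stated assumptions rather than how SPSS Modeler does it internally: the helper names to_number_or_none and keep_valid_category and the sample values are hypothetical, and invalid entries are assumed to be replaced with None so that they can be handled as missing later.

```python
import math


def to_number_or_none(value):
    """Convert a raw value to a finite float; return None for blanks and non-numeric text."""
    if value is None:
        return None
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treat those as non-numeric here.
    return number if math.isfinite(number) else None


def keep_valid_category(value, valid):
    """Return the value if it is an allowed category, otherwise None."""
    return value if value in valid else None


# Numeric field: blanks and text such as "n/a" become None.
raw_ages = ["34", "", "n/a", 51, None]
clean_ages = [to_number_or_none(v) for v in raw_ages]

# Categorical field: only M and F are assumed to be meaningful for Sex.
raw_sex = ["M", "F", "Y", "Z", "F"]
clean_sex = [keep_valid_category(v, {"M", "F"}) for v in raw_sex]

print(clean_ages)
print(clean_sex)
```

Replacing invalid entries with None, rather than dropping the records, keeps the decision about how to treat those records separate from the cleaning step.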
Screening or removing fields
To screen out fields with too many missing values, you have several options:
- You can use a Data Audit node to filter fields based on quality
- You can use a Feature Selection node to screen out fields with more than a specified percentage of missing values and to rank fields based on importance relative to a specified target
- Instead of removing the fields, you can use a Type node to set the field role to None. This will keep the fields in the data set but exclude them from the modeling processes
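The screening idea behind these options can be illustrated outside of SPSS Modeler with a short Python sketch. It does not call any Modeler node or API: the function screen_fields, the 0.5 threshold, and the sample rows are hypothetical, and None is assumed to mark a missing value. Excluded fields stay in the original rows and are only left out of the modeling inputs, which loosely mirrors setting a field role to None.

```python
def screen_fields(rows, max_missing_fraction):
    """Split field names into (kept, excluded) by their fraction of None values."""
    fields = sorted({name for row in rows for name in row})
    kept, excluded = [], []
    for name in fields:
        missing = sum(1 for row in rows if row.get(name) is None)
        fraction = missing / len(rows) if rows else 0.0
        if fraction > max_missing_fraction:
            excluded.append(name)
        else:
            kept.append(name)
    return kept, excluded


# Hypothetical data set; None is assumed to mark a missing value.
rows = [
    {"Age": None, "Income": 42000, "Region": "north"},
    {"Age": None, "Income": None, "Region": "south"},
    {"Age": 29, "Income": 38000, "Region": None},
    {"Age": None, "Income": 51000, "Region": "east"},
]

# Hypothetical threshold: exclude fields with more than 50% missing values.
kept, excluded = screen_fields(rows, max_missing_fraction=0.5)

# Build modeling inputs from the kept fields only; rows itself is unchanged.
model_inputs = [{name: row.get(name) for name in kept} for row in rows]

print("kept:", kept)
print("excluded:", excluded)
```

The threshold is a judgment call that depends on the data and on how important the field is to the model, so it is worth trying more than one value.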