At any time after you've added data to Data Refinery, you can validate your data. Typically, you'll want to do this at multiple points in the refinement process.
To validate your data:
1. From Data Refinery, click the Profile tab.
2. Review the metrics for each column.
3. Depending on what you learn, take appropriate actions, as described in the following sections.
Frequency
Frequency is the number of times that a value, or a value in a specified range, occurs. Each bar in the frequency distribution shows how many times one unique value (or one range of values) occurs in the column.
Review the frequency distribution to find anomalies in your data. To cleanse your data of those anomalies, remove the anomalous values.
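Data Refinery computes and displays this distribution for you on the Profile tab. As a rough illustration of the idea only, the following plain-Python sketch counts how often each value occurs in a column and then drops values that you have judged to be anomalies. The function names `frequency` and `remove_values` and the sample column are hypothetical, and deciding which values count as anomalies is assumed to be a manual judgment.

```python
from collections import Counter


def frequency(values):
    """Return (value, count) pairs for a column, most frequent first."""
    return Counter(values).most_common()


def remove_values(values, anomalies):
    """Return a copy of the column without the values flagged as anomalies."""
    bad = set(anomalies)
    return [v for v in values if v not in bad]


if __name__ == "__main__":
    state = ["NY", "NY", "CA", "CA", "CA", "ny", "N/A"]
    print(frequency(state))
    # → [('CA', 3), ('NY', 2), ('ny', 1), ('N/A', 1)]
    # Hypothetical review: the rare "ny" and "N/A" entries are treated as anomalies.
    print(remove_values(state, ["ny", "N/A"]))
    # → ['NY', 'NY', 'CA', 'CA', 'CA']
```

Low-frequency values are often the first place to look for typos and placeholder entries, which is why the sketch sorts by count.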
For Integer and Date/Time columns, you can customize the number of bins (groupings) that you want to see. In the default multi-column view, the maximum is 20. If you expand the frequency chart row, the maximum is 50.
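To show what a bin count means, here is a minimal sketch that groups numeric values into bins and caps the number of bins the way the chart does (20 in the multi-column view, 50 in the expanded row). It assumes equal-width bins; the binning rule Data Refinery actually uses is not described here. The function `bin_counts` is hypothetical, and Date/Time values would first need converting to numbers (for example, timestamps), which is also an assumption.

```python
def bin_counts(values, num_bins, max_bins=20):
    """Count numeric values in equal-width bins (an assumed binning rule).

    num_bins is clamped to the range 1..max_bins, mirroring the chart limits
    (max_bins=20 for the default view, 50 for an expanded row).
    Returns a list of (lower_edge, upper_edge, count) tuples.
    """
    if not values:
        return []
    num_bins = max(1, min(num_bins, max_bins))
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when every value is identical.
    width = (hi - lo) / num_bins or 1
    counts = [0] * num_bins
    for v in values:
        # The maximum value lands exactly on the top edge; keep it in the last bin.
        i = min(int((v - lo) / width), num_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return list(zip(edges[:-1], edges[1:], counts))


if __name__ == "__main__":
    ages = [3, 7, 12, 15, 18, 22, 41, 45, 50]
    for low, high, count in bin_counts(ages, 5):
        print(f"{low:6.1f} - {high:6.1f}: {count}")
```

Fewer bins smooth out noise; more bins make isolated outliers easier to spot, which is why a larger limit is useful in the expanded view.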
Statistics
Statistics are a collection of quantitative data. The statistics for each column show the minimum, maximum, mean, and number of unique values in that column.
The statistics shown for each column vary slightly depending on the column's data type. For example, statistics for a column of data type integer include minimum, maximum, and mean values, while statistics for a column of data type string include minimum length, maximum length, and mean length values.
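The type-dependent statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Data Refinery computes them: `column_stats` is a hypothetical helper, a column is assumed to be a list of numbers or a list of strings (not a mix), and missing values (`None`) are assumed to be skipped.

```python
from statistics import mean


def column_stats(values):
    """Summary statistics for one column, varying by data type.

    Assumptions: the column holds either numbers or strings, and
    None marks a missing value that is left out of the statistics.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"unique": 0}
    stats = {"unique": len(set(present))}
    if all(isinstance(v, str) for v in present):
        # String columns: statistics are about value lengths.
        lengths = [len(v) for v in present]
        stats.update(min_length=min(lengths), max_length=max(lengths),
                     mean_length=mean(lengths))
    else:
        # Numeric columns: statistics are about the values themselves.
        stats.update(min=min(present), max=max(present), mean=mean(present))
    return stats


if __name__ == "__main__":
    print(column_stats([4, 8, 8, 15]))
    print(column_stats(["NY", "Texas", "CA", "NY"]))
```

For the integer column this reports 3 unique values, a minimum of 4, a maximum of 15, and a mean of 8.75; for the string column it reports lengths from 2 to 5 with a mean length of 2.75.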