When you run advanced profiling on a data asset, a detailed frequency distribution is determined for the distinct values in each column of the asset based on the source data.
When you configure the settings for an advanced profiling run, you can choose to write all or part of the frequency distribution information to a database table. See Advanced data profiling. You can access this table by using standard database queries, through the IBM Knowledge Catalog API, or through the detailed column profile. Note that the column profile shows only the first 100 distinct values, regardless of how many values are actually stored.
For each distinct value, the table contains the following information:
Column name | Description |
---|---|
AssetId | The ID of the data asset in the project. |
ChangeDate | The date on which the information was updated. |
ColumnName | The name of the column in the data asset. |
DataClassification | A comma-separated list of the IDs of the data classes that are assigned to the column in the data asset. If no data class is assigned to the column, the table shows U. |
DistinctValue | The actual data value in the column. The maximum length is 4,096 bytes, or 2,048 characters for Unicode data. All values are stored as strings irrespective of the actual data type, so string sort order is applied when you sort the values in the detailed column profile. |
FrequencyCount | How often this value occurs. |
GeneralFormat | The format that represents the character pattern of a data value. Every alphabetic character is represented by an uppercase or lowercase letter A, depending on the capitalization of the character. Every numeric character is represented by the number 9. Spaces and special characters are shown as they appear. |
InferredDataType | The inferred data type, such as integer, string, or date. |
ProjectId | The ID of the project in which the analysis was run. |
PropertyLength | The length of a string field. |
PropertyPrecision | The total length of a numeric field. |
PropertyScale | The scale of a numeric field, that is, the length of its decimal component. |
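
The GeneralFormat rule described in the table can be illustrated with a short Python sketch. This is not the product's implementation; the function name `general_format` is hypothetical, and the use of Python's `str.isalpha`, `str.isupper`, and `str.isdigit` to decide what counts as an alphabetic, uppercase, or numeric character is an assumption that might differ from how profiling treats some Unicode characters.

```python
def general_format(value: str) -> str:
    """Return a character pattern in the style of the GeneralFormat column.

    Assumption: Python's Unicode character classes stand in for the
    product's own definition of alphabetic and numeric characters.
    """
    pattern = []
    for ch in value:
        if ch.isalpha():
            # Alphabetic characters become "A" or "a", keeping capitalization.
            pattern.append("A" if ch.isupper() else "a")
        elif ch.isdigit():
            # Numeric characters become "9".
            pattern.append("9")
        else:
            # Spaces and special characters are kept as they appear.
            pattern.append(ch)
    return "".join(pattern)


print(general_format("AB-12 cd"))    # AA-99 aa
print(general_format("John Smith"))  # Aaaa Aaaaa
```
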
These additional columns are reserved for internal use and are subject to change without notice:
- Class
- ChangedByUser
- DataClassificationStatusFlag
- DomainPattern
- DomainValueFlag
- DomainValueFlagDate
- DomainValueFlaggedByUser
- FieldNumber
- FormatFlag
- FormatFlagDate
- FormatFlaggedByUser
- InvalidReasonCode
- ODBCType
- SourceOfDistinctValue
- TypeCode
- TypeOfDomainValue
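
Because the table can be read with standard database queries, the following Python sketch shows what such a query might look like and why the string storage of DistinctValue matters for sorting. It is only an illustration under stated assumptions: the table name `FREQ_DIST` is hypothetical, an in-memory SQLite database stands in for whatever database you configured as the output target, and the sample rows are made up. The query selects only documented columns, not the reserved ones listed above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real output database.
conn = sqlite3.connect(":memory:")

# Assumption: FREQ_DIST is a hypothetical table name; only a few of the
# documented columns are created for this illustration.
conn.execute(
    "CREATE TABLE FREQ_DIST ("
    "ColumnName TEXT, DistinctValue TEXT, FrequencyCount INTEGER)"
)
conn.executemany(
    "INSERT INTO FREQ_DIST VALUES (?, ?, ?)",
    [("AGE", "9", 4), ("AGE", "10", 7), ("AGE", "100", 1)],  # made-up sample rows
)

# DistinctValue is stored as a string, so a plain ORDER BY sorts as text.
string_order = [
    row[0]
    for row in conn.execute(
        "SELECT DistinctValue FROM FREQ_DIST "
        "WHERE ColumnName = ? ORDER BY DistinctValue",
        ("AGE",),
    )
]

# Casting to a numeric type in the query restores numeric ordering.
numeric_order = [
    row[0]
    for row in conn.execute(
        "SELECT DistinctValue FROM FREQ_DIST "
        "WHERE ColumnName = ? ORDER BY CAST(DistinctValue AS INTEGER)",
        ("AGE",),
    )
]

print(string_order)   # ['10', '100', '9']
print(numeric_order)  # ['9', '10', '100']
```

The cast is safe here only because every sample value is an integer; for real data, the InferredDataType column can help decide whether such a cast is appropriate.
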
Learn more
- Advanced data profiling
- Column-level profile information
- IBM Knowledge Catalog API: Filter rows from the frequency distribution