Column-level profile information
Each profile contains several levels of information.
The information is grouped as follows:
Statistics
The Statistics tab provides a summary of the structure of the analyzed data in a column and different types of visualizations for that structural information. What information exactly is shown depends on whether the column contains continuous (quantitative) or nominal (qualitative) data.
Charts
Depending on the type of data in a column, you can choose between different types of visualizations:

Nominal data:
 Bar chart
 Proportion or pie chart
 Pareto chart

Continuous data:
 Histogram chart
 Box plot chart
 Quantile-quantile (Q-Q) plot chart
A distribution chart is available for all types of data. The distribution table usually lists at least the most frequent values (or intervals) in the column and their counts. The table might show other information such as the formats, types, or data classes. To view the individual rows that contain a certain value, click Show rows.
On the bar or histogram charts, you can select an overlay column to see how its values are distributed within each value of the column that you are currently looking at. For example, if you have a column that lists sold baked goods and you select a column named season as the overlay column, you can see how sales of a certain bakery product differ per season. For the overlay column, you can pick from all columns in the data asset that contain nominal data.
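The overlay idea can be illustrated outside the product with a small sketch. The sample rows, the column meanings (product and season), and the function name `overlay_counts` are hypothetical and are not part of the profiling tool; the sketch only shows what "distribution of overlay values within each value" means.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (product, season). Both columns hold nominal data.
rows = [
    ("croissant", "summer"),
    ("croissant", "winter"),
    ("croissant", "winter"),
    ("baguette", "summer"),
    ("baguette", "summer"),
    ("muffin", "winter"),
]


def overlay_counts(pairs):
    """Count the overlay values within each value of the primary column."""
    result = defaultdict(Counter)
    for primary, overlay in pairs:
        result[primary][overlay] += 1
    return {value: dict(counts) for value, counts in result.items()}


breakdown = overlay_counts(rows)
for product, seasons in breakdown.items():
    print(product, seasons)
```

Each bar of the chart corresponds to one key of `breakdown`, and the segments within the bar correspond to the per-season counts.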
Summary
The Summary tile provides general information about the data in the selected column:
 The data type of the column as defined in the data source
 The data type that was inferred through analysis
 The number of different data formats in that column
 The most frequent inferred format for that column
 The assigned data class
 The type of data measurement (nominal or continuous)
 The number of rows (that is, the number of values) that were checked
Basic statistics
Basic statistics provide general information about the distribution and dispersion of the values in the selected column. Depending on a column’s data type, the statistics vary slightly. For example, statistics for a column of data type integer include minimum, maximum, and mean values, while statistics for a column of data type string include minimum length, maximum length, and mean length values.
| Measure | Description | Shown for this type of data |
|---|---|---|
| Cardinality | The percentage of distinct values in the column, including blanks and nulls. It is calculated by dividing the total number of distinct values in a column by the total number of values in that column. | Continuous |
| Distinct | The number of different values that exist in the sampled data for the column. | Continuous |
| Entropy | This value quantifies how much information the column holds. More generally, entropy can be used to quantify the information in an event and a random variable. This amount is estimated based not only on the number of different values that are present in the variable but also on the amount of unexpected values. | Nominal |
| Gini | A variation of the Gini coefficient that expresses the probability that a specific element is incorrectly classified when chosen randomly. The Gini index can vary from 0 to 1, where 0 indicates that all the elements belong to a certain class or that only one class exists. A Gini index of 1 indicates that all elements are randomly distributed across various classes. A value of 0.5 indicates that the elements are uniformly distributed across some classes. | Nominal |
| Maximum | The largest value of a numeric variable. | Continuous |
| Mean | The arithmetic average, that is, the sum of the values divided by the number of values. | Continuous |
| Median | The value above and below which half of the values fall. If there is an even number of values, the median is the average of the two middle values when they are sorted. The median is not affected by outliers. | Continuous |
| Minimum | The smallest value of a numeric variable. | Continuous |
| Missing | The number of rows in the sample that don't have a value. | Continuous, Nominal |
| Mode | The most frequently occurring value in the column. If several values occur with equal frequency, each of them is a mode. | Continuous, Nominal |
| Outliers | The number of values in the column data that are far away from most other values in the column. | Continuous |
| Range | The difference between the maximum and minimum values in the column. | Continuous |
| Sum | The sum or total of the values, across all rows that have values. | Continuous |
| Unique | The number of distinct values that appear only once in the current column. | Continuous, Nominal |
| Valid | The number of values that are considered valid, which means empty or missing column values are excluded. | Continuous, Nominal |
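To make the definitions above concrete, the following sketch computes several of the measures for a small sample. It is an illustration only, not the product's implementation. It assumes that `None` marks a missing value, that a missing value counts as one distinct value for cardinality, that entropy is Shannon entropy with a base-2 logarithm, and that Gini is computed as 1 minus the sum of squared value frequencies. The function name `basic_statistics` is hypothetical.

```python
import math
from collections import Counter


def basic_statistics(values):
    """Compute several Basic statistics measures for one column.

    Assumption: None marks a missing value and counts as one distinct
    value when cardinality is calculated.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    all_counts = Counter(values)   # includes None
    counts = Counter(present)      # non-missing values only

    top = max(counts.values(), default=0)
    # Relative frequency of each non-missing value.
    probs = [c / len(present) for c in counts.values()]

    return {
        "distinct": len(all_counts),
        "cardinality": 100.0 * len(all_counts) / total if total else 0.0,
        "missing": total - len(present),
        "valid": len(present),
        "unique": sum(1 for c in counts.values() if c == 1),
        "modes": [v for v, c in counts.items() if c == top],
        # Shannon entropy in bits (base 2 is an assumption).
        "entropy": -sum(p * math.log2(p) for p in probs),
        # Probability of misclassifying a randomly chosen element.
        "gini": 1.0 - sum(p * p for p in probs) if probs else 0.0,
    }


column = ["red", "blue", "red", "green", None, "red", "blue", None]
stats = basic_statistics(column)
print(stats)
```

In this sample, 4 distinct values (including the missing marker) out of 8 rows give a cardinality of 50%, and only "green" appears exactly once.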
Advanced insights
In-depth information about the distribution and dispersion of the values in the selected column. This information is shown only for continuous data:
| Measure | Description |
|---|---|
| 25th percentile | The value below which 25% and above which 75% of the detected values fall. |
| 75th percentile | The value above which 25% and below which 75% of the detected values fall. |
| Kurtosis | A measure of the extent to which there are outliers (tailedness of a distribution). Excess kurtosis is the tailedness of a distribution relative to a normal distribution. For a normal distribution, the value of the kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a normal distribution. Negative kurtosis indicates that the data exhibit fewer extreme outliers than a normal distribution. Distributions with high kurtosis (fat tails) are leptokurtic. Distributions with medium kurtosis (medium tails) are mesokurtic. Distributions with low kurtosis (thin tails) are platykurtic. |
| Mean std. error | A measure of how far the sample mean (average) of the data is likely to be from the true population mean. |
| Std. deviation | A measure of dispersion around the mean. With a low standard deviation, values are usually close to the mean. With a high standard deviation, the range of values is wider. |
| Skewness | A measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (or positive), left (or negative), or zero skewness (symmetric distribution). |
| Variance | A measure of dispersion around the mean. It's the expectation of the squared deviation of a random variable from its population mean or sample mean. |
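The measures above can be reproduced approximately with Python's standard library. This is a sketch under stated assumptions, not the formulas that the profiling actually uses: it takes the sample variance (divisor n - 1), the "inclusive" quantile method, and moment-based skewness and excess kurtosis without small-sample corrections. The function name `advanced_insights` is hypothetical, and the results in the product can differ slightly if it uses other estimators.

```python
import math
import statistics


def advanced_insights(values):
    """Compute dispersion and shape measures for a numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)   # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    std_error = std_dev / math.sqrt(n)       # standard error of the mean

    # Quartile cut points; the middle one is the median.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    # Central moments (divisor n) for the shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")

    return {
        "p25": q1,
        "p75": q3,
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_error,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # zero for a normal distribution
    }


insights = advanced_insights([2, 4, 4, 4, 5, 5, 7, 9])
print(insights)
```

For this sample the skewness is positive (a longer right tail) and the excess kurtosis is slightly negative (thinner tails than a normal distribution).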
Data classes
The following information is shown for data class assignments:

The selected data class, which is the data class assigned to the column. It is the same as the detected data class unless you manually changed it.

The detected data class, which is the best matching data class for the column as detected by the analysis.

The confidence score of the assigned data class. The confidence of a data class is the percentage of non-null values that match the data class. Several data classes are more generic identifiers that are detected and assigned at a column level. These data classes are assigned when a more specific data class could not be identified at a value level. Generic identifiers will always have a confidence of 100% and include the following data classes: Code, Date, Identifier, Indicator, Quantity, and Text.

A list of all data classes that were detected during analysis in descending order, with the best match (the highest confidence) at the top. For each data class, the confidence score and the data class priority are shown.

For each detected data class, additional information might be shown depending on the scope of the data class.
For data classes where the matching is done based on column data, column values that matched the criteria for this specific data class are listed. The Count (%) column shows how many rows in the sample contain a specific value and the percentage of rows with that value. In addition, the format of each matching value is shown.
For data classes where the matching is done based on the column name, and for the generic data classes Code, Date, Identifier, Indicator, Quantity, and Text, no additional information is shown. The generic data classes are used when the data values don't allow for identifying a specific data class. They always have a confidence of 100%.
For more information, see Data classes.
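The confidence calculation for a value-based data class, as described above, can be sketched as follows. The matcher `looks_like_email`, its regular expression, and the sample column are hypothetical stand-ins for a real data class definition; only the ratio (matching non-null values divided by all non-null values) comes from the description.

```python
import re

# Hypothetical, deliberately loose pattern standing in for a data class matcher.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Return True if the whole value matches the hypothetical pattern."""
    return EMAIL_LIKE.fullmatch(value) is not None


def confidence(values, matches):
    """Percentage of non-null values that match the data class."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    matched = sum(1 for v in non_null if matches(v))
    return 100.0 * matched / len(non_null)


column = ["a@example.com", "b@example.org", None, "not an email", "c@example.net"]
score = confidence(column, looks_like_email)
print(score)  # 3 of the 4 non-null values match, so 75.0
```

The null value is ignored rather than counted as a mismatch, which is why the denominator is 4 and not 5.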
Formats
The format inferred for the column, the number of detected formats, and a list of all detected formats are shown.
A format represents the character pattern of a data value. Every alphabetic character is represented by an uppercase or lowercase letter A, depending on the capitalization of the character. Every numeric character is represented by the number 9. Spaces and special characters are shown as they appear.
The list of detected formats shows how many values with a specific format were found and the overall percentage of values with that format. Click an entry to see the values that match the pattern. Only 100 values are retrieved for display, so the value list might not contain all values or might even be empty.
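The character-pattern rule described above (A or a for letters, 9 for digits, everything else unchanged) can be sketched in a few lines. The function name `value_format` and the sample values are hypothetical, and the way the product treats characters without upper or lower case is not documented here, so this sketch maps them to a lowercase a as an assumption.

```python
from collections import Counter


def value_format(value):
    """Map a value to its character pattern: letters to A/a, digits to 9."""
    chars = []
    for ch in value:
        if ch.isalpha():
            # Assumption: letters without case are treated as lowercase.
            chars.append("A" if ch.isupper() else "a")
        elif ch.isdigit():
            chars.append("9")
        else:
            chars.append(ch)  # spaces and special characters stay as they are
    return "".join(chars)


values = ["AB-1234", "CD-5678", "ef 90", "GH-0001"]
format_counts = Counter(value_format(v) for v in values)

# List each detected format with its count and share of all values.
for fmt, count in format_counts.most_common():
    print(fmt, count, f"{100.0 * count / len(values):.0f}%")
```

The first entry of `format_counts.most_common()` corresponds to the most frequent format for the column.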
Types
The following information is shown:
 The data type of the column as defined in the data source
 The data type that was inferred through analysis
 The minimum length of a value in that column
 The maximum length of a value in that column
 The average length of column values
 A list of all data types in the column
The data type describes whether the column contains data that is of a certain type, such as integer, string, or date type.
Typically, a column's optimal data type is obvious because most or all of the column values are of the same data type. However, when the list contains multiple different data types, check the frequency count for the inferred data type. If that frequency count is low relative to the row count of the table, invalid data values might cause the wrong data type to be inferred.
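The check described above can be sketched as follows. The inference rules that the analysis applies are not described here, so this sketch assumes a simple approach: classify each value by trying integer, decimal, and ISO date parsing in that order, then take the most frequent type as the inferred type. The function names and the sample column are hypothetical.

```python
from collections import Counter
from datetime import date


def infer_value_type(text):
    """Classify a single string value (assumed order: integer, decimal, date)."""
    for type_name, parse in (
        ("integer", int),
        ("decimal", float),
        ("date", date.fromisoformat),
    ):
        try:
            parse(text)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_column_type(values):
    """Return the most frequent type, its share of all rows, and all counts."""
    counts = Counter(infer_value_type(v) for v in values)
    inferred, frequency = counts.most_common(1)[0]
    return inferred, frequency / len(values), counts


column = ["12", "7", "2024-01-31", "n/a", "15", "3"]
inferred, share, counts = infer_column_type(column)
print(inferred, f"{share:.0%}", dict(counts))
```

A low share for the inferred type, relative to the row count, is the signal to look for invalid values such as the "n/a" entry in this sample.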