Data quality score

A data quality score is displayed for the entire data asset and for each column in the analyzed data asset.

A data quality score is first computed for each individual column in the data asset, based on the quality dimensions checked for that column. A combined quality score for the entire data asset is then calculated as the average of the scores of all columns.
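As an illustration of the averaging step only, the following minimal Python sketch combines per-column scores into one asset-level score. It assumes that each column score is already available as a number between 0.0 and 1.0; the function name and the column names and scores in the usage lines are hypothetical and are not part of the product.

```python
from statistics import mean


def asset_quality_score(column_scores: dict[str, float]) -> float:
    """Average per-column quality scores into one score for the data asset.

    Assumption (hypothetical helper): every column score is already a
    number between 0.0 and 1.0.
    """
    if not column_scores:
        raise ValueError("the data asset has no analyzed columns")
    # The asset score is the plain (unweighted) average of the column scores.
    return mean(column_scores.values())


# Hypothetical column scores for a three-column data asset.
scores = {"NAME": 0.98, "EMAIL": 0.90, "ZIP": 0.95}
print(round(asset_quality_score(scores), 4))  # 0.9433
```

Because the average is unweighted in this sketch, every column contributes equally to the asset score regardless of how many problems it contains.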

To prevent records with multiple quality issues from unnecessarily weighing down the data quality score, values that are identified with more than one issue do not count more heavily against the quality score than values with only one issue.

Data quality confidence

Each potential quality problem that is identified for a quality dimension, at either the value level or the column level, is also associated with a confidence number. This number indicates how certain the system is that the problem was identified correctly. Confidence is a number between 0.0 and 1.0, where 0.0 means no confidence and 1.0 means absolute confidence that the problem is correct.

The quality score of a value is computed as the product of (1.0 - confidence) over all quality problems identified for that cell or column. For example, imagine a column that contains US names. One row contains a name from another country that is unusually long and contains a combination of letters that is not expected in a column of this data class. That value might be identified as a suspect value with a confidence of 70%. Another value in the same column is "###############1234###############". That format is clearly a domain violation and is identified with a confidence of 100%. The first value therefore has a score of 1.0 - 0.7 = 0.3 (a decrease of 0.7), and the second has a score of 1.0 - 1.0 = 0.0 (a decrease of 1.0), so the score decrease caused by the first value is 70% of the score decrease caused by the second value.
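The following minimal Python sketch reproduces the value-level formula and the worked example above. It assumes that the confidences of the problems identified for a cell are available as a list of numbers between 0.0 and 1.0; the function name is hypothetical and only illustrates the arithmetic, not the product's implementation.

```python
from math import prod


def value_quality_score(confidences: list[float]) -> float:
    """Quality score of one cell: the product of (1.0 - confidence) over
    every quality problem identified for it (hypothetical helper).

    A cell with no identified problems gets the perfect score 1.0.
    """
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence must be between 0.0 and 1.0, got {c}")
    return float(prod(1.0 - c for c in confidences))


long_name = value_quality_score([0.7])   # suspect value, 70% confidence
bad_format = value_quality_score([1.0])  # domain violation, 100% confidence
clean = value_quality_score([])          # no problems identified -> 1.0

# Compare how much each problem value lowers the score relative to a clean value.
decrease_long = clean - long_name
decrease_bad = clean - bad_format
print(round(decrease_long / decrease_bad, 2))  # 0.7
```

A cell with no identified problems keeps the full score of 1.0, which is why the decreases in the usage lines are measured against a clean value.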

Parent topic: Metadata enrichment results