Data quality assets

Design data quality assets for analyzing and monitoring the data quality in a project.

You can have the following data quality assets in your project:

  • Data quality definitions
  • Data quality rules

Before you start creating data quality definitions and rules, consider the following questions:

  • What do you want to analyze and monitor?
  • Which elements do you need to evaluate?
  • What is the goal of the analysis, for example, checking for completeness or validity?

Data quality definitions

A data quality definition represents a generic form of a data quality rule. It describes the rule evaluation or condition by using logical variables that are not tied to any actual data. Thus, it can be used in any number of data quality rules. If you change the data quality definition, you also change the validation logic for all rules derived from the definition.

You create and manage data quality definitions in projects. To make a data quality definition available for re-use in other projects, you can publish it to a catalog.

Data quality rules

A data quality rule links or binds logical variables to actual data for evaluation. A rule is run against physical data to assess the quality of your data by evaluating and validating specific conditions. Each rule run provides statistics and information about potential exceptions as defined for the rule's output table.
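
The relationship between a definition and the rules derived from it can be illustrated with a small Python sketch. This is not the product's API: the classes Definition and Rule, the variable name val, and the sample rows are all hypothetical and only model the idea that one generic condition is bound to different columns and that each run yields statistics and exceptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Definition:
    """Hypothetical model of a data quality definition: logic over a logical variable."""
    name: str
    variable: str                         # logical variable, not tied to any data
    condition: Callable[[object], bool]   # evaluation logic for that variable


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of a data quality rule: a definition bound to a column."""
    definition: Definition
    column: str                           # actual column bound to the logical variable

    def run(self, rows: List[Dict[str, object]]) -> Dict[str, object]:
        # Rows whose bound column fails the condition are the exceptions.
        exceptions = [
            row for row in rows
            if not self.definition.condition(row.get(self.column))
        ]
        return {
            "tested": len(rows),
            "passed": len(rows) - len(exceptions),
            "failed": len(exceptions),
            "exceptions": exceptions,
        }


# One generic definition, reused by two rules bound to different columns.
value_exists = Definition("Value exists", "val", lambda v: v is not None and v != "")

customers = [{"email": "a@example.com"}, {"email": None}]
orders = [{"order_ref": "X1"}, {"order_ref": "X2"}]

email_result = Rule(value_exists, "email").run(customers)
order_result = Rule(value_exists, "order_ref").run(orders)

print(email_result["failed"], order_result["failed"])
```

Because both rules share the same Definition object, changing its condition would change the outcome of both rules, which mirrors how editing a data quality definition affects all rules derived from it.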

You create, manage, and run data quality rules in projects.

You can create rules from one or more data quality definitions or you can create data quality rules with SQL statements. Rules built from data quality definitions capture which columns comply with the rule conditions and which don't. SQL-based rules are better suited to check for noncompliant records.

For example, suppose that you want to validate tax identifiers. You could split the check into two concepts: TaxID exists and Validate TaxID.

Now, you have these options:

  • Create rules from data quality definitions. For each concept, you can create a data quality definition with evaluation logic for the logical variable tax_id. The first condition is that the tax identifier (or TaxID) must exist, and the second condition is that the tax identifier must meet a defined format.

    Data quality definition TaxID exists: tax_id exists
    Data quality definition Validate TaxID: tax_id matches_format 'AA99-A999-9999'

    Then, select one of these options:

    • For each column that contains a tax identifier to be validated, define two data quality rules. The first rule binds the logical variable tax_id of the definition TaxID exists to the column. The second rule binds the logical variable tax_id of the definition Validate TaxID to the column.
    • For each column that contains a tax identifier to be validated, define one data quality rule and use both data quality definitions in that rule. Bind the logical variable tax_id in both definitions, TaxID exists and Validate TaxID, to the column.
    • Define one data quality rule and use both data quality definitions in that rule. Bind the logical variable tax_id in both definitions, TaxID exists and Validate TaxID, to a parameter set of the type Parameter from column. Add all columns that contain a tax identifier to be validated to that parameter set.
  • Create an SQL-based rule that returns the records in which the tax identifier is missing or does not match the format:

    select tax_id from taxschema.taxtable where tax_id is null or not regexp_like(tax_id, '^[a-zA-Z]{2}[0-9]{2}-[a-zA-Z][0-9]{3}-[0-9]{4}$')
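
To make the two tax identifier conditions concrete, here is a minimal Python sketch of the same logic. It is an illustration only, not how the product evaluates rules. The regular expression is taken from the SQL-based rule above; the function names, the column partner_tax_id, and the sample rows are assumptions. Looping over a list of columns loosely imitates binding tax_id to several columns through a parameter set.

```python
import re
from typing import Dict, List, Optional, Tuple

# Same pattern as in the SQL-based rule; fullmatch() replaces the ^ and $ anchors.
TAX_ID_PATTERN = re.compile(r"[a-zA-Z]{2}[0-9]{2}-[a-zA-Z][0-9]{3}-[0-9]{4}")


def tax_id_exists(tax_id: Optional[str]) -> bool:
    """Condition of the definition TaxID exists."""
    return tax_id is not None


def tax_id_has_valid_format(tax_id: Optional[str]) -> bool:
    """Condition of the definition Validate TaxID."""
    return tax_id is not None and TAX_ID_PATTERN.fullmatch(tax_id) is not None


def noncompliant(
    rows: List[Dict[str, Optional[str]]], columns: List[str]
) -> List[Tuple[int, str]]:
    """Return (row index, column) pairs that fail either condition.

    Like the SQL-based rule, this reports only the noncompliant values.
    """
    failures = []
    for index, row in enumerate(rows):
        for column in columns:
            value = row.get(column)
            if not (tax_id_exists(value) and tax_id_has_valid_format(value)):
                failures.append((index, column))
    return failures


# Hypothetical sample data with two columns that hold tax identifiers.
rows = [
    {"tax_id": "AB12-C345-6789", "partner_tax_id": "ZZ00-A000-0000"},
    {"tax_id": None, "partner_tax_id": "XY98-Q123-4567"},
    {"tax_id": "AB12C3456789", "partner_tax_id": None},
]

failures = noncompliant(rows, ["tax_id", "partner_tax_id"])
print(failures)
```

In this sample, the missing values fail the existence check and the identifier without hyphens fails the format check, so three of the six values are reported.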
