Managing data quality rules
Last updated: Dec 13, 2024

You can create and manage data quality rules for assessing the quality of the data in your project.

If you have the required permissions, you can manage data quality rules in these ways:

  • Create data quality rules
  • Edit data quality rules
  • Delete data quality rules

You can also complete these tasks with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.

Required services
IBM Knowledge Catalog
DataStage or DataStage as a Service Anywhere
With DataStage, you can run data quality rules in the supported regions. With DataStage as a Service Anywhere, you can run data quality rules outside of IBM Cloud by using remote engines. For more information about setting up remote engines, see the DataStage as a Service Anywhere documentation.

Required permissions

To view data quality rules, you must have at least the Viewer role in the project.

To create, edit, or delete data quality rules, you must have the Manage data quality assets user permission and the Admin or the Editor role in the project.
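
The permission requirements above can be expressed as a small check. This is a minimal sketch for illustration only, not part of any IBM SDK: the User class, its field names, and both functions are hypothetical, and it assumes that "at least the Viewer role" means the Viewer, Editor, and Admin project roles.

```python
from dataclasses import dataclass

# Project roles named in this topic. Treating Editor and Admin as
# "at least Viewer" is an assumption of this sketch.
VIEW_ROLES = {"Viewer", "Editor", "Admin"}
MODIFY_ROLES = {"Editor", "Admin"}


@dataclass(frozen=True)
class User:
    """Hypothetical model of a project collaborator."""
    project_role: str
    # Stands in for the "Manage data quality assets" user permission.
    can_manage_data_quality_assets: bool = False


def can_view_rules(user: User) -> bool:
    """Viewing requires only a sufficient project role."""
    return user.project_role in VIEW_ROLES


def can_modify_rules(user: User) -> bool:
    """Creating, editing, or deleting requires BOTH the user permission
    and the Admin or Editor project role."""
    return user.can_manage_data_quality_assets and user.project_role in MODIFY_ROLES
```

Note that the two conditions for modifying rules are combined with a logical AND: an Admin without the Manage data quality assets permission can view rules but cannot change them.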

Project settings for rule execution

Project-level settings determine certain aspects of data quality rule execution, for example, whether trailing spaces in string values are ignored in equality checks. These settings apply to all data quality rules in a given project. You can check or update these settings for each project by using the IBM Knowledge Catalog API operations Get project settings for data quality rules and Replace project settings for data quality rules.
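
Because the update operation replaces the settings rather than patching them, a cautious client reads the current settings first, changes one value, and sends the complete set back. The following is a minimal sketch of that read-modify-write pattern under stated assumptions: the two API operations are passed in as callables so that no endpoint path or payload format is assumed, and the setting key ignore_trailing_spaces and the InMemorySettingsStore class are hypothetical stand-ins used only to exercise the logic.

```python
from typing import Any, Callable, Dict

Settings = Dict[str, Any]


def update_project_setting(
    get_settings: Callable[[str], Settings],
    replace_settings: Callable[[str, Settings], None],
    project_id: str,
    key: str,
    value: Any,
) -> Settings:
    """Change one setting while preserving all others.

    get_settings and replace_settings wrap the "Get project settings" and
    "Replace project settings" operations; how they call the API is left
    to the caller.
    """
    # Copy so the object returned by get_settings is not mutated.
    updated = dict(get_settings(project_id))
    if key in updated and updated[key] == value:
        return updated  # Nothing to change, so skip the replace call.
    updated[key] = value
    replace_settings(project_id, updated)
    return updated


class InMemorySettingsStore:
    """Hypothetical stand-in for the service, for local experimentation."""

    def __init__(self) -> None:
        self._settings: Dict[str, Settings] = {}
        self.replace_calls = 0

    def get(self, project_id: str) -> Settings:
        return dict(self._settings.get(project_id, {}))

    def replace(self, project_id: str, settings: Settings) -> None:
        self._settings[project_id] = dict(settings)
        self.replace_calls += 1


store = InMemorySettingsStore()
store.replace("my-project", {"ignore_trailing_spaces": False})
update_project_setting(store.get, store.replace, "my-project",
                       "ignore_trailing_spaces", True)
```

In a real client, the two callables would issue the authenticated HTTP requests that are described in the API reference.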

Creating data quality rules

You can create different types of data quality rules:

Editing data quality rules

You can edit a data quality rule to update its description, the selected data quality dimension, any business term assignments, or the rule configuration. You can also manage the list of related items.

To edit a data quality rule, open the asset and perform the appropriate actions:

  • To update the description or the data quality dimension, click the Edit icon next to the property.

  • To manage business terms, go to the Governance artifacts section of the asset and add or remove terms as needed.

  • To assign or delete governance rules, go to the Governance artifacts section of the asset, and add or remove governance rules as needed.

  • To update the rule configuration, click Edit rule. When you edit the rule configuration, you can also switch how the rule is built, from data quality definitions to SQL statements or vice versa. Switching discards the entire existing rule configuration, so you must configure the rule again from the beginning.

    You can also change the output type. Depending on your new selection, any configured output settings are reset or overwritten. Rule output that was written before the change remains untouched.

For data quality rules that bind data directly, a relationship of type Validates data quality of is added to the Related items section for each bound column and for the asset that contains the column. You can manually add assets and columns with this relationship type to any type of data quality rule. When you add them to data quality rules with externally managed bindings or to SQL-based data quality rules, those rules contribute to the data quality scores of the corresponding asset or column. The score and issues that a rule produces are reported for all assets and columns that are linked to it with the Validates data quality of relationship type.

When you view a data quality rule, you can click the Info icon to view more details such as output settings or related assets.

Deleting data quality rules

You can delete a data quality rule in one of these ways:

  • In the project, select the data quality rule and click Delete.
  • Open the data quality rule and select Delete from the overflow menu next to the name of the data quality rule.

When you delete a data quality rule, its run history and any associated DataStage flow and jobs are also deleted from the project. Output tables in the project and in the database are kept. The issues that were returned by this data quality rule are removed, and the data quality and dimension scores are recalculated.

Learn more

Next steps
