Managing data quality definitions
Last updated: Dec 13, 2024

You can create and manage data quality definitions to define logic that is useful for analyzing the data quality in your data assets.

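If you have the required permissions, you can manage data quality definitions in these ways:

  • Create a data quality definition
  • Publish a data quality definition to a catalog
  • Edit a data quality definition
  • Delete a data quality definition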

You can also complete these tasks with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.

Required permissions

To view data quality definitions, you must have at least the Viewer role in the project.

To create, edit, or delete data quality definitions, you must have the Manage data quality assets user permission and the Admin or the Editor role in the project.
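
The permission requirements above can be read as two simple checks. The following Python sketch is only an illustration of that logic: the role and permission names are taken from the text, while the function names are hypothetical and do not correspond to any product API.

```python
# Illustrative model only. Role and permission names mirror the text above;
# the function names are hypothetical and not part of any product API.
PROJECT_ROLES = ("Viewer", "Editor", "Admin")  # ordered from least to most privileged


def can_view_definitions(role: str) -> bool:
    """Viewing requires at least the Viewer role in the project."""
    return role in PROJECT_ROLES


def can_modify_definitions(role: str, has_manage_dq_assets: bool) -> bool:
    """Creating, editing, or deleting requires the Manage data quality assets
    user permission and the Admin or the Editor role in the project."""
    return has_manage_dq_assets and role in ("Admin", "Editor")
```

For example, an Editor without the Manage data quality assets permission can view definitions but cannot create, edit, or delete them.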

Creating a data quality definition

To create a data quality definition:

  1. Open a project and click New asset > Define how to measure data quality.

  2. Define details:

    • Specify a name for the data quality definition.
    • Optional: Provide a description.
    • Optional: Select a data quality dimension to describe the primary data quality metric for the rule logic in this asset. The selected dimension can be used as a report category, for filtering, or for visualizing selected data.
  3. Define the rule logic. You can use the free form editor to construct your rule logic. Enter an expression in the Rule expression field. For a list of expressions that you can use, see Building blocks for rule logic. Also check the set of sample rule expressions. These samples demonstrate how you can combine the building blocks for rule logic. You can copy the provided expressions into your own data quality definitions and use them as provided or adjust them as needed.

    Special considerations apply when your expression contains strings that are enclosed in double quotation marks, for example: ucase(trim(var_first_name)) NOT contains "YOU'RE"

    By default, such values are treated as string literals. To have them treated as variables instead, set the project setting allow_quoted_variables to true by using the IBM Knowledge Catalog API Replace project settings for data quality rules.

    As an alternative to writing your expressions in the free form editor, you can use block elements to construct your rule logic:

    1. Select an element from the Logic group, for example IF THEN. You can expand the rule logic with AND, OR, and NOT operators.

    2. Select Checks, choose the type of check you want to use, and connect it to the IF block.

    3. Select as many conditions as you need for the check in the IF block from the Variables and Literals, Operations, Date and Time, General, Mathematical, or String groups, and drag them into the Checks logic.

    4. Select one or more types of checks from the Checks group, and connect them to the THEN block.

    5. Select as many conditions as you need for each check in the THEN block from the Variables and Literals, Operations, Date and Time, General, Mathematical, or String groups, and drag them into the Checks logic.

    6. Additional actions become available when you right-click the canvas or an individual block. For example, you can duplicate the block or add a comment.

    Tip:

    Always add comments in the block section. Entering or updating comments in the Rule expression text area might not always work as expected.

    You can delete a block element or the entire construct by dragging it to the trash can.

    Review the rule logic in the Rule expression field.

    When you click Create, the syntax of the expression is checked. If it is valid, the data quality definition is created. You can now create data quality rules from this definition.
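
To make the rule logic in this procedure more concrete, the following Python sketch mimics what two expressions would check for a single record. It is not the rule expression language itself and not how the product evaluates rules. The first function is a rough analogue of the sample expression ucase(trim(var_first_name)) NOT contains "YOU'RE". The second models a hypothetical IF THEN construct (IF age < 18 THEN guardian exists); the variable names age and guardian are invented for illustration, and the handling of records that do not meet the IF condition is an assumption.

```python
from typing import Optional


def passes_name_check(var_first_name: str) -> bool:
    """Rough analogue of: ucase(trim(var_first_name)) NOT contains "YOU'RE"."""
    # trim(...) then ucase(...). Note that str.strip() removes all surrounding
    # whitespace, which may be broader than what trim does in the rule language.
    normalized = var_first_name.strip().upper()
    # "YOU'RE" is a plain string literal here, matching the behavior in which
    # double-quoted values are not resolved as variables.
    return "YOU'RE" not in normalized


def passes_if_then(age: int, guardian: Optional[str]) -> bool:
    """Analogue of a hypothetical construct: IF age < 18 THEN guardian exists."""
    if age < 18:                     # check connected to the IF block
        return guardian is not None  # check connected to the THEN block
    # Assumption: a record that does not meet the IF condition is not flagged.
    return True


if __name__ == "__main__":
    for name in ["  Alice ", "you're late", "Dave"]:
        print(repr(name), passes_name_check(name))
    for age, guardian in [(12, "Pat"), (12, None), (40, None)]:
        print(age, guardian, passes_if_then(age, guardian))
```

If allow_quoted_variables were set to true, the quoted value in the first expression would be treated as a variable rather than compared as text, which this sketch does not model.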

Publishing a data quality definition

You can make any data quality definition available for reuse in other projects by publishing it to a catalog, from which it can be added to any number of projects. Before you do so, make sure that the description of the data quality definition provides meaningful information. Such information helps other users pick the right data quality definition for their project.

To publish a data quality definition:

  1. Select the data quality definition from the list of assets and click Publish to catalog. Alternatively, you can select Publish to catalog from the asset's overflow menu.

  2. Select the catalog and fill in the asset properties.

  3. If an asset duplicate already exists in the catalog, you can specify what action should be taken in such a case. The choices you have are determined by the catalog default setting. For more information about duplicate asset handling, see Handling duplicate assets in catalogs.

  4. Click Publish. The assets are added to the catalog, and you become their owner. Assigned business terms and tags are published with the asset. Assigned governance rules are not published; you must re-create those relationships manually after you publish the definition.

    The rule expression and the selected data quality dimension are also published and available in the asset preview in the catalog.

If a data quality definition has a term assigned, the Data quality definitions section on the term's Related content page has one entry for each container in which the definition with that relationship lives. The same applies to relationships with governance rules.

When you add a data quality definition from a catalog to a project, assigned classifications and any relationships that might be defined are not copied to the project.

Editing a data quality definition

You can edit a data quality definition to update its name, its description, the selected data quality dimension, the rule expression, or any business term or governance rule assignments.

To edit a data quality definition, open the asset and then perform the appropriate action:

  • Click the Edit icon next to the property that you want to change.
  • Select an option from the overflow menu next to the asset name. For example, you can select Rename to change the asset name.

Remember that any changes to the rule expression affect all rules derived from this data quality definition. To see which rules, if any, are related to this data quality definition, click the Info icon.

Deleting a data quality definition

You can delete a data quality definition in one of these ways:

  • In the project, select the data quality definition and click Delete.
  • Open the data quality definition and select Delete from the overflow menu next to the name of the data quality definition.

If any data quality rules are based on this data quality definition, you must delete those rules before you can delete the definition.
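
The required order of deletion can be sketched as a dependency check. The following Python model is purely illustrative: the Project class, its fields, and its methods are hypothetical and do not represent the product's data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Project:
    """Hypothetical in-memory model of a project's definitions and rules."""
    definitions: Set[str] = field(default_factory=set)
    rules: Dict[str, str] = field(default_factory=dict)  # rule name -> definition name

    def rules_based_on(self, definition: str) -> List[str]:
        """Return the names of all rules derived from the given definition."""
        return sorted(r for r, d in self.rules.items() if d == definition)

    def delete_definition(self, definition: str) -> None:
        """Delete a definition only if no rule is still based on it."""
        dependents = self.rules_based_on(definition)
        if dependents:
            raise ValueError(
                f"Delete these rules first: {', '.join(dependents)}"
            )
        self.definitions.remove(definition)
```

In this model, calling delete_definition for a definition that still has dependent rules raises an error; after those rules are removed from rules, the same call succeeds.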

Learn more

