Managing data quality definitions

You can create and manage data quality definitions to define logic that is useful for analyzing the data quality in your data assets.

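If you have the required permissions, you can manage data quality definitions in these ways:

  • Creating a data quality definition
  • Publishing a data quality definition
  • Editing a data quality definition
  • Deleting a data quality definition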

You can also complete these tasks with APIs instead of the user interface. The links to these APIs are listed in the Learn more section.

Required permissions

To view data quality definitions, you must have at least the Viewer role in the project.
To create, edit, or delete data quality definitions, you must have the Admin or the Editor role in the project.
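
The role requirements above can be read as a lookup from project role to permitted actions. The following Python sketch is for illustration only: the names `ROLE_ACTIONS` and `can_perform` are hypothetical and are not part of any product API. It assumes that "at least the Viewer role" means the Editor and Admin roles can also view definitions.

```python
# Illustrative model of the role requirements described above.
# ROLE_ACTIONS and can_perform are hypothetical names, not a product API.
ROLE_ACTIONS = {
    "Viewer": {"view"},
    "Editor": {"view", "create", "edit", "delete"},
    "Admin": {"view", "create", "edit", "delete"},
}


def can_perform(role: str, action: str) -> bool:
    """Return True if the given project role permits the action."""
    # Unknown roles get an empty permission set.
    return action in ROLE_ACTIONS.get(role, set())


if __name__ == "__main__":
    print(can_perform("Viewer", "view"))    # prints True
    print(can_perform("Viewer", "edit"))    # prints False
    print(can_perform("Editor", "delete"))  # prints True
```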

Creating a data quality definition

To create a data quality definition:

  1. Open a project, click New asset, and select Data quality definition.

  2. Define details:

    • Specify a name for the data quality definition.
    • Optional: Provide a description.
    • Optional: Select a data quality dimension to describe the primary data quality metric for the rule logic in this asset. The selected dimension can be used as report category, for filtering, or for visualizing selected data.

  3. Define the rule logic. You can use the free-form editor to construct your rule logic: enter an expression in the Rule expression field. For a list of expressions that you can use, see Building blocks for rule logic.

    Also review the sample rule expressions, which demonstrate how you can combine the building blocks for rule logic. You can copy the sample expressions into your own data quality definitions and use them as provided or adjust them as needed.

    Alternatively, you can use block elements to construct your rule logic:

    1. Select an element from the Logic group, for example IF THEN. You can expand the rule logic with AND, OR, and NOT operators.

    2. Select Checks, choose the type of check you want to use, and connect it to the IF block.

    3. Select as many conditions as you need for the check from the Variables and Literals, Operations, Date and Time, General, Mathematical, or String groups, and drag them into the Checks logic.

    4. Select one or more types of checks from the Checks group, and connect them to the THEN block.

    5. Select as many conditions as you need for the check from the Variables and Literals, Operations, Date and Time, General, Mathematical, or String groups, and drag them into the Checks logic.

    6. Additional actions become available when you right-click the canvas or an individual block. For example, you can duplicate the block or add a comment.

      Tip: Always add comments in the block section. Entering or updating comments in the Rule expression text area might not always work as expected.

    You can delete a block element or the entire construct by dragging it to the trash can.

    Review the rule logic in the Rule expression field.

    When you click Create, the syntax of the expression is checked. If it is valid, the data quality definition is created. You can now create data quality rules from this definition.
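
To make the IF THEN construct more concrete, the following Python sketch models how such a rule could be evaluated against individual records. It is not the product's rule engine. The expression `IF age < 18 THEN profession = 'student'`, the column names `age` and `profession`, and the helper names `if_then` and `evaluate` are all hypothetical. The sketch also assumes that a record that does not satisfy the IF condition counts as passing the rule.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Check = Callable[[Record], bool]


def if_then(condition: Check, check: Check) -> Check:
    """Combine a condition and a check into an IF ... THEN ... rule."""
    def rule(record: Record) -> bool:
        # Assumption: records that do not meet the IF condition pass the rule.
        return (not condition(record)) or check(record)
    return rule


# Hypothetical expression: IF age < 18 THEN profession = 'student'
minor_is_student = if_then(
    lambda r: r["age"] < 18,
    lambda r: r["profession"] == "student",
)


def evaluate(rule: Check, records: List[Record]) -> List[bool]:
    """Return one pass/fail result per record."""
    return [rule(r) for r in records]


if __name__ == "__main__":
    sample = [
        {"age": 12, "profession": "student"},
        {"age": 15, "profession": "engineer"},
        {"age": 40, "profession": "engineer"},
    ]
    print(evaluate(minor_is_student, sample))  # prints [True, False, True]
```

Only the second record fails: it meets the condition but not the check. The third record passes because the condition does not apply to it.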

Publishing a data quality definition

You can make any data quality definition available for reuse in other projects by publishing it to a catalog. From the catalog, it can be added to any number of projects. Before you publish, make sure that the description of the data quality definition provides meaningful information. Such information helps other users pick the right data quality definition for their project.

To publish a data quality definition:

  1. Select the data quality definition from the list of assets and click Publish to catalog. Alternatively, you can select Publish to catalog from the asset's overflow menu.
  2. Select the catalog and fill in the asset properties.
  3. If a duplicate of the asset already exists in the catalog, specify how the duplicate should be handled. The available choices are determined by the catalog default setting. For more information about duplicate asset handling, see Handling duplicate assets in catalogs.
  4. Click Publish. The asset is added to the catalog and you are its owner.

Editing a data quality definition

You can edit a data quality definition to update its name, its description, the selected data quality dimension, the rule expression, or any business term assignments.

To edit a data quality definition, open the asset and then perform the appropriate action:

  • Click the edit icon next to the property that you want to change.
  • Select an option from the overflow menu next to the asset name. For example, you can select Rename to change the asset name.

Remember that any changes to the rule expression affect all rules derived from this data quality definition. To see which rules, if any, are related to this data quality definition, click the info icon.

Deleting a data quality definition

You can delete a data quality definition in one of these ways:

  • In the project, select the data quality definition and click Delete.
  • Open the data quality definition and select Delete from the overflow menu next to the name of the data quality definition.

If any data quality rules are based on this data quality definition, you must delete those rules before you can delete the definition.
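
The deletion constraint can be pictured as a dependency check: a definition can be removed only after no rules depend on it. The following Python sketch uses a hypothetical in-memory model. The `Definition` and `Project` classes and their methods are invented for illustration and do not correspond to a product API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Definition:
    """Hypothetical stand-in for a data quality definition."""
    name: str
    expression: str
    derived_rules: List[str] = field(default_factory=list)


class Project:
    """Hypothetical in-memory project that holds definitions."""

    def __init__(self) -> None:
        self.definitions: Dict[str, Definition] = {}

    def add(self, definition: Definition) -> None:
        self.definitions[definition.name] = definition

    def delete_rule(self, definition_name: str, rule_name: str) -> None:
        self.definitions[definition_name].derived_rules.remove(rule_name)

    def delete_definition(self, name: str) -> None:
        definition = self.definitions[name]
        # Refuse to delete while any rule is still based on the definition.
        if definition.derived_rules:
            raise ValueError(
                f"Delete these rules first: {definition.derived_rules}"
            )
        del self.definitions[name]


if __name__ == "__main__":
    project = Project()
    project.add(Definition("not_null", "col exists", ["rule_a"]))
    try:
        project.delete_definition("not_null")
    except ValueError as err:
        print(err)
    project.delete_rule("not_null", "rule_a")
    project.delete_definition("not_null")
    print(list(project.definitions))
```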

Learn more
