Configuring output settings for data quality rules

Last updated: Mar 21, 2025
Configuring output settings for data quality rules

To capture more rule output than statistical information, configure an external output location and the content that you want to be written to that location.

You can choose to write the rule output to a database table. If the bindings in a definition-based rule are managed externally, you also have the option to create up to 4 DataStage output links.

To generate a database table or output links:

  1. Enable the External output option and expand the section.

    If you set up an output table configuration in the project settings, you can select to inherit that configuration and directly go to configuring the output content. An inherited configuration is shown as Current.

  2. If you do not want to use the table that is configured in the project settings, select the type of output you want to generate:

    • Write the output to a new or an existing database table.

      Select a connection. Depending on the selected connection, select a schema, or select a catalog and a schema. For a new table, enter the name of the output table to be created. Otherwise, select an existing table. In that case, the Output content section is populated with the columns of this table and you can map content to these columns.

      You can choose whether the output table is to be added to your project as a data asset when you run the rule.

      For supported database types, see Supported connectors for curation and data quality.

      When you define a new table, the table name can be a user-defined name, a parameter for dynamically creating a name, a combination of user-defined name and parameter, or a combination of parameters.

      User-defined table names must follow this convention:

      • The first character for the name must be an alphabetic character.
      • The rest of the name can consist of alphabetic characters, numeric characters, or underscores.
      • The name must not contain spaces.

      For dynamic name creation, you can use these parameters:

      • #rule_id#
      • #rule_name#
      • #project_id#

      Additionally, you can select these options:

      • Create table only when issues are found This option avoids that empty tables are created in cases where a rule doesn't produce output records. However, if a table with that name already exists because it was generated for an earlier rule run, the table remains unchanged.
      • Import generated output table as project asset To enable easy access to the rule output, add new rule output tables as data assets to the project. Instead of running a database query, you can view the data by opening the data asset from the Assets page in your project or from the rule's run history. This option is enabled by default.

      Also, configure the following settings:

      • Output records: Select whether you want to include all records in your output, only records that do not meet rule conditions (default setting), or only those records that meet the rule conditions.
      • Maximum number of exception output records: You can include all records or set a maximum number.
      • Update method: New output records can be appended to the existing content of the output table. If you want to keep only the output results from latest run, select to overwrite existing records. For the update method Append, the table schema can't be changed, that is, you can't rename, add, or delete columns. If you want to change the output content for a data quality rule and write to an existing output table, make sure to use the update method Overwrite to replace the columns in output table with the newly defined output columns.

    • To create output links if the bindings are managed in a DataStage flow, select DataStage output links. Configure up to 4 output links. Select which output should be routed to a specific link: all records, only records that do not meet rule conditions, only records that meet the rule conditions, or all violated rule conditions. Also, define the maximum number of output records to be written per link. The content of the output records is determined by what you configure in the next step. For violated rule conditions, 0 or more output records can be returned depending on the number of data quality definitions in the rule. Each output record has the following information:

      • The record ID. This metric is automatically set as an output column.
      • The definition ID of one of the definitions that the input record didn't pass
      • The number that uniquely identifies the failed definition in case of duplicated definitions

      To map the definition ID to a data quality definition in your project, use the IBM Knowledge Catalog API:

      The target nodes of these output links must be configured in the DataStage flow.

    You can change the output type at any time. Depending on your new selection, any configured settings are reset or overwritten. When you're done, collapse the section and proceed to configure the output content.

  3. Configure the content of your output table.

    1. If bindings are managed externally, you can include any additional columns that are provided through the DataStage input link in the output table. Such columns are not listed in the output table configuration. You can't include any variables that are used in rule bindings.
    2. Click Add output content and select one of these options:
      • Columns

        Select the columns that you want to see in your output table. For SQL-based rules, you can select from all columns that the SQL query returns. The Columns option is not available if you create a definition-based rule with externally managed bindings.

      • Statistics and attributes

        Select any additional attributes or statistics that you want to include in your output table. Which metrics are available depends on the type of data quality rule.

        • Bound to asset ID

          Lists the ID of the data asset to which the rule is bound. If this metric is selected, one output record is written for each data asset in a data quality rule. Thus, more than one output record might be written for an individual input record. The content of the output records for different assets of the same data quality rule varies only for these metrics for each individual input record: Data quality definition, Data quality definition ID, Bound to asset ID, and maybe Passing rules, Failing rules, Percent passing rules, and Percent failing rules

          You can use this metric only in combination with the Data quality definition or Data quality definition ID metric. If the rule is associated with more than one data quality definition, metric Disambiguator is automatically included in the output.

          This metric is not available for definition-based rules with externally managed bindings or SQL-based rules.

        • Bound to column

          Lists the name of each bound column. If this metric is selected, one output record is written for each column in a data quality definition. Thus, more than one output record might be written for an individual input record. The content of the output records for different columns of the same data quality definition varies only for these metrics for each individual input record: Data quality definition, Data quality definition ID, Bound to column, and maybe Passing rules, Failing rules, Percent passing rules, and Percent failing rules

          You can use this metric only in combination with the Data quality definition or Data quality definition ID metric. If the rule is associated with more than one data quality definition, metric Disambiguator is automatically included in the output.

          This metric is not available for definition-based rules with externally managed bindings or SQL-based rules.

        • Data quality definition

          Lists the name of the applied data quality definition. If this metric is selected, multiple output records might be written depending on the number of data quality definitions that the rule contains.

          This metric is not available for SQL-based rules.

        • Data quality definition ID

          Contains a unique key that identifies the applied data quality definition. If this metric is selected, multiple output records might be written depending on the number of data quality definitions that the rule contains.

          This metric is not available for SQL-based rules.

        • Disambiguator

          Contains a number to disambiguate the data quality definitions that are used in the rule primarily if a data quality definition is used multiple times. Numbering starts at 0.

          This metric is not available for SQL-based rules.

        • Failing rules

          Shows the number of rule conditions that the record didn't meet.

        • Job ID

          Contains a unique key that identifies the job that is associated with the DataStage flow for the rule.

        • Job run ID

          Contains a unique key that identifies an individual run of the job that is associated with the DataStage flow for the rule.

        • Passing rules

          Shows the number of rule conditions that the record met.

        • Percent passing rules

          Shows the percentage of rule conditions that were met.

        • Percent failing rules

          Shows the percentage of rule conditions that weren't met.

        • Project ID

          Contains a unique key that identifies the project in which the rule lives.

        • Record ID

          Contains a unique key that identifies a record in the output. This metric is automatically included for output links for violated rule conditions.

        • Rule ID

          Contains a unique key that identifies the data quality rule.

        • Rule name

          Contains the name of the data quality rule.

        • System date

          Shows the system date when the rule was run. System date is the date in the time zone set on the server.

        • System time

          Shows the system date and time when the rule was run. System date and time is the date and time in the time zone set on the server.

      • Variables

        Select the variables from the rule logic that you want to include in your output table.

      • Expressions

        Add an expression that defines the content of an output column. You can give this column a descriptive name in the output content overview. You can use block elements to construct your expression. Select and combine elements as required. For more information about using block elements, see Managing data quality definitions. Alternatively, you can use the free form editor to construct your the expression. See Building blocks for rule logic or rule output.

Learn more

Parent topic: Managing data quality rules