Governing virtual data with data protection rules in Data Virtualization

Last updated: Mar 17, 2025

You can govern your virtual data by defining data protection rules.

Before you begin

These instructions assume that you completed the following prerequisites:

About this task

Data protection rules specify what data to control by allowing or denying access, masking data, or filtering rows in virtualized data.

When you publish virtualized data assets to a catalog, they become subject to the defined data protection rules.

When IBM Knowledge Catalog is installed on the same Cloud Pak for Data instance as Data Virtualization, the enforcement of IBM Knowledge Catalog data protection rules is always enabled.

You can use the following types of data protection rules in Data Virtualization:

Note: Db2 evaluates and enforces authorizations (Allow or Deny) separately from row and column access control (RCAC), which covers column masks and row filters. IBM Knowledge Catalog instead evaluates all applicable data protection rules (both authorizations and RCAC) to yield a single decision. For example, if an IBM Knowledge Catalog column masking or row filtering rule applies in addition to a Deny authorization rule, IBM Knowledge Catalog denies authorization under the "Most secure action wins" rule action precedence. As a result, the Data Virtualization authorization request is denied, and RCAC does not apply any column masks or row filters.
Data masking
Data masking is used to hide sensitive data but still allow users to use the asset.
For more information, see Masking virtual data in Data Virtualization.
Row-level filtering

You can create data protection rules to include or exclude rows in your virtualized data to limit the rows that users can see. For example, you can define a rule so that employees can see customer data that is associated only with their department.

For more information, see Row-level filtering in Data Virtualization.
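
As an illustration only, the following Python sketch models what a row filter and a column mask do to a result set, using the department example above. It is not how Data Virtualization or IBM Knowledge Catalog implements these rules. The names `User`, `filter_rows`, and `mask_columns`, the sample data, and the redaction placeholder are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record; only the department matters here."""
    name: str
    department: str


def filter_rows(rows, user):
    """Row filter: keep only rows associated with the user's department."""
    return [row for row in rows if row["department"] == user.department]


def mask_columns(rows, masked_columns, placeholder="XXXXXXXX"):
    """Column mask: redact the listed columns, pass other columns through."""
    return [
        {col: (placeholder if col in masked_columns else val)
         for col, val in row.items()}
        for row in rows
    ]


# Hypothetical virtualized customer table.
customers = [
    {"customer": "Acme", "department": "sales", "email": "buyer@acme.example"},
    {"customer": "Globex", "department": "support", "email": "it@globex.example"},
]

# A sales employee sees only sales rows, with the email column redacted.
visible = mask_columns(filter_rows(customers, User("ana", "sales")), {"email"})
print(visible)
```

The sketch uses redaction only; substitution and obfuscation are other masking methods and would replace the `placeholder` logic.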


IBM Knowledge Catalog DPRs and Data Virtualization GRANTs

Data Virtualization determines whether you have access to an object through Db2 authorization checks (or GRANTs) and IBM Knowledge Catalog data protection rules (DPRs). IBM Knowledge Catalog DPRs restrict access to governed objects that are published to a governed catalog.

The following diagram illustrates this process. If the enforcement of IBM Knowledge Catalog DPRs is enabled in Data Virtualization, the DPRs are evaluated against the assets in the IBM Knowledge Catalog catalog to determine whether you are authorized to access the objects. If you are authorized, Data Virtualization then conducts Db2 authorization checks to confirm your access. You can access the objects only if you are authorized in both cases.

Diagram illustrating how IBM Knowledge Catalog DPRs further restrict access to governed objects: the object is first evaluated against IBM Knowledge Catalog, and then Db2 authorization is checked. If IBM Knowledge Catalog DPRs are not enabled, Db2 authorization is checked directly, without evaluating the object against IBM Knowledge Catalog.
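
The two-stage check described in this section can be sketched in Python as follows. This is a simplified model, not the product's implementation. The `Action` enum, the function names, and the grouping of masks and row filters into one `TRANSFORM` level are hypothetical. The sketch also assumes that access is allowed when no rule matches, which in practice depends on how your rules are configured.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical rule outcomes, ordered from least to most secure."""
    ALLOW = 0
    TRANSFORM = 1  # a column mask or row filter applies
    DENY = 2


def dpr_decision(matching_actions):
    """Most secure action wins: pick the strictest matching action.

    Assumption: ALLOW when no rule matches.
    """
    return max(matching_actions, default=Action.ALLOW)


def can_access(dpr_enforced, matching_actions, has_db2_grant):
    """Return True only if both the DPR check and the Db2 check pass."""
    if dpr_enforced and dpr_decision(matching_actions) is Action.DENY:
        return False  # denied by DPRs; masks and filters are never applied
    return has_db2_grant  # Db2 authorization (GRANT) check


# A Deny rule outranks a masking rule that matches the same request.
print(dpr_decision([Action.TRANSFORM, Action.DENY]).name)
# A matching mask does not block access, but a Db2 GRANT is still required.
print(can_access(True, [Action.TRANSFORM], has_db2_grant=False))
```

Ordering the outcomes as integers keeps the precedence rule to a single `max` call, which mirrors the idea that IBM Knowledge Catalog yields one decision from all applicable rules.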

For more information on managing user access, see Managing access to virtual objects in Data Virtualization.

For more information on Db2 authorization checks, see Authorization model for views.

Data Virtualization data source definitions (DSD)
A data source definition (DSD) is a unique stable identifier for the connections across all the catalogs and projects that connect to your particular Data Virtualization instance.
Data Virtualization enforces IBM Knowledge Catalog DPRs on every object (at the Data Virtualization level) that is associated with the DSD for its Data Virtualization instance, ensuring that DPRs are enforced consistently regardless of how your objects are accessed across Cloud Pak for Data.
A DSD is automatically created for your Data Virtualization instance in Catalogs > Platform assets catalog when you upgrade or provision a Data Virtualization instance.
For more information, see Data protection with data source definitions.
Note: If you create a connection to your Data Virtualization instance on Cloud Pak for Data with a hostname and port that are not part of your DSD, add that hostname and port as an endpoint to your DSD.

For more information, see Adding endpoints to a new or existing data source definition.
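
To make the endpoint requirement concrete, the following Python sketch models a DSD as a set of hostname and port pairs and checks whether a connection is covered. It is a conceptual model only. The class `DataSourceDefinition`, its methods, the instance ID, and the hostnames and ports are hypothetical and do not correspond to a product API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceDefinition:
    """Hypothetical model of a DSD: an instance ID plus known endpoints."""
    instance_id: str
    endpoints: set = field(default_factory=set)  # (hostname, port) pairs

    def covers(self, hostname, port):
        """True if the hostname and port are already DSD endpoints."""
        return (hostname.lower(), port) in self.endpoints

    def add_endpoint(self, hostname, port):
        """Register an additional hostname and port for this instance."""
        self.endpoints.add((hostname.lower(), port))


dsd = DataSourceDefinition("dv-instance-1")
dsd.add_endpoint("dv.internal.example", 32051)

# A connection that reaches the same instance through a different route.
connection = ("dv-route.apps.example", 443)
if not dsd.covers(*connection):
    # Without this step, the connection would not be matched to the DSD.
    dsd.add_endpoint(*connection)

print(sorted(dsd.endpoints))
```

Hostnames are lowercased before comparison because DNS names are case-insensitive; whether the product normalizes them the same way is not stated here.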

The following diagram illustrates the relationships between the key entities that are involved in governing data in Data Virtualization:
  • Data Virtualization instance: Identified by its instance ID.
  • Data source definition (DSD): DSDs are generated automatically when you create your Data Virtualization instance. DSDs store the endpoints (hosts, ports, and instance ID) of your Data Virtualization instance, and the protection method that should apply to the data assets from this Data Virtualization instance (such as row-level filtering and data masking).
  • Connection to Data Virtualization: Connections from catalogs and projects to your Data Virtualization instance.
  • Catalog/Project data asset: Metadata representing objects (such as tables and views) in your Data Virtualization instance.

Diagram illustrating the relationships between the key entities involved in governing data

Procedure

To govern your virtual data with data protection rules:

  1. Virtualize your data and publish it to a governed catalog.
    If you have the Data Virtualization Manager or Engineer role and publishing to a catalog is enforced, your virtual data is published to a governed catalog automatically when you virtualize data by using the Data Virtualization console. Otherwise, you can choose a catalog to publish your virtual data to. See Publishing virtual data to the catalog in Data Virtualization for details.

    This data is automatically profiled if automatic profiling is configured in the catalog settings. The profile of a catalog data asset includes data classes and statistics based on a sample of the data. Profiling automatically assigns data classes to table columns. See Profiles.

  2. Govern your virtual data in the catalog:
    • Assign data classes, business terms, and tags that are authored in IBM Knowledge Catalog to your virtual tables and columns. For more information about how to assign and manage business terms in IBM Knowledge Catalog, see Editing asset properties in a catalog (IBM Knowledge Catalog) and Managing metadata enrichment.
    • Use data protection rules to allow or deny access to a virtual table. Data Virtualization users can see the virtual table but cannot preview the contents of the table or perform any actions on the table or its columns when a deny rule applies to the data asset.

      A lock icon next to the table name on the Virtualized data page indicates that access to the data in the table is denied by a data protection rule. A lock icon might also appear when an asset in the catalog has not been profiled and is pending data class assignment.

      To access views, these conditions must be met:

      • You have the required permissions to access views.
      • The creator of the view has the required permissions to access the objects referenced by the view.
    • To see an asset preview in a catalog, these conditions must be met:
      • You are not blocked by any data protection rules. If you are the owner of the asset, you can’t be blocked by data protection rules.
      • If the asset has an associated connection, these conditions must also be true:
        • You are not blocked from accessing the connection by any data protection rules.
        • The username in the connection details has access to the object at the data source.

      For more information, see Asset previews.

    • Use a data protection rule to mask data in columns or filter rows of a virtual table. Use data masking rules to disguise the original data. Depending on the method of data masking, data is redacted, substituted, or obfuscated. For more information, see Masking virtual data in Data Virtualization.

      A lock icon next to the column name indicates that the data in the column is masked by a data protection rule.

    Note:

    By default, the user who created the asset in the catalog is the asset owner. Catalog asset owners are exempt from data protection rules, but are subject to the Data Virtualization access control. If the user who is accessing a virtual object is also the owner of the corresponding asset in IBM Knowledge Catalog, the data protection rules and policies that are defined for that user in IBM Knowledge Catalog are not enforced in Data Virtualization. See Masking data with policies for details.

    When a virtual object is added to a catalog and at least one masking rule, row filtering rule, or data protection rule that is based on a data class exists, access to the object is denied until profiling and data class assignment are complete.

    The username that is specified in the Data Virtualization connection must also be authorized to access the object in Data Virtualization, unless a data masking rule applies to the asset and the previewing user.
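
The notes above on owner exemption and pending profiling can be summarized in a short Python sketch. It is a simplified reading of those notes, not the enforcement logic of the product. The function `dpr_blocks` and the asset fields `owner`, `needs_data_classes`, and `profiled` are hypothetical, and checking the owner exemption before the profiling state is an assumption.

```python
def dpr_blocks(user, asset, rule_denies):
    """Return True if data protection rules block `user` from `asset`.

    `rule_denies` is the outcome of ordinary rule evaluation. Db2 access
    control is a separate check and is not modeled here.
    """
    if user == asset["owner"]:
        # Asset owners are exempt from data protection rules.
        return False
    if asset["needs_data_classes"] and not asset["profiled"]:
        # Masking, row filtering, or data class based rules exist, but
        # profiling and data class assignment are not complete yet.
        return True
    return rule_denies


# Hypothetical newly published asset that has not been profiled yet.
asset = {"owner": "ana", "needs_data_classes": True, "profiled": False}

print(dpr_blocks("ana", asset, rule_denies=True))   # owner is exempt
print(dpr_blocks("ben", asset, rule_denies=False))  # blocked until profiled
```

Even when this function returns `False`, the user still needs Db2 authorization in Data Virtualization to access the object.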