Data protection rules in Watson Query

Governing virtual data with data protection rules in Watson Query

You can govern your virtual data by defining data protection rules.

Before you begin

These instructions assume that you completed the following prerequisites.

About this task

Data protection rules specify which data to control and how to control it, for example by denying access or masking data. You can add data protection rules to policies and enforce these policies in Watson Query.

You can use the following types of data protection rules:

Deny of access
Deny of access prevents users from accessing any of the data in a Watson Query asset. For example, if a data steward doesn’t want to expose an asset to a particular user, they can define this rule with a condition that matches that user's username.
Data masking
Data masking hides sensitive data while still allowing users to use the asset. There are three data masking methods: redact, substitute, and obfuscate. Choose a method based on how the data will be used in the upstream application.
  • Redact replaces all or a subset of the characters in a data cell.
  • Substitute replaces data with the salted hashes of the original values. This method is the most likely to maintain referential integrity.
  • Obfuscate replaces data with formatted values that are similar to the original data.
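
The three masking methods can be illustrated with a minimal Python sketch. This is not how Watson Query or IBM Knowledge Catalog implements masking; the function names, the `X` mask character, the SHA-256 hash, and the seeded random replacement are all assumptions chosen only to show the behavior described above.

```python
import hashlib
import random
import string


def redact(value: str, keep_last: int = 0, mask_char: str = "X") -> str:
    """Replace all characters, or all but the last `keep_last`, with mask_char."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def substitute(value: str, salt: str) -> str:
    """Replace a value with a salted hash (SHA-256 is an assumption here)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def obfuscate(value: str, seed: str) -> str:
    """Replace letters and digits with others of the same kind, keeping the format."""
    rng = random.Random(seed + value)  # deterministic per input, illustration only
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)
```

In this sketch, `substitute` returns the same hash whenever the same value and salt are supplied, which is why equal values in different tables still match after masking and joins on the masked column keep working.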
Row-level filtering

You can create data protection rules to include or exclude rows in your virtualized data to limit the rows that users can see. For example, you can define a rule so that employees can see customer data that is associated only with their department.

You can apply filter criteria to include or exclude rows. For more information, see Filtering rows in data protection rules.
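
As a rough illustration of the department example, the following Python sketch applies an include or exclude predicate to a list of rows. The row layout, the `department` column, and the `filter_rows` helper are hypothetical; in the product, the filter is defined in a data protection rule rather than in application code.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool],
                include: bool = True) -> List[Row]:
    """Keep rows that match the predicate (include) or that do not (exclude)."""
    return [row for row in rows if predicate(row) == include]


# Hypothetical customer data for illustration.
customers = [
    {"customer": "Acme", "department": "SALES"},
    {"customer": "Globex", "department": "SUPPORT"},
    {"customer": "Initech", "department": "SALES"},
]


def rows_visible_to(user_department: str) -> List[Row]:
    """Include only the rows that belong to the user's department."""
    return filter_rows(customers, lambda r: r["department"] == user_department)
```

Calling `filter_rows` with `include=False` gives the complementary behavior: the matching rows are hidden and all other rows remain visible.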

Access to the tables that are referenced in row-level filter expressions is not evaluated. Data protection rules, including data masking, are not applied to those referenced tables.

Row filtering rules that apply to Watson Query assets and reference other assets must reference Watson Query assets only. If you query an object and the row filtering rules reference assets that are not Watson Query assets, the query fails with the following error:

The statement failed because a Big SQL component encountered an error. Component receiving the error: "SCHEDULER". Component returning the error: "SCHEDULER". Log entry identifier: "[SCL-0-<log_entry_id>]".. SQLCODE=-5105, SQLSTATE=58040

You can confirm the cause of the error by running the following query:

select line from table(syshadoop.log_entry('SCL-0-<log_entry_id_from_error>'))
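
If you handle this error in a script, a small helper can pull the log entry identifier out of the error text and build the diagnostic query shown above. The following Python sketch is only an illustration: the `diagnostic_query` function and the regular expression are assumptions based on the message format in this topic, and the identifier used in any example call is made up.

```python
import re
from typing import Optional

# Matches: Log entry identifier: "[SCL-0-<id>]" (whitespace may vary).
LOG_ENTRY_PATTERN = re.compile(
    r'Log entry\s+identifier:\s*"\[(SCL-0-[^\]\'"]+)\]"'
)


def diagnostic_query(error_message: str) -> Optional[str]:
    """Return the log_entry query for the identifier in the message, if any."""
    match = LOG_ENTRY_PATTERN.search(error_message)
    if match is None:
        return None
    log_entry_id = match.group(1)
    return f"select line from table(syshadoop.log_entry('{log_entry_id}'))"
```

The helper returns `None` when the message does not contain a log entry identifier, so the caller can fall back to showing the original error.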
Important:

When data masking or row-level filtering applies to an asset preview in Watson™ services other than Watson Query, the Watson Query internal access controls are not applied to that preview. These internal access controls are the ones that you set by using Manage access in the Watson Query UI. To control access in the other Watson services, you must define rules that manage access to the catalogs, projects, data assets, or connections.

When you publish virtualized data assets to a catalog, they are treated like any other data asset and are subject to data protection rules. Data protection rules can deny access to assets or mask their data based on criteria that can include governance artifacts, such as business terms and data classes.

Procedure

To govern your virtual data with data protection rules:

  1. Virtualize your data and publish it to a governed catalog.
    See Publishing virtual data to the catalog in Watson Query for details.

    This data is automatically profiled if automatic profiling is configured in the catalog settings. The profile of a catalog data asset includes generated metadata and statistics about the textual content of the data. Profiling automatically assigns data classes to table columns. See Profiles.

  2. Govern your virtual data in the catalog:
    • Assign data classes, business terms, and tags that are authored in IBM Knowledge Catalog to your virtual tables and columns. See Managing business terms for details on how to manage and author business terms in IBM Knowledge Catalog.
    • Use data protection rules to allow or deny access to a virtual table. Watson Query users can see the virtual table but cannot preview the contents of the table or perform any actions on the table or its columns when a deny rule applies to the data asset. See Managing data protection rules for details on how to create data protection rules in IBM Knowledge Catalog.

      A lock icon next to the table name on the Virtualized data page indicates that access to the data in the table is denied by a data protection rule. A lock icon might also appear when an asset in the catalog has not been profiled and is pending data class assignment.

      To access views, these conditions must be met.

      • You have the required permissions to access views.
      • The creator of the view has the required permissions to access the objects referenced by the view.
    • To see an asset preview in a catalog, these conditions must be met.
      • You are not blocked by any data protection rules. If you are the owner of the asset, you can’t be blocked by data protection rules.
      • If the asset has an associated connection, these conditions must also be true:
        • You are not blocked from accessing the connection by any data protection rules.
        • The username in the connection details has access to the object at the data source.

      For more information, see Asset previews.

    • Use a data protection rule to mask data in columns or filter rows of a virtual table. Use data masking rules to disguise the original data. Depending on the method of data masking, data is redacted, substituted, or obfuscated. See Masking data with data protection rules for details.

      A lock icon next to the column name indicates that the data in the column is masked by a data protection rule.

      Data masking has certain limitations in Watson Query. See Masking virtual data.

    Note:

    By default, the user who created the asset in the catalog is the asset owner. Catalog asset owners are exempt from data protection rules, but are subject to the Watson Query access control. If the user who is accessing a virtual object is also the owner of the corresponding asset in IBM Knowledge Catalog, the data protection rules and policies that are defined for that user in IBM Knowledge Catalog are not enforced in Watson Query.

    When a virtual object is added to a catalog, and you have at least one masking or row filtering data protection rule, or a data protection rule that is based on a data class, access to the object is denied until profiling and data class assignment are complete.

    The username that is specified in the Watson Query connection must also be authorized to access the object in Watson Query, unless a data masking rule applies to the asset and the previewing user.
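
The conditions for seeing an asset preview in a catalog, listed in step 2, can be summarized as a small decision function. The following Python sketch is only a reading aid: the `PreviewRequest` and `Connection` types and their field names are hypothetical, and it assumes that the owner exemption applies to rules on the asset but not to rules on the connection, which this topic does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    blocked_by_rule: bool         # a data protection rule blocks the connection
    user_has_source_access: bool  # connection username can access the source object


@dataclass
class PreviewRequest:
    is_asset_owner: bool
    blocked_by_rule: bool         # a data protection rule blocks the asset
    connection: Optional[Connection] = None


def can_preview(req: PreviewRequest) -> bool:
    """Return True when all preview conditions from step 2 are met."""
    # Asset owners can't be blocked by data protection rules on the asset.
    if req.blocked_by_rule and not req.is_asset_owner:
        return False
    # Assets without an associated connection have no further conditions.
    if req.connection is None:
        return True
    # Assumption: connection checks apply to owners as well.
    return (not req.connection.blocked_by_rule
            and req.connection.user_has_source_access)
```

The sketch does not model the Watson Query access control or the profiling-pending state that the Note describes; those checks apply in addition to the conditions shown here.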
