Considerations for data masking

Data masking helps you protect sensitive data. It is defined in the rules of a policy that is enforced for an asset.

Data masking can be enforced for:

  • Catalog asset preview
  • Project asset preview
  • Download from catalog
  • Data Refinery

If you select data masking, consider the following:

  • You can mask the contents of columns in relational data, partitioned data, and CSV, Avro, and Parquet files. Data in other formats is not masked. To deny access to information in notebooks, models, folders, images, or other unstructured files, you must create rules that use system terms, such as asset classification, asset tags, user name, and asset owner.
  • The original data asset at its source location is not affected by data masking. When you preview a data asset that is under a data masking policy in a catalog or project, only the subset of the data asset used in the preview is masked, and the previewed data can be up to 48 hours old.
    Note: The asset owner always sees the original values because policies do not apply to asset owners.

  • The asset preview in the asset browser is not under policy enforcement. This preview is displayed when you add a connected data asset to the catalog: as you navigate the source database schema or folders, you can click the eye icon to preview the asset. Data governance policies, including data masking, are not applied to this preview.

  • Only data assets from IBM Cloud Object Storage can be directly downloaded to your desktop. If the catalog asset is under a data masking policy, the download process applies the masking rules to the whole data set and creates a new CSV file with masked data that is available for download.

  • When a catalog user adds an asset that is under a data masking policy from a governed catalog to a project, the same policy applies to the asset for that user and for all other users in the project, so they all see the same masked values in the asset preview:

    • If a project user then adds this asset to Data Refinery, the data subset used for shaping and refining in the tool remains masked. Data masking is applied to the whole data set when you run a Data Refinery flow. You can then save the Data Refinery flow to create a new data asset with masked data in the specified target.
    • If a project user then adds this asset to other analytic tools in the platform, such as notebooks, the model builder, dashboards, or the flow editor, the original source data asset is used and the data is not masked.
    • The project asset under data masking can’t be downloaded from the project.
  • The column headers of the assets that you want to mask or preview must consist only of alphanumeric characters (a-z, A-Z, 0-9).
    Note: Multi-byte or special characters are not supported.

  • The access methods and data flow of IBM Watson REST APIs associated with Watson Knowledge Catalog do not support data masking.

  • The Profile page does not show profiling details for masked data columns.
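
The column header requirement above can be checked before you add an asset to a governed catalog. The following is a minimal sketch, not part of the product: the function name find_unsupported_headers is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only, so spaces, underscores, and multi-byte characters are all flagged.

```python
import re

# Assumption: only ASCII a-z, A-Z, 0-9 are allowed; spaces, underscores,
# punctuation, and multi-byte characters are treated as unsupported.
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def find_unsupported_headers(headers):
    """Return the headers that contain anything other than a-z, A-Z, 0-9.

    An empty header is also reported, because it contains no
    alphanumeric characters at all.
    """
    return [h for h in headers if not _ALPHANUMERIC.fullmatch(h)]


if __name__ == "__main__":
    # Hypothetical column headers, for illustration only.
    sample = ["CustomerID", "email", "phone_number", "Straße", "zip5"]
    print(find_unsupported_headers(sample))
```

Any header that the function returns would need to be renamed at the source before the asset is masked or previewed.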
