Considerations for data anonymization
Anonymized data can be enforced for:
- Catalog asset preview
- Project asset preview
- Download from catalog
- Data Refinery
If you select data anonymization, consider the following:
- You can anonymize the contents of columns in relational data and CSV, Avro, partitioned data, and Parquet files. Data in other formats is not anonymized. If you want to deny access to information in notebooks, models, folders, images, or other unstructured files, you must create rules that use system terms, such as asset classification, asset tags, user name, and asset owner.
- The original data asset at its source location is not affected by data anonymization. When you preview a data asset that is under a data anonymization policy in a catalog or project, only the subset of the data asset that is used in the preview is anonymized, and that subset can be up to 48 hours old.
Note: The asset owner always sees the original values because policies do not apply to asset owners.
- The asset preview in the asset browser is not subject to policy enforcement. This preview is displayed when you add a connected data asset to the catalog: as you navigate the source database schema or folders, you can click the eye icon to preview an asset, and data governance rules are not enforced on that preview.
- Only data assets from IBM Cloud Object Storage can be downloaded directly to your desktop. If the catalog asset is under a data anonymization policy, the download process applies the anonymization rules to the whole data set and creates a new CSV file with anonymized data that is available for download.
- When a catalog user adds an asset that is under a data anonymization policy to a project from a catalog with policy enforcement, the same policy applies to the asset for that user and for the other users in the project, so they all see the same anonymized values in the asset preview.
  - If a project user then adds this asset to Data Refinery, the data subset used for shaping and refining in the tool remains anonymized. When you run a Data Refinery flow, data anonymization is applied to the whole data set. You can then save the Data Refinery flow to create a new data asset with anonymized data in the specified target.
  - If a project user adds this asset to other analytic tools in the platform, such as notebooks, model builder, Dashboards, or flow editor, the original source data asset is used and the data is not anonymized.
  - A project asset that is under a data anonymization policy can’t be downloaded from the project.
- The column headers of assets that you want to anonymize or preview must consist only of alphanumeric characters (a-z, A-Z, 0-9).
Note: Multi-byte and special characters are not supported.
- The access methods and data flows of the IBM Watson REST APIs associated with Watson Knowledge Catalog do not support data anonymization.
- The Profile page does not show profiling details for anonymized data columns.
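Two of the constraints above can be checked before you assign a data anonymization policy: anonymization applies only to relational data and to CSV, Avro, partitioned, and Parquet files, and column headers must contain only the characters a-z, A-Z, and 0-9. The following Python sketch shows one way to run such a pre-flight check. It is hypothetical: the function name `check_anonymization_preflight`, the `PreflightResult` type, and the format labels are illustrative assumptions, not part of Watson Knowledge Catalog or its APIs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical format labels for the asset types that data anonymization covers.
# These strings are assumptions for illustration, not values from an IBM API.
ANONYMIZABLE_FORMATS = {"relational", "csv", "avro", "partitioned", "parquet"}

# Strict reading of the header rule: ASCII letters and digits only, so spaces,
# underscores, special characters, and multi-byte characters are all rejected.
_HEADER_PATTERN = re.compile(r"[A-Za-z0-9]+")


@dataclass
class PreflightResult:
    format_supported: bool
    bad_headers: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The asset is a candidate for anonymization only if both checks pass.
        return self.format_supported and not self.bad_headers


def check_anonymization_preflight(asset_format: str, headers: list[str]) -> PreflightResult:
    """Hypothetical helper: report why anonymization might not apply to an asset."""
    supported = asset_format.lower() in ANONYMIZABLE_FORMATS
    # fullmatch requires the entire header to match, so an empty header also fails.
    bad = [h for h in headers if not _HEADER_PATTERN.fullmatch(h)]
    return PreflightResult(format_supported=supported, bad_headers=bad)


# Example usage with made-up column headers.
result = check_anonymization_preflight("CSV", ["CustomerID", "email_address", "Straße", "Phone2"])
print(result.ok)           # False
print(result.bad_headers)  # ['email_address', 'Straße']
```

The header check uses an explicit ASCII character class instead of `str.isalnum()`, because `isalnum()` also accepts non-ASCII letters such as "ß", which the rule above excludes.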