Advanced masking options
Advanced masking options extend the capability of data protection rules and data location rules by protecting data with advanced de-identification techniques. These techniques maintain the format and integrity of the data. The resulting high data utility allows data users, such as data scientists, business analysts, and application developers, to produce high-quality insights from protected data.
Advanced masking options include the following features:
Format-preserving de-identification for 165 predefined data classes to maintain utility for AI projects. Tip: Data protection rules that are defined with advanced masking options are enforced for Watson Query (Data virtualization). Rules can implement format-preserving obfuscation on any of the predefined data classes, except
Reversible encryption, which is available when you create copies of data with masking flows, and one-way hash tokenization for flexible compliance. Restriction: The advanced masking option for Reversibility is not supported when IBM Security Guardium® is integrated with Cloud Pak for Data as a Service.
Relationship integrity to protect data consistently across related data sources.
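One-way hash tokenization and relationship integrity can be illustrated with a short Python sketch. It assumes keyed HMAC-SHA256 as the tokenization technique, which is not necessarily what IBM Knowledge Catalog uses internally; the key, the `tokenize` helper, and the `customers` and `orders` tables are hypothetical. Because the same input and key always produce the same token, a join column that is masked the same way in both sources still links the related rows.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a key management service.
SECRET_KEY = b"hypothetical-key"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a one-way token for `value` by using keyed HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; shorter tokens raise collision risk


# Two related data sources that share the customer_id join key.
customers = [{"customer_id": "C-1001", "name": "Jane Doe"}]
orders = [{"order_id": "O-1", "customer_id": "C-1001", "amount": 42.50}]

masked_customers = [{**row, "customer_id": tokenize(row["customer_id"])} for row in customers]
masked_orders = [{**row, "customer_id": tokenize(row["customer_id"])} for row in orders]

# The relationship survives masking: both sources carry the same token.
assert masked_customers[0]["customer_id"] == masked_orders[0]["customer_id"]
print(masked_customers[0]["customer_id"])
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the tokens by hashing guessed values.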
The following scenarios explain how advanced masking options extend the capability of data protection rules.
Data scientists want to use financial data, such as credit card numbers and bank account numbers, in their machine learning model to predict fraudulent transactions. If the credit card numbers are masked as XXXXXXXXX, the model can't produce the results that they're looking for. Instead, they need realistic credit card numbers. The preserve format method in advanced masking options produces credit card numbers that meet format requirements. These requirements include keeping the issuer identification information, which specifies the company (Visa, Mastercard, and so on) that issued the card, and satisfying the Luhn checksum algorithm. Because the masking is realistic, data users can produce accurate results.
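The following Python sketch shows what meeting these format requirements can look like for a card number: it keeps the first six digits as the issuer prefix, replaces the remaining account digits with digits derived from a keyed hash, and recomputes the final Luhn check digit. The helper names, the six-digit prefix length, and the HMAC-based digit generation are assumptions for illustration, not the product's method. Unlike true format-preserving encryption (for example, NIST FF1), this sketch is one-way and can map two different cards to the same masked number. It also assumes that the input contains digits only.

```python
import hashlib
import hmac


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks its check digit."""
    total = 0
    # The rightmost digit of `partial` sits next to the check digit, so it is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a full card number, including its check digit, with the Luhn algorithm."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card(number: str, key: bytes, keep_prefix: int = 6) -> str:
    """Keep the issuer prefix, replace the account digits, and recompute the check digit."""
    prefix = number[:keep_prefix]
    body_len = len(number) - keep_prefix - 1
    digest = hmac.new(key, number.encode("utf-8"), hashlib.sha256).digest()
    # Map hash bytes to decimal digits (slightly biased; acceptable for a sketch).
    body = "".join(str(b % 10) for b in digest[:body_len])
    partial = prefix + body
    return partial + luhn_check_digit(partial)


masked = mask_card("4111111111111111", key=b"hypothetical-key")
print(masked, luhn_valid(masked))
```

Because the check digit is recomputed after the account digits change, `luhn_valid(masked)` holds for every masked value, so downstream validation logic in a model pipeline keeps working.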
Healthcare data users want to use patient data that contains patients' names and addresses to analyze results from terminal disease clinical studies. If patient names are masked as "XXXX", the users can't produce the results that they're looking for. Instead, they need realistic names and realistic street names, cities, and countries. As a result, when data users perform the analyses, they have the broader context that "Jane Doe," who lives on "123 Maple Lane," is the study participant with breast cancer.
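One simple way to produce realistic substitutes is to pick replacement values from dictionaries of real-looking names and streets, choosing deterministically with a keyed hash so that the same patient always receives the same substitute. This Python sketch assumes that approach; the substitution pools, helper functions, and key are hypothetical and do not describe the product's implementation. Because the choice is deterministic, the sketch also keeps masking consistent across related data sources.

```python
import hashlib
import hmac

# Hypothetical substitution pools; a real masking engine would use far larger dictionaries.
FIRST_NAMES = ["Jane", "Maria", "Wei", "Amara", "Lucas", "Priya"]
LAST_NAMES = ["Doe", "Okafor", "Silva", "Chen", "Novak", "Patel"]
STREETS = ["Maple Lane", "Oak Street", "Cedar Avenue", "Birch Road"]


def _digest(label: str, value: str, key: bytes) -> bytes:
    # Prefix a label so that each field draws an independent choice from the same key.
    return hmac.new(key, f"{label}:{value}".encode("utf-8"), hashlib.sha256).digest()


def _pick(pool: list, label: str, value: str, key: bytes) -> str:
    """Choose a pool entry deterministically from the keyed hash of `value`."""
    return pool[int.from_bytes(_digest(label, value, key)[:8], "big") % len(pool)]


def mask_name(full_name: str, key: bytes) -> str:
    first, _, last = full_name.partition(" ")
    return f"{_pick(FIRST_NAMES, 'first', first, key)} {_pick(LAST_NAMES, 'last', last, key)}"


def mask_street_address(address: str, key: bytes) -> str:
    house_number = 1 + int.from_bytes(_digest("house", address, key)[:2], "big") % 999
    return f"{house_number} {_pick(STREETS, 'street', address, key)}"


key = b"hypothetical-key"
print(mask_name("John Smith", key), "|", mask_street_address("42 Elm Street", key))
```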
Because of their specificity, advanced masking options can be applied to only one data class at a time. These options are optimized for all 165 predefined IBM Knowledge Catalog data classes, and each option is recommended as the best format-preserving option for its data class. However, they cannot be applied to custom-defined IBM Knowledge Catalog data classes.
Advanced masking options can be enabled only for the Redact and Obfuscate masking methods. They apply only to rules that use the action "mask data in columns containing data class". Business terms, column names, and tags are not yet supported.
Creating data protection rules with advanced masking options
Advanced masking options are only enabled for data classes.
Complete the conditions and select the attributes that you want to process. Recommended practice is to create rules in one of the following ways:
If the data class contains any __insert data class__, then mask data in columns containing data class __insert data class__.
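To make the rule's logic concrete, the following Python sketch evaluates the recommended rule against a list of columns. The `Column` type, the `apply_rule` function, the "Credit Card Number" data class label, and the redaction function are hypothetical stand-ins; the sketch models only the rule's semantics and is not an IBM Knowledge Catalog API.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class Column:
    name: str
    data_class: str
    values: List[str]


def apply_rule(columns: List[Column], data_class: str,
               mask_fn: Callable[[str], str]) -> List[Column]:
    """If the asset contains any column of `data_class`, mask every column of that class."""
    if not any(c.data_class == data_class for c in columns):
        return columns  # condition not met: the asset is returned unchanged
    return [
        replace(c, values=[mask_fn(v) for v in c.values]) if c.data_class == data_class else c
        for c in columns
    ]


asset = [
    Column("card_no", "Credit Card Number", ["4111111111111111"]),
    Column("city", "City", ["Boston"]),
]
# A simple redaction stand-in: replace every character with "X".
masked = apply_rule(asset, "Credit Card Number", lambda v: "X" * len(v))
for col in masked:
    print(col.name, col.values)
```

Only the columns whose data class matches the rule are masked; the other columns pass through untouched, which is why the rule targets a single data class.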
You can optionally add conditions for asset owners, business terms, tags, and so on, but make sure that you understand how these governance artifacts work, because they might unintentionally leak unmasked data. See Managing data protection rules.
When you mask small input values, such as Boolean values or single-digit numbers, the data might look unmasked when you run a masking flow job or preview or download the data. The data is masked, but the masked value can be the same as the unmasked value.
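The effect is easy to reproduce. Any masking that maps a small domain onto itself must reuse values from that domain, so some inputs are likely to map to themselves: a uniformly random permutation has one fixed point on average, whatever the domain size, and a random permutation of the two Boolean values is the identity half of the time. The following Python sketch uses a keyed shuffle as a simplified stand-in for format-preserving masking; the function names and the key are hypothetical.

```python
import hashlib
import hmac


def keyed_permutation(domain: list, key: bytes) -> dict:
    """Map each value in `domain` to a value in `domain` by using a keyed shuffle."""
    shuffled = sorted(
        domain,
        key=lambda v: hmac.new(key, str(v).encode("utf-8"), hashlib.sha256).digest(),
    )
    return dict(zip(domain, shuffled))


def unchanged_values(mapping: dict) -> list:
    """Return the inputs whose masked value equals the original value."""
    return [v for v, masked in mapping.items() if v == masked]


key = b"hypothetical-key"
booleans = keyed_permutation([True, False], key)
digits = keyed_permutation(list(range(10)), key)
print("Booleans left unchanged:", unchanged_values(booleans))
print("Digits left unchanged:", unchanged_values(digits))
```

Every input is still passed through the masking function; the values that come out unchanged are a property of the small domain, not a sign that masking was skipped.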
Select one of the following methods to mask data:
- Redact columns
- Obfuscate columns
Substitute is not supported for advanced masking.
Create the rule. See Mask data for more information about masking data in assets.
Using the masking previews
The Before preview in the Example data section shows how the data is dynamically masked when you view data assets in catalogs and projects, before you run masking flow jobs. The After preview in the Example data section shows how the data is masked in the masked copies that are produced by running masking flow jobs.
Watch this video to see how to set advanced masking options and create a masking flow asset in a project.
- Mask data
- Masking data with Masking flow
- Creating jobs with Masking flow
- Managing data protection rules