Advanced masking options

Advanced masking options extend the capability of data protection rules and data location rules by protecting data with advanced de-identification techniques. These techniques maintain the format and integrity of the data. The resulting high data utility enables data users, such as data scientists, business analysts, and application developers, to produce high-quality insights from protected data.

Advanced masking options include the following features:

  • Format-preserving de-identification for 165 pre-defined data classes to maintain utility for AI projects.

    Tip: Data protection rules that are defined with advanced masking options are enforced for Watson Query (Data virtualization). Rules can implement format-preserving obfuscation on any of the pre-defined data classes, except IBAN and URL.

  • Reversible encryption, which is available when you create copies of data with masking flows, and one-way hash tokenization for flexible compliance.

    Restriction: The advanced masking option for Reversibility is not supported when IBM Security Guardium® is integrated with Cloud Pak for Data as a Service.
  • Relationship integrity to protect data consistently across related data sources.
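
One way to picture one-way hash tokenization and relationship integrity is keyed, deterministic tokenization: the same input always yields the same token, so related tables still join after masking. The following Python sketch is an illustration only and is not the product's implementation. The `tokenize` helper, the key, and the sample rows are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would be managed outside the code.
SECRET_KEY = b"example-key-for-illustration-only"

def tokenize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Return a one-way, deterministic token (HMAC-SHA256, hex, truncated)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

# Two related tables that share the customer_id column (made-up sample data).
customers = [{"customer_id": "C-1001", "name": "Jane Doe"}]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-2002"},
]

# Mask the shared column in both tables with the same function and key.
masked_customers = [{**row, "customer_id": tokenize(row["customer_id"])} for row in customers]
masked_orders = [{**row, "customer_id": tokenize(row["customer_id"])} for row in orders]

# The join key still lines up because the tokenization is deterministic.
matches = [o for o in masked_orders
           if o["customer_id"] == masked_customers[0]["customer_id"]]
print(len(matches))
```

Because the token is derived with a keyed hash, it cannot be reversed to the original value, which is the trade-off against reversible encryption.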

The following scenarios explain how advanced masking options extend the capability of data protection rules.

Data scientists want to use financial data, such as credit card numbers and bank account numbers, in their machine learning model to predict fraudulent transactions. Credit card numbers that are masked as XXXXXXXXX cannot produce the results that they’re looking for. Instead, they need numbers that look like actual credit card numbers. The preserve format method in advanced masking options produces credit card numbers that meet format requirements. Format requirements include maintaining the issuer identifier information, which specifies which credit card company (Visa, Mastercard, and so on) issued the card, the Luhn checksum algorithm, and so on. The realistic masking ensures that data users can produce accurate results.
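
To make the format requirements concrete, the following Python sketch masks a card number while keeping the issuer identifier and producing a valid Luhn check digit. It is a minimal illustration under stated assumptions, not the algorithm that the product uses: the `mask_card` helper, the choice to keep the first six digits, and the key are all hypothetical.

```python
import hashlib
import hmac

def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a digit string that lacks its final digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    """Return True if the full number passes the Luhn checksum."""
    return number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]

def mask_card(number: str, key: bytes) -> str:
    """Keep the first 6 digits, replace the middle digits, recompute the check digit."""
    if not number.isdigit() or not 13 <= len(number) <= 19:
        raise ValueError("expected a 13 to 19 digit card number")
    issuer = number[:6]  # assumption: treat the first 6 digits as the issuer identifier
    middle_len = len(number) - 7
    digest = hmac.new(key, number.encode("ascii"), hashlib.sha256).digest()
    # Map digest bytes to digits; slightly biased, which is acceptable for a sketch.
    middle = "".join(str(b % 10) for b in digest[:middle_len])
    partial = issuer + middle
    return partial + luhn_check_digit(partial)

# 4111111111111111 is a widely used test number, not a real account.
masked = mask_card("4111111111111111", b"illustration-only-key")
print(masked[:6], len(masked), luhn_valid(masked))
```

The masked value has the same length and issuer prefix as the input and passes the same checksum, so downstream validation and issuer-based features keep working.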

Healthcare data users want to use patient data that contains patients' names and addresses to analyze results from terminal disease clinical studies. Patient names that are masked as "XXXX" cannot produce the results that they’re looking for. Instead, they need realistic names and realistic street names, cities, and countries. As a result, when data users perform the analyses, they have the broader context that "Jane Doe", who lives at "123 Maple Lane", is the study participant with breast cancer.

Important:

Because of the specificity of advanced masking options, these options can be applied to only one data class at a time. These options are optimized for all 165 pre-defined IBM Knowledge Catalog data classes and are recommended as the best format-preserving options for each data class. However, they cannot be applied to custom-defined IBM Knowledge Catalog data classes.

Advanced masking options can be enabled for only the Redact and Obfuscate masking methods. They apply only to rules that use the action mask data in columns containing data class. Business terms, column names, and tags are not yet supported.

Note: When you are creating tables, don't use special characters for source and target table column names in Hive. Special characters that are used in column names aren't supported by inner joins.
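
A quick pre-check of column names can catch this problem before tables are created. The following Python sketch assumes that "special characters" means anything other than ASCII letters, digits, and underscores. That definition and the `unsafe_column_names` helper are assumptions for illustration, not a documented rule.

```python
import re

# Assumption: only ASCII letters, digits, and underscores count as safe.
SAFE_COLUMN_NAME = re.compile(r"[A-Za-z0-9_]+")

def unsafe_column_names(columns):
    """Return the column names that contain characters outside the assumed safe set."""
    return [name for name in columns if not SAFE_COLUMN_NAME.fullmatch(name)]

print(unsafe_column_names(["customer_id", "card-number", "e mail", "zip_code"]))
# ['card-number', 'e mail']
```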

Creating data protection rules with advanced masking options

Advanced masking options are only enabled for data classes.

  1. Complete the conditions and select the attributes that you want to process. Recommended practice is to create rules in one of the following ways:

    • If the data class contains any __insert data class__, then mask data in columns containing data class __insert data class__.

    • You can optionally add conditions for asset owners, business terms, tags, and so on, but be careful to understand how these governance artifacts work. They might unintentionally leak unmasked data. See Managing data protection rules.

    • Input data that is small, such as boolean values or single-digit numbers, might look unmasked when you run a masking flow job, preview the data, or download the data. However, the data is masked; in these cases the masked value is the same as the unmasked value.

  2. Select one of the following methods to mask data:

    • Redact columns
    • Obfuscate columns

    Substitute is not supported for advanced masking.

  3. Select your masking options in the Advanced masking options section. Some options are selected by default for you. See Redacting data method and Obfuscating data method for more information.

  4. Create a rule. See Mask data for more information on how to mask data in assets.

Using the masking previews

The Before preview in the Example data section displays how the data is masked when you view data assets in catalogs and projects, and dynamically before you run masking flow jobs. The After preview in the Example data section displays how the data is masked in the masked copies that are produced by running masking flow jobs.

Watch this video to see how to set advanced masking options and create a masking flow asset in a project.

This video provides a visual method to learn the concepts and tasks in this documentation.
