Obfuscating data method (Masking flow)
Last updated: Dec 03, 2024
Masking data with the obfuscating method replaces data values with formatted values that match the original format. Without advanced options, some data classes do not support formatted output and fall back to the substitute method.

The recommended best practice for masking is to use Advanced Masking with the Repeatable and Validate Input options. These options provide very high confidence that the masking will preserve format, maintain uniqueness, and maintain referential integrity. Data is masked consistently across the enterprise.

Advanced obfuscate options let you customize settings that are specific to a data class, so that the masked output retains nearly all of the meaning of the original data format. This method is the recommended option for all data classes.

About obfuscating data options

Obfuscate method

The obfuscate method includes the Preserve format method and the Identifier method.

  • Preserve format (default): Masks in accordance with format requirements, maintaining maximum data utility specific to this data class.
  • Identifier masking method: Masks letters and digits in any business identifier. Masks letters with letters, digits with digits and maintains letter case. This method is recommended for identifier data classes, such as customer ID, product ID, and so on.
    Non-English Unicode (double-byte) characters are masked to X. Alphanumeric characters and some special characters, such as -./@#$ %^&*()\:;?_", are not masked to X.
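The Identifier rules above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name mask_identifier is hypothetical, and the sketch assumes that the listed special characters pass through unchanged.

```python
import random
import string

# Special characters that the text lists as exempt from masking to X.
KEPT_SPECIALS = set('-./@#$ %^&*()\\:;?_"')

def mask_identifier(value: str, rng: random.Random) -> str:
    """Mask letters with letters and digits with digits, keeping letter case."""
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in KEPT_SPECIALS:
            # Assumption: listed special characters are kept as-is.
            out.append(ch)
        else:
            # Any other character (for example, a non-English letter) becomes X.
            out.append("X")
    return "".join(out)

masked = mask_identifier("Ab-12é", random.Random(0))
print(masked)
```

The masked value has the same length and the same pattern of uppercase letters, lowercase letters, digits, and separators as the input, which is why this style suits business identifiers.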

Consistency (Repeatable and Random)

Use this option to specify whether the same input value always produces the same masked value. To keep masked values consistent across all data, it is recommended that you use the Repeatable option.

  • Repeatable: Use to repeat the same masked value for a repeated input value. The same input value is masked to the same output value. 
    For example, every time you apply the masking to a person name Rebecca Hsu, the masked result returns the same person name Jennifer Gonzalez, for all masking instances.

  • Random: Use to provide random masked values for a repeated input value. The same input value can be masked to a different output value each time.
    For example, when you’re masking the name Rebecca Hsu, every separate masking instance returns another random masked name. The name might be masked to Jennifer Gonzalez for the first instance, and then the name could be Susan Lee for the second instance.

    Note:

    To get random masked values for the same input value, the Input Validation option must be set to "No Validation".
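The difference between the two consistency options can be sketched in Python as follows. This is an illustration only: the replacement pool NAMES, the key, and both function names are hypothetical, and a keyed hash is just one possible way to obtain repeatable output.

```python
import hashlib
import hmac
import random

# Hypothetical replacement pool; a real deployment would use its own dictionaries.
NAMES = ["Jennifer Gonzalez", "Susan Lee", "Maria Chen", "Priya Patel"]

def mask_repeatable(value: str, key: bytes) -> str:
    """Derive the replacement from a keyed hash so equal inputs give equal outputs."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(NAMES)
    return NAMES[index]

def mask_random(value: str, rng: random.Random) -> str:
    """Pick a replacement independently of the input, so repeats can differ."""
    return rng.choice(NAMES)

key = b"example-key"  # hypothetical secret
first = mask_repeatable("Rebecca Hsu", key)
second = mask_repeatable("Rebecca Hsu", key)
print(first == second)  # repeatable masking gives the same name both times
print(mask_random("Rebecca Hsu", random.Random()))
```

With such a small pool, different inputs can map to the same name, so this sketch shows consistency only and does not preserve uniqueness.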

Input validation (Input Validation and No Validation)

Use this option to specify whether input values are validated.

  • Input Validation: Use to obfuscate input values that match the input format. For values that don't match the input format, the following validation occurs:

    • Remove rows in static-masking scenarios.
    • Redact the value in dynamic-masking scenarios.

    For example, if the expected format is a US phone number and one of the values is (19) 235-127-2318923, then that row is removed in static-masking scenarios because the format does not match a standard US phone number. In dynamic-masking scenarios, the value is redacted.

    Restriction: To reduce masking failures due to input not present in the data sets, the Input Validation option is unavailable for the following data classes:
    • First Name
    • Last Name
    • Person Name
    • US Street Name
    • Address Line 1
    • City
  • No Validation: Use to retain and mask all input values, regardless of the format.
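The validation behavior described above can be sketched in Python. Everything here is an assumption for illustration: the US_PHONE pattern is a simplified stand-in for the real format check, and mask_static, mask_dynamic, and mask_digits are hypothetical names.

```python
import re

# Simplified, assumed pattern for a US phone number such as (212) 555-0142.
US_PHONE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
REDACTED = "XXXXXXXXXX"

def mask_digits(value: str) -> str:
    """Hypothetical masking function: replaces every digit with 9."""
    return re.sub(r"\d", "9", value)

def mask_static(rows, mask, validate=True):
    """Static masking: rows with invalid input are removed when validation is on."""
    out = []
    for value in rows:
        if validate and not US_PHONE.fullmatch(value):
            continue  # drop the row
        out.append(mask(value))
    return out

def mask_dynamic(value, mask, validate=True):
    """Dynamic masking: invalid input is redacted when validation is on."""
    if validate and not US_PHONE.fullmatch(value):
        return REDACTED
    return mask(value)

rows = ["(212) 555-0142", "(19) 235-127-2318923"]
print(mask_static(rows, mask_digits))                  # invalid row removed
print(mask_dynamic(rows[1], mask_digits))              # invalid value redacted
print(mask_static(rows, mask_digits, validate=False))  # both rows kept and masked
```

Setting validate=False corresponds to the No Validation option: every value is retained and masked, whatever its format.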

More obfuscating-specific options

Depending on the obfuscate method that you choose, the following options might be available. They're specific to each method and data class.

  • Advanced options that are specific to the Preserve format method include special options for email address and date data classes. See Preserve format method.

  • Advanced options that are specific to the Identifier method include:

    • Character formatting options (optional)
    • Trim
    • Copy or replace
    Restriction: The character formatting options, such as Uppercase, Remove, Trim, and Copy or replace, are ignored during dynamic masking.

See Identifier masking method.

Obfuscating Date or Date of Birth data classes

When the data class is Date or Date Of Birth and the profiled input data type is a date, the masked output is obfuscated in a date format (YYYY-MM-dd). For other data types, the masked output is obfuscated in a timestamp format (YYYY-MM-dd HH:mm:ss.sss).
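As a rough illustration of the two output formats, the following Python sketch renders an already masked value either as a date or as a timestamp. The function name render_masked_date and the input_is_date flag are hypothetical, and the sketch assumes that the fractional part of the timestamp is milliseconds.

```python
from datetime import datetime

def render_masked_date(masked: datetime, input_is_date: bool) -> str:
    """Format a masked value as a date or as a timestamp with milliseconds."""
    if input_is_date:
        return masked.strftime("%Y-%m-%d")
    # Assumption: three fractional digits represent milliseconds.
    millis = masked.microsecond // 1000
    return masked.strftime("%Y-%m-%d %H:%M:%S") + f".{millis:03d}"

value = datetime(1990, 4, 7, 13, 5, 9, 120000)
print(render_masked_date(value, input_is_date=True))   # 1990-04-07
print(render_masked_date(value, input_is_date=False))  # 1990-04-07 13:05:09.120
```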

Obfuscate or substitute masking might revert to redacted values

Although the masking rules are created as Obfuscate or Substitute methods, you might observe that some data is masked as redacted values 'XXXXXXXXXX' (or an equivalent that is based on the data type). If a masked column includes any NULL values, Random Obfuscation masking is attempted for the NULL values instead of the designated masking rule. If any errors are encountered while masking either NULL or non-NULL values, the data is masked as redacted values.
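The fallback order described above can be sketched in Python. The names apply_rule, rule, and random_obfuscate are hypothetical placeholders, and the sketch only models the decision flow, not how the product produces masked values.

```python
REDACTED = "XXXXXXXXXX"

def apply_rule(value, rule, random_obfuscate):
    """Apply the designated rule, with the NULL and error fallbacks from the text."""
    try:
        if value is None:
            # NULL input: random obfuscation is attempted instead of the rule.
            return random_obfuscate()
        return rule(value)
    except Exception:
        # Any masking error, for NULL or non-NULL input, yields a redacted value.
        return REDACTED

print(apply_rule("abc", str.upper, lambda: "random"))       # ABC
print(apply_rule(None, str.upper, lambda: "random"))        # random
print(apply_rule("abc", lambda v: 1 / 0, lambda: "random")) # XXXXXXXXXX
```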

Learn more

Parent topic: Advanced data masking
