Anonymizing data for data protection policies

Data anonymization helps you protect sensitive data, such as personally identifiable information or restricted business data, and so reduces the risk of exposing confidential information. Anonymization is defined in policy rules that are enforced for an asset. Depending on the anonymization method, data is redacted, masked, or substituted in the asset preview.

The shield icon in the column header of the asset on the Overview page indicates that the column contains anonymized data. The schema information always reflects the total number of columns in the original asset.

Note: Asset owners can view any data within the asset even if data is anonymized.

When you create a rule, you first define conditions in the rule builder and then decide whether the rule denies access to the asset or anonymizes data according to your data policies.

To anonymize data in assets:

  1. Complete the conditions and select the attributes that you want to process.
  2. Select the action Anonymize data.
  3. In columns containing, select the classifier groups or individual attribute classifiers.
    By default, this field contains the attributes that you selected when you defined the condition. You can remove attribute classifiers or add more.
  4. Select the method to anonymize data:
    • Redact data values in asset columns.
      This method replaces each data value with a string of exactly ten X characters, which removes identifying or otherwise sensitive information. Redacted data retains neither the format of the original data nor referential integrity.
    • Substitute data values in asset columns.
      This method replaces data with values that don’t match the original format. It preserves referential integrity (RI) to ensure that table relationships are consistent.
      If a value occurs several times in a column with substituted data, each occurrence is replaced with the same substitution value.
      For example, if a column contains the email address userA@example.com several times, each occurrence is replaced by the same substitution value, such as: 500ddcc98133703531re3456.
    • Mask data values in asset columns that contain SSN (US social security numbers) data.
      This method replaces values such as SSNs with different values that keep the original format. It does not preserve referential integrity (RI) or data distribution.
  5. Click Create.
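
The behavior of the three methods can be illustrated with a minimal Python sketch. This is not the product's implementation, which is not described here. The function names (redact, substitute, mask_ssn), the keyed hash used for substitution, the demo key, and the random digit replacement used for masking are all assumptions chosen only to reproduce the properties listed in step 4.

```python
import hashlib
import hmac
import random
import string
from typing import Optional


def redact(value: str) -> str:
    """Redact: every value becomes exactly ten X characters.

    Neither the format nor referential integrity is retained.
    """
    return "X" * 10


def substitute(value: str, key: bytes = b"demo-key") -> str:
    """Substitute: identical inputs always yield the same output.

    Assumption: a keyed hash (HMAC-SHA-256) truncated to 24 hex characters.
    The output does not match the original format, but because it is
    deterministic, joins on the substituted column remain consistent.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:24]


def mask_ssn(value: str, rng: Optional[random.Random] = None) -> str:
    """Mask: replace each digit with a random digit and keep separators.

    The format is preserved, but the result is random, so referential
    integrity and data distribution are not preserved.
    """
    rng = rng or random.Random()
    return "".join(
        rng.choice(string.digits) if ch.isdigit() else ch for ch in value
    )


if __name__ == "__main__":
    print(redact("userA@example.com"))
    print(substitute("userA@example.com"))
    print(mask_ssn("123-45-6789"))
```

In this sketch, calling substitute twice with userA@example.com returns the same string both times, whereas two calls to mask_ssn with the same SSN generally return different values in the same NNN-NN-NNNN layout.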

Watch this video to see how to anonymize data.

Figure 9. Anonymize data (video)
This video demonstrates how to anonymize data.

Next steps

Learn more