Masking flows define the data to be masked, whether it is a full copy of data tables or a subset of data tables with relationships intact. Masking flows are reusable definitions that can be saved as drafts, edited, and used to run jobs.
A masking flow job is created when you run a masking flow asset. You can create a job from the Assets page of a project by selecting the options menu on a masking flow asset and clicking New job to run that masking flow. Or you can click the masking flow asset name and click Configure job. The job will be in the Jobs tab. See Running masking flow jobs.
At a minimum, confirm that the following access, rules, and tasks are in place:
- You have access to at least one catalog.
- The data admin applied data protection rules to data assets in the catalog.
- You moved the data assets that you want to mask to an analytics project. These assets must come from the catalog.
Masking flows
Creating a masking flow includes adding the masking flow to a project, and then specifying the masking type to apply to the masking flow. The masking type determines the records and tables that you want to mask.
Masking types
Masking types help you determine the records and tables that you want to mask.
- Bulk copy
Produces masked copies of tables without searching for relationships. Use Bulk copy when you want to quickly mask a few tables, optionally add conditions, and relationships between the tables are not important, for example, a Customers table, a Policies table, and a Sales table. See Using the Bulk copy masking type.
- Copy related records across tables
Masks a related subset of tables. Use this masking type when you want to mask only a specific related subset of data, for example, "Customers in California who own a red truck" and all related data. See Using the Copy related records across tables masking type.
Example for masking only customers in California who own a red truck and all related data:
- Specify a driver table with the following optional conditions:
- Customers table where state = California
- Car = truck
- Car_color = red
The conditions are then scanned for primary key and foreign key relationships.
- The scan identifies the tables that are related to the driver table. For example, the Insurance Claims table is the child of the Customers table, so only the insurance claims for "customers in California who own a red truck" are added to the subset table. These tables are added to the masking flow.
- You can also define conditions in Bulk copy. The difference is that Copy related records across tables scans the relationships and adds related tables.
- You can filter rows in data protection rules with the Bulk copy masking type, but not with the Copy related records across tables masking type. If row filtering data protection rules affect any asset in a masking flow of the Copy related records across tables type, the job fails with the error message: QPE005: Policy denies access to the asset with row filter rule for Copy related records.
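The red truck example can be sketched in a few lines of Python to show what "a related subset" means. This is only an illustration of the subsetting idea, not how the service is implemented: the in-memory tables, the row values, and the helper names `matches` and `related_subset` are all hypothetical, and no masking is applied here.

```python
# Hypothetical in-memory tables; column names and rows are illustrative only.
customers = [
    {"customer_id": 1, "state": "California", "car": "truck", "car_color": "red"},
    {"customer_id": 2, "state": "California", "car": "sedan", "car_color": "red"},
    {"customer_id": 3, "state": "Nevada", "car": "truck", "car_color": "red"},
]
insurance_claims = [
    {"claim_id": 10, "customer_id": 1},
    {"claim_id": 11, "customer_id": 2},
    {"claim_id": 12, "customer_id": 1},
]


def matches(row, conditions):
    """Return True when the row satisfies every column = value condition."""
    return all(row.get(col) == val for col, val in conditions.items())


def related_subset(driver_rows, child_rows, conditions, key):
    """Filter the driver table, then keep only child rows that reference it."""
    driver_subset = [r for r in driver_rows if matches(r, conditions)]
    driver_keys = {r[key] for r in driver_subset}
    child_subset = [r for r in child_rows if r[key] in driver_keys]
    return driver_subset, child_subset


conditions = {"state": "California", "car": "truck", "car_color": "red"}
driver_subset, claims_subset = related_subset(
    customers, insurance_claims, conditions, "customer_id"
)
print([r["customer_id"] for r in driver_subset])  # [1]
print([r["claim_id"] for r in claims_subset])     # [10, 12]
```

With Bulk copy, the same conditions would filter the Customers table only. The Insurance Claims table would not be narrowed to matching customers unless you added it and defined its own conditions.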
To create a masking flow:
- Click Add to project and choose Masking flow from the Available asset types.
- Enter a name for the Masking flow.
- Enter an optional description.
- Add an optional tag.
- Click Next.
Using the Bulk copy masking type
To use Bulk copy:
If you're using a CSI-based storage type with volumes that contain large numbers of files, consider disabling SELinux relabeling. SELinux relabeling can cause a CreateContainerError that can prevent you from running bulk jobs. To disable SELinux relabeling, see the steps specific to the Data Privacy service in the Disabling SELinux relabeling topic.
- Add data from the project to the masking flow. If no data is found, exit out of the masking flow and add data assets from the catalog to the project.
- Navigate through connections and schemas to add tables to the masking flow.
- Define optional conditions.
- Click Define to add conditions specific to individual data assets.
- Scroll the list of columns and click + to add a column as a condition.
- Enter the value for the condition, such as State: California, and then Save. You can look at the data in this asset by clicking the asset name. Data might already be masked in this view based on your permissions.
- Masked columns:
- Yes means that columns are masked by data protection rules.
- No means that no columns in this data table have a relevant data protection rule applied.
- Sampling:
- Select the assets that you want to include in your sample, and then click Set sampling. You can also click Set sampling from the overflow menu.
- Specify the number of rows, starting from the first row, that you want to mask and include in a sample for a selected table. For example, if you want to include the first 1000 rows of a table in your sample, set the sampling limit to 1000. Only the first 1000 rows are copied and masked.
- Save the masking flow as draft or configure a job.
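As a rough illustration of the sampling limit, the following sketch copies only the first rows of a table and then masks them. It rests on assumptions: the table contents, the `sample_first_rows` and `redact` helpers, and the fixed `XXXX` token are hypothetical stand-ins. In the product, masking is determined by the data protection rules, not by code like this.

```python
from itertools import islice


def sample_first_rows(rows, limit):
    """Return at most `limit` rows, starting from the first row."""
    return list(islice(rows, limit))


def redact(row, masked_columns):
    """Placeholder masking: replace values in masked columns with a fixed token."""
    return {col: ("XXXX" if col in masked_columns else val) for col, val in row.items()}


# Hypothetical table with 5000 rows.
table = [
    {"id": i, "state": "California", "email": f"user{i}@example.com"}
    for i in range(5000)
]

# A sampling limit of 1000 copies and masks only the first 1000 rows.
sampled = sample_first_rows(table, 1000)
masked_copy = [redact(row, {"email"}) for row in sampled]

print(len(masked_copy))  # 1000
print(masked_copy[0])    # {'id': 0, 'state': 'California', 'email': 'XXXX'}
```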
Using the Copy related records across tables masking type
To use Copy related records across tables:
- Add data from a project to the masking flow. If no data is found, exit out of the masking flow and add data assets from the catalog to the project.
- Navigate through connections and schemas to select the driver table (see the previous example) and add it to the masking flow. You can select only the driver table here; related tables are added in subsequent steps. Relationship analysis begins.
The driver table is added to the Assets list and you can optionally add conditions and configure sampling.
- To add related tables, click Add related tables.
- Choose whether you want to search for parent tables, child tables, or both. For example:
- CUSTOMERS table has the following columns:
- customer_id (primary key)
- customer_name
- customer_address
- salesperson_id (foreign key)
- SALES table has the following columns:
- salesperson_id (primary key)
- salesperson_name
- amount_sales
- TRANSACTIONS table has the following columns:
- transaction_id (primary key)
- product_sold
- customer_id (foreign key)
In this example, the CUSTOMERS table is the driver table. The SALES table is a parent table of the CUSTOMERS table because a primary-key and foreign-key relationship exists on the salesperson_id column. The SALES table has the primary key and the CUSTOMERS table has the foreign key.
The TRANSACTIONS table is a child table of the CUSTOMERS table because a primary-key and foreign-key relationship exists on the customer_id column. The CUSTOMERS table has the primary key and the TRANSACTIONS table has the foreign key.
- Add the parent tables, child tables, or both.
- Save the masking flow as draft or configure a job.
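The parent and child classification in the CUSTOMERS, SALES, and TRANSACTIONS example can be sketched as follows. This is a minimal illustration, not the product's relationship analysis: the `schema` dictionary layout and the `find_related` helper are hypothetical, and matching keys by column name is a simplifying assumption.

```python
# Schema from the example, written as a hypothetical Python dictionary.
schema = {
    "CUSTOMERS": {"primary_key": "customer_id", "foreign_keys": ["salesperson_id"]},
    "SALES": {"primary_key": "salesperson_id", "foreign_keys": []},
    "TRANSACTIONS": {"primary_key": "transaction_id", "foreign_keys": ["customer_id"]},
}


def find_related(schema, driver):
    """Classify the other tables as parents or children of the driver table."""
    driver_def = schema[driver]
    parents, children = [], []
    for name, table in schema.items():
        if name == driver:
            continue
        # Parent: the driver holds a foreign key that is this table's primary key.
        if table["primary_key"] in driver_def["foreign_keys"]:
            parents.append(name)
        # Child: this table holds a foreign key that is the driver's primary key.
        if driver_def["primary_key"] in table["foreign_keys"]:
            children.append(name)
    return parents, children


parents, children = find_related(schema, "CUSTOMERS")
print(parents)   # ['SALES']
print(children)  # ['TRANSACTIONS']
```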
Watch this video to see how to set advanced masking options and create a masking flow asset in a project.