Running masking flow jobs
Last updated: Jan 21, 2025

In a masking flow job, data users define the target destination for the masked data copies. You can schedule jobs, and after a job completes successfully, you can view the job report summary.

There are two ways to create masking flow jobs:

  • After creating a masking flow, click Configure job.
  • Click the Options menu on an individual data asset to skip creating a masking flow and to configure a masking job directly for that data asset.
Note: A masking flow job might fail when there isn't enough memory to support it. To avoid errors, keep the data no larger than 12 GB.
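
As a rough illustration of the size limit in the note above, the following Python sketch estimates a table's size and compares it with the 12 GB ceiling. The function names, the rows-times-row-width estimate, and the reading of 12 GB as 12 * 1024**3 bytes are all assumptions made for this example; they are not part of the product.

```python
# Assumption: "12 GB" is read as 12 * 1024**3 bytes. The documentation does not
# say whether the limit is decimal or binary, so adjust this if needed.
MAX_MASKING_BYTES = 12 * 1024**3


def estimate_table_bytes(row_count: int, avg_row_bytes: int) -> int:
    """Rough size estimate: number of rows times the average row width in bytes."""
    if row_count < 0 or avg_row_bytes < 0:
        raise ValueError("row_count and avg_row_bytes must be non-negative")
    return row_count * avg_row_bytes


def fits_masking_limit(size_bytes: int, limit_bytes: int = MAX_MASKING_BYTES) -> bool:
    """Return True when the estimated size is at or below the limit."""
    return size_bytes <= limit_bytes


if __name__ == "__main__":
    # Hypothetical table: 10 million rows at roughly 500 bytes per row.
    size = estimate_table_bytes(10_000_000, 500)
    print(f"Estimated size: {size / 1024**3:.2f} GiB, fits: {fits_masking_limit(size)}")
```

If the estimate exceeds the limit, one option is to mask a smaller asset or a sampled subset rather than the full table.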

Creating a Masking flow environment

Required permissions

To create an environment template, both of the following conditions must be true:

  • You must have the Admin or Editor role in the project.
  • You must belong to the project creator's IBM Cloud account.

To create an environment template in your project:

  1. From your project, click the Manage tab and then click the Environments page.
  2. From the Environments page, click the Templates tab and then click New template.
  3. Define the environment details by entering a name and a description (optional).
  4. Under Type, click Spark.
  5. Under Hardware configuration, specify the driver and executor configuration based on the size of the table in the masking flow job.
  6. Under Number of executors, increase the number of executors to improve the performance of the masking flow when the jobs are configured to run with partitions. For more information, see the Best practices section of the Managing job performance topic.
  7. From the Software version list, select Masking Flow Spark.
  8. Click Create.

Working with jobs

To configure a job:

  1. Enter the name of the job and add an optional description of the job.
  2. Add the target connection where you want to insert the masked data copy. The source connection is used to read data.
  3. Click + to add a new connection. The schema maps the source table to the target table. Table definitions must already be configured in the target schema.
Tip: When the source asset is Apache Hive, use Apache HDFS as the target connection.
  4. (Optional) From the Partition page, edit the partition details for the asset:
    • If you create masking flows with Set sampling, you can skip the Partition page by setting Edit partition details to Off.
    • If you have tables with large amounts of data, consider editing the partition details by specifying a column as the partition column. To improve job performance, you can increase the number of partitions. For more information, see the Best practices section of the Managing job performance topic.
  5. (Optional) Schedule a job or schedule a recurring job.
  6. Review and run the job.
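
The checks implied by the steps above can be sketched as a small pre-flight routine in Python. Everything here is hypothetical: `MaskingJobPlan` and `preflight_issues` are illustrative names, not a product API, and the sketch assumes for simplicity that each source table maps to a target table of the same name.

```python
from dataclasses import dataclass, field


@dataclass
class MaskingJobPlan:
    """Hypothetical record of the choices made while configuring a job."""
    name: str
    source_type: str  # for example "Apache Hive"
    target_type: str  # for example "Apache HDFS"
    source_tables: list[str] = field(default_factory=list)
    target_tables: list[str] = field(default_factory=list)


def preflight_issues(plan: MaskingJobPlan) -> list[str]:
    """Return human-readable problems found in the plan; empty when none are found."""
    issues = []
    if not plan.name.strip():
        issues.append("Job name is missing.")
    # Table definitions must already exist in the target schema.
    # Assumption: source and target tables share the same names.
    missing = sorted(set(plan.source_tables) - set(plan.target_tables))
    if missing:
        issues.append("Target schema has no table definition for: " + ", ".join(missing))
    # Mirrors the tip: an Apache Hive source pairs with an Apache HDFS target.
    if plan.source_type == "Apache Hive" and plan.target_type != "Apache HDFS":
        issues.append("Source is Apache Hive; use Apache HDFS as the target connection.")
    return issues


if __name__ == "__main__":
    plan = MaskingJobPlan("nightly-mask", "Apache Hive", "Apache HDFS",
                          ["CUSTOMERS", "ORDERS"], ["CUSTOMERS"])
    for issue in preflight_issues(plan):
        print(issue)
```

Returning a list of issues rather than raising on the first one lets a reviewer see every problem before running the job.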

Learn more

Parent topic: Masking data with Masking flow