Creating jobs for running data quality rules

Last updated: Apr 05, 2024
In addition to the jobs that are created automatically when you run a rule, you can create jobs for data quality rules manually, for example, to enable automation.

To automate the process of running data quality rules, you can create scheduled repeating jobs for the DataStage flow attached to a data quality rule.

You can create additional jobs in one of these ways:

  • On the project's Assets page as described in Creating jobs from the Assets page.

  • From within a data quality rule asset.

    1. Open the data quality rule asset.

    2. Select New job from the overflow menu next to the asset name.

    3. Define the job details by entering a name and, optionally, a description.

    4. On the Settings page, select an environment runtime for the job. Optionally, change the number of warnings that are allowed before the stages are stopped. The default warning limit for rule jobs is 100.

    5. On the Schedule page, you can optionally add a one-time or repeating schedule.

      If you define a start date and time without selecting Repeat, the job runs exactly once at the specified date and time. If you define a start date and time and select Repeat, the job runs for the first time at the timestamp indicated in the Repeat section.

      You can't change the time zone; you must set your job schedule in your web browser's time zone. The schedule will be translated to the time zone of the compute node where your job runs.

      If you exclude certain weekdays, the job might not run as you expect. A likely cause is a discrepancy between the time zone of the user who creates the schedule and the time zone of the compute node where the job runs.

      Note: Your scheduled job can appear at a different time if your web browser's time zone is a local time zone that observes Daylight Saving Time (DST). For example, your scheduled job appears at 3:00 PM Eastern Standard Time (EST) daily, which corresponds to 8:00 PM Coordinated Universal Time (UTC). When your local time zone changes to Eastern Daylight Time (EDT), your scheduled job continues to run at 8:00 PM UTC, which now appears as 4:00 PM EDT daily.

    6. Optional: Set up notifications for the job. You can select the type of alerts to receive.

    7. Review the job settings. Then, create the job and run it immediately, or create the job and run it later.

    The DataStage flow job for your rule is listed on the Jobs page in your project.
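
The time zone behavior described in the scheduling step can be checked offline before you pick a start time. The following is a minimal Python sketch, not part of the product: the function names `displayed_local_time` and `weekday_on_node` are hypothetical, and it assumes that the schedule is effectively pinned to a UTC instant, as the note's example suggests. It uses only the standard `zoneinfo` module.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def displayed_local_time(run_hour_utc: int, on_day: date, browser_tz: str) -> datetime:
    """Return how a run fixed at run_hour_utc (UTC) appears in browser_tz on on_day.

    Hypothetical helper: assumes the schedule is stored as a fixed UTC hour.
    """
    run_utc = datetime(on_day.year, on_day.month, on_day.day,
                       run_hour_utc, tzinfo=timezone.utc)
    return run_utc.astimezone(ZoneInfo(browser_tz))


def weekday_on_node(local_start: datetime, node_tz: str) -> str:
    """Return the weekday name of local_start as seen in the compute node's time zone.

    Hypothetical helper: local_start must be a timezone-aware datetime.
    """
    return local_start.astimezone(ZoneInfo(node_tz)).strftime("%A")


if __name__ == "__main__":
    # The same 8:00 PM UTC run, viewed from a browser in US Eastern time,
    # once during standard time and once during daylight saving time.
    winter = displayed_local_time(20, date(2024, 1, 15), "America/New_York")
    summer = displayed_local_time(20, date(2024, 7, 15), "America/New_York")
    print(winter.strftime("%I:%M %p %Z"))
    print(summer.strftime("%I:%M %p %Z"))

    # A late Monday evening start in the browser's time zone can fall on a
    # different weekday on a compute node that runs in UTC.
    monday_late = datetime(2024, 1, 15, 22, 0, tzinfo=ZoneInfo("America/New_York"))
    print(monday_late.strftime("%A"), "->", weekday_on_node(monday_late, "UTC"))
```

If the weekday reported for the compute node differs from the weekday you see locally, a weekday exclusion set in the browser might apply to a different day than you intended. The compute node's actual time zone is an assumption here (UTC); substitute the zone that applies to your environment.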

Rules are run with IBM Cloud credentials. Typically, your personal IBM Cloud API key is used to execute such long-running operations without disruption. If credentials are not available when you create the job, you are prompted to create an API key.
