Creating and managing DataStage jobs

Components that make up a DataStage job

A DataStage® job consists of the following components:
  • A DataStage flow
  • A runtime environment
  • Job parameters
  • Parameter sets

Job relationships

DataStage flows can have a one-to-many relationship with DataStage jobs. You can use one DataStage flow to create multiple jobs.

Multi-instance jobs

All DataStage jobs can be instantiated multiple times, resulting in multiple job runs or invocations of the same unmodified job. You do not need to wait for a job to complete to send a new job run request for that job. You can send a new job run request through the REST API, command line (cpdctl), or the Jobs dashboard. You can also use multiple invocations of the same job to process different data sets by setting different parameters for each run. Each DataStage job run has a job run ID.
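
For example, assuming a project named MyProject and a job named MyJob with a job parameter named inputFile (all placeholder names), you might start two invocations of the same job with different parameter values by using the cpdctl dsjob run command:
cpdctl dsjob run --project MyProject --job MyJob --param inputFile=sales_q1.csv
cpdctl dsjob run --project MyProject --job MyJob --param inputFile=sales_q2.csv

Each command submits a separate run and returns its own job run ID, so the two invocations can run concurrently.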

The developer who designs the job is responsible for ensuring that multiple job runs do not conflict with each other. For example, different invocations that are running concurrently might write to the same table. Multiple runs of a job might also adversely affect job performance.

You can set a DSJobInvocationId value to appear as the job run “Name” on the Jobs dashboard, so you can find a particular job run by name. You can define the DSJobInvocationId by creating a parameter or environment variable. You can set the DSJobInvocationId from a pipeline or when you start the DataStage job (with the command line, for example). You do not need to create the DSJobInvocationId to create a multi-instance job.
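
For example, if DSJobInvocationId is defined as a job parameter, you might set it when you start the job from the command line (project, job, and invocation names are placeholders):
cpdctl dsjob run --project MyProject --job MyJob --param DSJobInvocationId=nightly-load-2022-11-07

The value then appears as the run name on the Jobs dashboard.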

Migrated DataStage parallel and sequence jobs import DSJobInvocationId as a parameter.

DataStage job instances that are invoked separately are different from the instances that are generated when you run a partitioned job across several processors. For partitioned jobs, the built-in partitioning and collecting mechanisms handle the situation where several processes read from or write to the same data source.

Creating a job from the DataStage design canvas

You can create a DataStage job directly from the DataStage design canvas within a DataStage flow.

Complete the following steps:
  1. Open a DataStage flow.
  2. Optional: Click the Settings icon in the toolbar to open the Settings page and specify settings for the job.
  3. Click Compile to compile the DataStage flow.
  4. Click Run to run the DataStage flow.

    A job is created and run automatically. After the run finishes, the job is listed on the Jobs tab of the project that contains your DataStage flow.
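
If you prefer to script this compile-and-run sequence, the cpdctl dsjob utility provides equivalent commands. The following sketch assumes a project named MyProject, a flow named MyFlow, and a job named MyJob that was already created from that flow (all placeholder names):
cpdctl dsjob compile --project MyProject --name MyFlow
cpdctl dsjob run --project MyProject --job MyJob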

Creating a job from the project level

You can create a job from the Assets tab of your project.

Complete the following steps:
  1. Select a DataStage flow from the list on the Assets tab of the project and choose Create job from the overflow menu (the icon with three vertical dots) at the end of the table row.
  2. Define the job details by entering a name and a description (optional).
  3. Specify the settings that you want for the job.
  4. On the Schedule page, you can optionally add a one-time or repeating schedule.

    If you define a start date and time without selecting Repeat, the job runs exactly once at that date and time. If you define a start date and time and select Repeat, the job runs for the first time at the timestamp that is indicated in the Repeat section.

    You can't change the time zone; the schedule uses your web browser's time zone setting. If you exclude certain weekdays, the job might not run when you expect, because of a possible discrepancy between the time zone of the user who creates the schedule and the time zone of the compute node where the job runs.

  5. Optional: Set notifications for the job. You can select the type of alerts to receive.
  6. Review the job settings. Then, create the job and run it immediately, or create the job and run it later.

Creating a job from the command line with the cpdctl dsjob utility

You can create a DataStage job from an existing DataStage flow by using the cpdctl dsjob command-line utility and its create-job command.

See the following example:
cpdctl dsjob create-job --project DataStageProjectName --flow DataStageFlowName \
--description "This is a test job created from command line" \
--schedule-start 2022-11-07 \
--schedule-end 2022-12-08 \
--repeat hourly
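
This example creates a job that runs every hour between the start date of 2022-11-07 and the end date of 2022-12-08.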

Running jobs

To run a job manually, click the run icon (shaped like a pointing arrow) on the toolbar in the DataStage design canvas. You can also run a job manually by clicking the run icon on the toolbar when you are viewing the details for a particular job.

Jobs can be run on a schedule or on demand. In addition, jobs can be run by using the REST API or the cpdctl command-line utility.
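
For example, the following command runs an existing job on demand and waits up to 300 seconds for the run to finish (project and job names are placeholders):
cpdctl dsjob run --project MyProject --job MyJob --wait 300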

Managing jobs

You can manage jobs from the Jobs tab of your project.
  • Click the Edit filters icon to filter by different criteria, such as asset type and schedule.
  • Click the Jobs drop-down menu next to the job search field to filter by criteria such as jobs with active runs, active runs, jobs with finished runs, and finished runs.
  • Enter information in the search field to search for specific jobs.
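
You can also list the jobs in a project from the command line. For example, assuming a project named MyProject:
cpdctl dsjob list-jobs --project MyProject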

Viewing job run details and run metrics

Click a job name in the list of jobs to review the run information and details. The job details page also lists the run name if it was set by using the DSJobInvocationId parameter.

You can select a particular run for a job and review the run details. Run details include the duration, the start and end times, the user who started the run, the associated job, the run name, and the associated DataStage flow. Settings and runtime parameter values are also listed, along with the job run log for the run.
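
You can also retrieve the log for a particular run from the command line. The following sketch assumes placeholder project and job names and a run ID that was returned when the run started:
cpdctl dsjob logdetail --project MyProject --job MyJob --run-id <run ID>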

Click Run metrics in the run details to view the job run metrics. View a summary of metrics for the overall flow or search for specific links and stages. You can filter the run metrics based on status (in progress, failed, or completed).

You can also access the run metrics from within a flow by clicking the Run metrics button in the canvas toolbar. Click on a link or stage in the metrics list to move focus to it in the canvas.
