DataStage environments
Control how your DataStage jobs run on the runtime engine by configuring environments. You can run DataStage jobs in environments on IBM Cloud or you can run jobs locally by setting up environments with your own DataStage remote runtime engines.
DataStage environments on IBM Cloud
IBM® DataStage® offers three PX environments that you can use to run your jobs. By default, a job uses the Default DataStage PX S runtime. Before you run the flow as a job, you can change the environment to any of the three that are available.
The three IBM Cloud runtimes consume capacity unit hours (CUHs), and that consumption is tracked. Only the time that jobs spend running is tracked. Creating, configuring, and updating flows on the DataStage canvas does not use any CUHs.
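Because only job run time counts toward CUHs, usage can be estimated from run durations alone. The following is a minimal sketch, not an IBM API: the function name `cuh_consumed` and the `units_per_hour` rate are assumptions made for illustration, and the actual rate for each environment must be taken from your plan.

```python
from datetime import datetime

def cuh_consumed(run_intervals, units_per_hour):
    """Estimate CUHs from (start, end) job-run intervals.

    units_per_hour is a hypothetical rate supplied by the caller;
    this document does not state the real rate for any environment.
    """
    # Only time spent running jobs is counted; time spent editing flows is not.
    total_seconds = sum((end - start).total_seconds() for start, end in run_intervals)
    return (total_seconds / 3600) * units_per_hour

# Two example runs: 30 minutes and 90 minutes.
runs = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    (datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 15, 30)),
]
estimate = cuh_consumed(runs, units_per_hour=1.0)  # placeholder rate
```

Time spent designing flows between the two runs does not appear in the estimate, which mirrors how the tracked usage is described above.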
When you create a job in which to run a DataStage flow, you can select one of the following preset environments:
Name | Hardware configuration |
---|---|
Default DataStage PX S | 1 Conductor: 2 vCPU and 8 GB RAM |
Default DataStage PX M | 1 Conductor: 4 vCPU and 16 GB RAM |
Default DataStage PX L | 1 Conductor: 8 vCPU and 32 GB RAM |
The Default DataStage PX S runtime is used when you run a job to extract, transform, and load data in DataStage, unless you select a different environment. For complex jobs with large data sets, select an environment with more vCPU and memory to increase capacity. The default environments use 2 partitions.
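The choice among the three presets can be sketched as picking the smallest environment that meets a job's needs. This is an illustrative sketch only: `PresetEnvironment`, `PRESETS`, and `smallest_fit` are hypothetical names, not part of any DataStage API, and the hardware values are copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PresetEnvironment:
    name: str
    vcpu: int     # conductor vCPU count
    ram_gb: int   # conductor memory in GB

# Preset environments from the table, ordered smallest to largest.
PRESETS = [
    PresetEnvironment("Default DataStage PX S", 2, 8),
    PresetEnvironment("Default DataStage PX M", 4, 16),
    PresetEnvironment("Default DataStage PX L", 8, 32),
]

def smallest_fit(min_vcpu: int, min_ram_gb: int) -> Optional[PresetEnvironment]:
    """Return the smallest preset that meets both requirements, or None."""
    for env in PRESETS:
        if env.vcpu >= min_vcpu and env.ram_gb >= min_ram_gb:
            return env
    return None  # no preset is large enough

choice = smallest_fit(min_vcpu=3, min_ram_gb=8)
print(choice.name if choice else "No preset environment is large enough")
```

If no preset fits, the function returns `None` and the caller has to decide what to do next.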
To update the environment that you want to use:
- On the DataStage canvas, select the run settings icon and select the environment that you want to use.
- Select a job, edit the job configuration, and on the run settings tab, change the environment.
Administrators can create new environments for IBM Cloud to specify environment variables and change the number of partitions.
DataStage environments on remote runtime engines
By using a remote runtime engine, you can run jobs in an environment that is not managed by IBM. With a DataStage remote runtime engine, you can use on-premises applications and databases and run jobs locally. An administrator can configure DataStage remote runtime engines at the project level. Developers with Editor or Admin access to a project with a DataStage remote runtime engine can run jobs in that environment.
Once you select a remote environment as a project default environment, you can only use remote environments in that project. You cannot switch back to using IBM Cloud environments for that project's DataStage jobs.
With a remote runtime engine, you can:
- Run workloads and process data locally
- Avoid data transfer costs
- Increase security by keeping data local to your cloud environment
- Use DataStage features from Cloud Pak for Data such as User-defined stages, the Java Integration stage, Before/after job routines, and more, without maintaining a full Cloud Pak for Data install
Remote environments do not support connectors that need a driver upload, vaults, or the Data service connector. Several connectors are supported only through a flow connection.
For more information, see DataStage as a Service Anywhere.
Running a flow
You can create a job in which to run your DataStage flow:
- Directly on the DataStage canvas by clicking the run icon from the DataStage toolbar (the default name of a job that runs a flow is the flow's name appended with .DataStage job).
- From your project’s DataStage flows page by selecting the DataStage flow, opening the Action menu, and selecting New job.
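The default naming rule for a job created from the canvas can be expressed in one line, which is handy when a script needs to locate such a job by name. The helper `default_job_name` and the flow name `orders_etl` are hypothetical and shown only to illustrate the rule.

```python
def default_job_name(flow_name: str) -> str:
    """Build the default job name: the flow's name with '.DataStage job' appended."""
    return f"{flow_name}.DataStage job"

print(default_job_name("orders_etl"))  # orders_etl.DataStage job
```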
When you run a job to extract, transform, or load data in DataStage, a Default DataStage PX S runtime is started automatically and is listed as an active runtime on the Environments page of your project. To change the environment, select the run settings icon on the DataStage canvas, or select a job from the Jobs tab and change the settings there.
Monitor monthly billing
You must be an IBM Cloud account owner or administrator to see resource usage information.
To see the monthly charges, the number of CUHs used, the number of VPCs used, and the number of users for your service instance, go to the Cloud Usage Dashboard. For each instance, click Manage > Billing and Usage > Usage, click View Instances next to the service name, and then click View instance next to the instance name.
Runtime logs for jobs
To view the accumulated logs for a DataStage job:
- From the project’s Jobs page, click the DataStage job for which you want to see logs.
- Click the job run. You can view the job log, copy the log to clipboard, or download the log.