Designing DataStage flows | IBM Cloud Pak for Data as a Service

Designing DataStage flows

DataStage® flows are the design-time assets that contain data integration logic.

You can create an empty DataStage flow and add connectors and stages to it, or you can import an existing DataStage flow from an ISX or ZIP file.

The basic building blocks of a flow are:

Data sources that read data
Stages that transform the data
Data targets that write data
Links that connect the sources, stages, and targets
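The building blocks listed above can be pictured as a small directed graph: nodes are sources, stages, or targets, and links are the edges between them. The following Python sketch is a hypothetical model for illustration only, not a DataStage API. The `Node` and `Flow` classes, the `problems` check, and the "source"/"stage"/"target" labels are all assumptions. In this model, links run from sources toward targets, and every node must be linked.

```python
from dataclasses import dataclass, field

# Hypothetical model of a flow; not part of any DataStage API.
KINDS = {"source", "stage", "target"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of KINDS


@dataclass
class Flow:
    nodes: dict[str, Node] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)  # (from, to)

    def add(self, name: str, kind: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = Node(name, kind)

    def link(self, src: str, dst: str) -> None:
        # A link connects two nodes that are already on the canvas.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"both ends must exist: {src} -> {dst}")
        self.links.append((src, dst))

    def problems(self) -> list[str]:
        """Report links that point the wrong way and nodes with no links."""
        issues = []
        for src, dst in self.links:
            if self.nodes[src].kind == "target":
                issues.append(f"target {src} has an outgoing link")
            if self.nodes[dst].kind == "source":
                issues.append(f"source {dst} has an incoming link")
        linked = {name for pair in self.links for name in pair}
        issues += [f"{name} is not linked" for name in self.nodes if name not in linked]
        return issues


flow = Flow()
flow.add("read_orders", "source")
flow.add("filter_orders", "stage")
flow.add("write_orders", "target")
flow.link("read_orders", "filter_orders")
flow.link("filter_orders", "write_orders")
print(flow.problems())  # []
```

The check is deliberately simple. It only encodes the reading and writing roles described in the list above and says nothing about how DataStage itself validates or compiles a flow.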

Figure: Palette and canvas in IBM DataStage

DataStage flows and their associated objects are organized in projects. To start, open an existing project or create a new project.


Creating a DataStage flow

To create a DataStage flow, complete the following steps.

Open an existing project or create a project.
On the Assets tab, click New asset + > Transform and integrate data.
On the Create a DataStage flow page, use one of the following two methods to create the DataStage flow:
- Click the New tab, add the necessary details for the DataStage flow, then click Create. The new DataStage flow opens with no objects on the DataStage designer canvas.
- Click the Local file tab, then upload an ISX or ZIP file from your local computer. Then, click Create. When the import process is complete, close the import report page, then open the imported DataStage flow from the Assets tab of the project.
Drag connectors or stages from the palette onto the DataStage design canvas as nodes, and arrange them as you like. To connect two nodes, hover your pointer over a node until an arrow icon appears on it, then click the arrow icon and drag it to the node that you want to connect to. This action creates a link between the nodes.

To connect to remote data, see Connecting to a data source in DataStage.
Double-click a node to open its properties panel, where you can specify configurations and settings for the node.
Click Run when you are done setting up the flow.
The flow is automatically saved, compiled, and run. You can view logs for both the compilation and the job run.

After the flow is compiled into a job, you can rerun the job, set a schedule, monitor the job, and update the environment that you want to run it in. For more information about updating the DataStage environment where you want your jobs to run, see DataStage environments.

Editing a DataStage flow

You can use the following actions to edit a DataStage flow.

Drag a stage or connector from the palette and drop it onto a link between two nodes that are already on the DataStage design canvas. The new node is automatically linked to the nodes on either side of it, and columns are automatically propagated through the flow. Click Run again to see the results.
Manually detach and reattach links on the DataStage canvas by hovering your pointer over a link and clicking its end points.
Click the Replace icon and select another flow to replace your flow. This action is also available for Build, Custom, and Wrapped stages, as well as subflows and Java libraries.
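Dropping a stage onto an existing link, as described above, splits that one link into two and carries its columns forward. The following Python sketch models that effect. It is a hypothetical illustration, not how DataStage implements the action. The `Link` class and `drop_on_link` function are assumptions, and the sketch assumes the new stage passes every column through unchanged. A real stage might add, drop, or transform columns.

```python
from dataclasses import dataclass


@dataclass
class Link:
    src: str
    dst: str
    columns: list[str]


def drop_on_link(links: list[Link], index: int, new_node: str) -> list[Link]:
    """Replace links[index] (src -> dst) with src -> new_node -> dst.

    Assumes the new node passes every column through unchanged, so both new
    links carry a copy of the original link's columns.
    """
    old = links[index]
    upstream = Link(old.src, new_node, list(old.columns))
    downstream = Link(new_node, old.dst, list(old.columns))
    return links[:index] + [upstream, downstream] + links[index + 1:]


links = [Link("read_orders", "write_orders", ["order_id", "amount"])]
links = drop_on_link(links, 0, "transform")
for link in links:
    print(link.src, "->", link.dst, link.columns)
# read_orders -> transform ['order_id', 'amount']
# transform -> write_orders ['order_id', 'amount']
```

Copying the column list for each new link keeps the two links independent. In this model, a later change to the columns on one side does not silently alter the other.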

Considerations

Sensitive information and encrypted property values: Specifying encrypted property values, such as passwords, directly in DataStage flows is not recommended. Instead, create a parameter set of type Encrypted with a named parameter, and do not specify a default value for the parameter. In your flow, reference the encrypted parameter set and specify the named parameter as the property value, for example: #<parameter set>.<parameter name>#. Then specify the encrypted value of #<parameter set>.<parameter name># in the job that runs your flow.
Naming files in sources and targets to avoid data corruption: In most cases, do not use the same name for the source and the target when both point to the same database or storage system. This rule applies to both files and database tables. Using the same name can corrupt the data.
Column metadata change propagation: When you change a column's metadata, the change is automatically propagated downstream. After you modify a column's metadata, changes made upstream no longer apply to that column. If you delete a column, modifying the column in a later stage does not add it back.
Runtime column propagation: When runtime column propagation (RCP) is enabled and your job encounters extra columns at run time that are not defined in the metadata, the job adopts these columns and propagates them through the rest of the job. This avoids errors caused by missing column mappings.
Adding parameters: See Adding parameters.
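The following Python sketch illustrates the parameter reference syntax from the first consideration above. It shows how a #<parameter set>.<parameter name># reference could be resolved from values supplied when the job runs. It is a hypothetical illustration of the substitution idea only. It performs no encryption and does not reflect how DataStage stores or decrypts Encrypted parameters. The `resolve` function, the `db_credentials` parameter set, and the identifier rules in the pattern are assumptions.

```python
import re

# Matches "#<parameter set>.<parameter name>#". The identifier rules used here
# (a letter or underscore, then letters, digits, or underscores) are an
# assumption for this sketch.
PARAM_REF = re.compile(r"#([A-Za-z_]\w*)\.([A-Za-z_]\w*)#")


def resolve(value: str, runtime_values: dict[tuple[str, str], str]) -> str:
    """Replace each parameter reference with the value supplied at run time."""

    def substitute(match: re.Match) -> str:
        key = (match.group(1), match.group(2))
        # No default value is defined, so a missing run-time value is an error
        # rather than a silent fallback.
        if key not in runtime_values:
            raise KeyError(f"no run-time value for #{key[0]}.{key[1]}#")
        return runtime_values[key]

    return PARAM_REF.sub(substitute, value)


# The flow stores only the reference, never the password itself.
password_property = "#db_credentials.password#"
print(resolve(password_property, {("db_credentials", "password"): "example-only"}))
```

Raising an error when no value is supplied mirrors the advice to leave the parameter without a default. In this sketch, the secret can only come from the run-time values and never from the flow definition.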

Learn more

Examples

Creating a DataStage flow

Watch the following video for an example of how to create a simple DataStage flow.

This video provides a visual method to learn the concepts and tasks in this documentation.

Importing a DataStage flow into a project

Watch the following video for an example of how to import a DataStage flow into a project.
