Designing DataStage flows
DataStage® flows are the design-time assets that contain data integration logic.
You can create an empty DataStage flow and add connectors and stages to it or you can import an existing DataStage flow from an ISX or ZIP file.
A DataStage flow consists of the following elements:
- Data sources that read data
- Stages that transform the data
- Data targets that write data
- Links that connect the sources, stages, and targets
DataStage flows and their associated objects are organized in projects. To start, open an existing project or create a new project.
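As a conceptual aid only, the building blocks listed above can be pictured as a small directed graph. The following Python sketch is not a DataStage API: the `Node` and `Flow` classes and the node names (`orders_file`, `filter_rows`, `orders_table`) are hypothetical and exist only to illustrate how sources, stages, targets, and links relate to one another.

```python
from dataclasses import dataclass, field

VALID_KINDS = ("source", "stage", "target")


@dataclass
class Node:
    name: str
    kind: str  # one of VALID_KINDS


@dataclass
class Flow:
    # Hypothetical in-memory model of a flow; not how DataStage stores flows.
    nodes: dict = field(default_factory=dict)   # node name -> Node
    links: list = field(default_factory=list)   # (from_name, to_name) pairs

    def add_node(self, name, kind):
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown node kind: {kind!r}")
        self.nodes[name] = Node(name, kind)

    def add_link(self, src, dst):
        # A link can only connect nodes that are already part of the flow.
        for name in (src, dst):
            if name not in self.nodes:
                raise KeyError(f"node {name!r} is not in the flow")
        self.links.append((src, dst))

    def downstream(self, name):
        # Names of the nodes that receive data directly from `name`.
        return [dst for src, dst in self.links if src == name]


flow = Flow()
flow.add_node("orders_file", "source")    # reads data
flow.add_node("filter_rows", "stage")     # transforms data
flow.add_node("orders_table", "target")   # writes data
flow.add_link("orders_file", "filter_rows")
flow.add_link("filter_rows", "orders_table")
```

In this model, a link is simply an ordered pair of node names, which mirrors the idea that data moves from a source through stages to a target.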
Creating a DataStage flow
To create a DataStage flow, complete the following steps.
- Open an existing project or create a project.
- On the Assets tab, click .
- On the Create a DataStage flow page, use one of the following two methods to create the DataStage flow:
- Click the New tab, add the necessary details for the DataStage flow, then click Create. The new DataStage flow opens with no objects on the DataStage designer canvas.
- Click the Local file tab, then upload an ISX or ZIP file from your local computer. Then, click Create. When the import process is complete, close the import report page, then open the imported DataStage flow from the Assets tab of the project.
- Drag connectors or stages from the palette onto the DataStage design canvas as nodes and arrange them as you like. To connect nodes, hover your pointer over a node until an arrow appears on it, then click the arrow icon and drag it to the node that you want to connect to. This action creates a link between the nodes.
To connect to remote data, see Connecting to a data source in DataStage.
- Double-click a node to open up its properties panel, where you can specify configurations and settings for the node.
- Click Run when you are done setting up the flow.
The flow is automatically saved, compiled, and run. You can view logs for both the compilation and job run.
Editing a DataStage flow
You can use the following actions to edit a DataStage flow.
- Drag a stage or connector and drop it on a link between two nodes that are already on the DataStage design canvas. Links are automatically added for the new node and columns are automatically propagated. Click Run again to see the results.
- Manually detach and reattach links from nodes on the DataStage canvas by hovering your pointer over them and clicking the end points of the links.
- Drag a stage or connector from the palette and drop it onto a link that is already on the canvas. The stage or connector is automatically linked to the nodes on either side of it, and the columns in the DataStage flow are automatically propagated.
- Click the Replace icon and select another flow to replace your flow. This action is also available for Build, Custom, and Wrapped stages, as well as subflows and Java libraries.
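To illustrate what happens when a stage or connector is dropped onto an existing link, the following Python sketch treats links as pairs of node names and replaces one link with two links that pass through the new node. This is a conceptual model only, not DataStage code: the function `insert_on_link` and the node names are hypothetical, and column propagation is not modeled.

```python
def insert_on_link(links, link, new_node):
    """Return a new list of links in which `link` is split by `new_node`.

    `links` is a list of (from_name, to_name) pairs. The original list is
    left unchanged.
    """
    if link not in links:
        raise ValueError(f"link {link!r} is not in the flow")
    src, dst = link
    result = []
    for existing in links:
        if existing == link:
            # The single link becomes two links through the new node.
            result.append((src, new_node))
            result.append((new_node, dst))
        else:
            result.append(existing)
    return result


original_links = [("orders_file", "orders_table")]
updated_links = insert_on_link(
    original_links, ("orders_file", "orders_table"), "sort_rows"
)
```

After the call, `updated_links` holds a link from `orders_file` to `sort_rows` and a link from `sort_rows` to `orders_table`, while `original_links` is unchanged.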
Considerations
- Sensitive information and encrypted property values
- Specifying encrypted property values such as passwords in DataStage flows is not recommended. Instead, create a parameter set of type Encrypted with a named parameter and do not specify a default value for the parameter. In your flow, reference the encrypted parameter set and specify the named parameter for the property value, for example: #<parameter set>.<parameter name>#. Specify the encrypted value of the parameter #<parameter set>.<parameter name># in the job that runs your flow.
- Naming files in sources and targets to avoid data corruption
- In most cases, do not use the same file name in the source as in the target if the source and target point to the same database or storage system. This rule applies to files and database tables. If the names are the same, the data can be corrupted.
- Column metadata change propagation
- When you change a column's metadata, the changes are automatically propagated downstream. Changes made upstream do not apply to a column once you modify its metadata. If you delete a column, modifying the column in a later stage will not add the column back.
- Runtime column propagation
- When runtime column propagation (RCP) is enabled and your job encounters extra columns at run time that are not defined in the metadata, the job adopts these extra columns and propagates them through the rest of the job. This avoids errors due to missing mappings.
- Adding parameters
- See Adding parameters.
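The parameter reference format described under "Sensitive information and encrypted property values" can be sketched in Python as follows. This is a minimal illustration of the `#<parameter set>.<parameter name>#` string format only. The helper functions and the names `db_credentials` and `password` are hypothetical, and the sketch assumes that names contain no `#`, `.`, or whitespace characters, which the documentation does not state.

```python
import re

# Assumption: set and parameter names contain no '#', '.', or whitespace.
_REFERENCE = re.compile(r"^#([^#.\s]+)\.([^#.\s]+)#$")


def make_reference(parameter_set, parameter_name):
    """Build a reference string of the form #<parameter set>.<parameter name>#."""
    return f"#{parameter_set}.{parameter_name}#"


def parse_reference(value):
    """Return (parameter_set, parameter_name), or None if value is not a reference."""
    match = _REFERENCE.match(value)
    return match.groups() if match else None


# Hypothetical names, used only for illustration.
reference = make_reference("db_credentials", "password")
parsed = parse_reference(reference)
```

A property that holds such a reference contains only the names of the parameter set and parameter. The encrypted value itself is supplied in the job, as described above.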
Learn more
Examples
- Creating a DataStage flow
- Watch the following video for an example of how to create a simple DataStage flow. This video provides a visual method to learn the concepts and tasks in this documentation.
- Importing a DataStage flow into a project
- Watch the following video for an example of how to import a DataStage flow into a project. This video provides a visual method to learn the concepts and tasks in this documentation.