DataStage stages
Last updated: Dec 09, 2024

A DataStage® flow consists of linked stages that describe the flow of data from a data source to a data target. A stage describes a data source, a processing step, or a target system. The stage also defines the processing logic that moves data from its input links to its output links.
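DataStage flows are assembled from stages in a visual designer, not written in Python. Still, the source-to-target idea can be illustrated in a few lines. In this minimal sketch, each link is modeled as an iterator of dictionary records, and the hypothetical functions `read_source`, `uppercase_names`, and `write_target` stand in for a source stage, a processing stage, and a target stage.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]  # one row traveling along a link


def read_source(rows: List[Record]) -> Iterator[Record]:
    """Hypothetical source stage: emits a copy of each row onto its output link."""
    for row in rows:
        yield dict(row)  # copy so downstream stages never mutate the source data


def uppercase_names(link: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical processing stage: reads its input link, writes its output link."""
    for row in link:
        row["name"] = str(row["name"]).upper()
        yield row


def write_target(link: Iterable[Record], target: List[Record]) -> int:
    """Hypothetical target stage: drains its input link and returns the row count."""
    count = 0
    for row in link:
        target.append(row)
        count += 1
    return count


# Link the stages: source -> processing -> target.
source_rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
target_rows: List[Record] = []
written = write_target(uppercase_names(read_source(source_rows)), target_rows)
print(written, target_rows)
```

Chaining generators keeps only one record in flight at a time, which loosely mirrors records streaming along links instead of being fully materialized between stages.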

Stage functions

A stage usually has at least one data input or one data output. However, some stages can accept more than one data input and send output to more than one stage. The following table lists the available stages and describes their functions:

Table 1. Stages
| Stage | Function |
| --- | --- |
| Aggregator | Classifies incoming data into groups, computes totals and other summary functions for each group, and passes them to another stage in the job. |
| Bloom Filter | Looks up incoming keys against previously seen values by using a Bloom filter, a probabilistic structure that can report false positives but never false negatives. |
| Change Apply | Applies the encoded change operations in a change data set to a before data set to reproduce the after data set. The change data set is produced by the Change Capture stage. |
| Change Capture | Compares a before data set with an after data set and outputs a change data set that records the differences. |
| Checksum | Generates a checksum value from the specified columns in a row and adds the checksum to the row. |
| Column Export | Exports data from several columns of different data types into a single column of type ustring, string, or binary. |
| Column Generator | Adds columns to incoming data and generates mock data for these columns for each data row processed. |
| Column Import | Imports data from a single column and outputs it to one or more columns. |
| Combine Records | Combines records in which particular key-column values are identical into vectors of subrecords. |
| Compare | Performs a column-by-column comparison of records in two presorted input data sets. |
| Compress | Uses the UNIX compress or gzip utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data. |
| Copy | Copies a single input data set to a number of output data sets. |
| Decode | Decodes a data set by using a UNIX decoding command that you supply. |
| Difference | Performs a record-by-record comparison of two input data sets that are different versions of the same data set. |
| Distributed Transaction | Runs transactions across multiple data sources. |
| Encode | Encodes a data set by using a UNIX encoding command that you supply. |
| Expand | Uses the UNIX uncompress or gzip utility to expand a data set. It converts a previously compressed data set from a stream of raw binary data back into a sequence of records. |
| External Filter | Runs a UNIX command that you specify as a filter on the data that you are processing. |
| Filter | Transfers, unmodified, the records of the input data set that satisfy requirements that you specify, and filters out all other records. |
| Funnel | Copies multiple input data sets to a single output data set. |
| Generic | Incorporates an Orchestrate® operator in your job. |
| Head | Selects the first N records from each partition of an input data set and copies the selected records to an output data set. |
| Join | Performs join operations on two or more input data sets and outputs the resulting data set. |
| Lookup | Performs lookup operations on a data set that is read into memory from any other parallel job stage that can output data, or that is provided by a database stage that supports reference output links. It can also perform a lookup on a lookup table that is contained in a Lookup File Set stage. |
| Make Subrecords | Combines specified vectors in an input data set into a vector of subrecords whose columns have the names and data types of the original vectors. |
| Make Vector | Combines specified columns of an input data record into a vector of columns. |
| Merge | Combines a sorted master data set with one or more sorted update data sets. |
| Modify | Alters the record schema of its input data set. |
| Peek | Prints record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets. |
| Pivot Enterprise | Pivots data horizontally and vertically. Horizontal pivoting maps a set of columns in an input row to a single column in multiple output rows. Vertical pivoting maps a set of rows in the input data to single or multiple output columns. |
| Promote Subrecords | Promotes the columns of an input subrecord to top-level columns. |
| Remove Duplicates | Takes a single sorted data set as input, removes all duplicate records, and writes the results to an output data set. |
| Row Generator | Produces a set of mock data that fits the specified metadata. |
| Sample | Samples an input data set. |
| Slowly Changing Dimension (SCD) | Works within the context of a star schema database to store and manage current and historical data over time. |
| Sort | Sorts input records on one or more key columns. |
| Split Subrecord | Separates an input subrecord field into a set of top-level vector columns. |
| Split Vector | Promotes the elements of a fixed-length vector to a set of similarly named top-level columns. |
| Surrogate Key Generator | Generates surrogate key columns and maintains the key source. |
| Switch | Takes a single data set as input and assigns each input record to an output data set based on the value of a selector field. |
| Tail | Selects the last N records from each partition of an input data set and copies the selected records to an output data set. |
| Transformer | Handles extracted data, performs any conversions that are required, and passes data to another active stage or to a stage that writes data to a target database or file. |
| Wave Generator | Monitors a stream of data and inserts end-of-wave markers where needed. |
| Web Service | Accesses web service operations within a DataStage flow or job. |
| Write Range Map | Writes data to a range map. The stage accepts a single input link. |
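To make a few entries in Table 1 concrete, here is a minimal Python sketch of the record-level behavior of the Aggregator, Remove Duplicates, Head, and Tail stages. It is not DataStage code: it assumes records are dictionaries and that a partitioned data set is a list of lists, and all function names are hypothetical. The real stages offer options (for example, other summary functions) that are not modeled here.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Record = Dict[str, Any]


def aggregator(records: List[Record], group_key: str, value_col: str) -> List[Record]:
    """Group records on a key column and compute a count and a sum for each group."""
    totals: Dict[Any, Dict[str, Any]] = defaultdict(lambda: {"count": 0, "sum": 0})
    for r in records:
        group = totals[r[group_key]]
        group["count"] += 1
        group["sum"] += r[value_col]
    # Groups come out in the order their keys were first seen.
    return [{group_key: k, **v} for k, v in totals.items()]


def remove_duplicates(sorted_records: List[Record], key: str) -> List[Record]:
    """Keep the first record of each run of equal keys; input must be sorted on key."""
    return [next(group) for _, group in groupby(sorted_records, key=itemgetter(key))]


def head(partitions: List[List[Record]], n: int) -> List[Record]:
    """Take the first n records of every partition."""
    return [r for part in partitions for r in part[:n]]


def tail(partitions: List[List[Record]], n: int) -> List[Record]:
    """Take the last n records of every partition (n == 0 yields nothing)."""
    return [r for part in partitions for r in part[max(len(part) - n, 0):]]


sales = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]
summary = aggregator(sales, "region", "amount")
# sorted() is stable, so the first EU record in input order survives.
deduped = remove_duplicates(sorted(sales, key=itemgetter("region")), "region")
partitions = [[{"id": 1}, {"id": 2}, {"id": 3}], [{"id": 4}, {"id": 5}]]
first_two = head(partitions, 2)
last_two = tail(partitions, 2)
```

Note that `remove_duplicates` relies on sorted input, just as the table says the Remove Duplicates stage does: `groupby` only merges adjacent equal keys, so unsorted input would leave duplicates behind.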

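Several stages in Table 1 have more than one input or output link. The following minimal Python sketch models three of them: Copy (one input, many outputs), Funnel (many inputs, one output), and Switch (one input routed to many outputs). It assumes that a link is a list of dictionary records; the function names are hypothetical, the Funnel sketch models only plain concatenation in link order, and the Switch sketch simply drops records whose selector value matches no case.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # one row on a link


def copy_stage(records: List[Record], n_outputs: int) -> List[List[Record]]:
    """One input link, many output links: every output receives its own copy of each record."""
    return [[dict(r) for r in records] for _ in range(n_outputs)]


def funnel_stage(*inputs: List[Record]) -> List[Record]:
    """Many input links, one output link: concatenate the inputs in link order."""
    output: List[Record] = []
    for link in inputs:
        output.extend(link)
    return output


def switch_stage(
    records: List[Record], selector: str, cases: Dict[Any, int], n_outputs: int
) -> List[List[Record]]:
    """One input link, many output links: route each record by its selector value.

    Records whose selector value matches no case are dropped in this sketch.
    """
    outputs: List[List[Record]] = [[] for _ in range(n_outputs)]
    for r in records:
        target = cases.get(r[selector])
        if target is not None:
            outputs[target].append(r)
    return outputs


orders = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "APAC"},
]
audit_copy, load_copy = copy_stage(orders, 2)
combined = funnel_stage(audit_copy, load_copy)
eu_orders, us_orders = switch_stage(orders, "region", {"EU": 0, "US": 1}, 2)
```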
Watch this series of videos to see how to use the most common stages.
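The Change Capture and Change Apply stages in Table 1 work as a pair: one records the differences between a before and an after data set, and the other applies those differences to the before data set. The following Python sketch illustrates that round trip under simplifying assumptions: records are dictionaries, a single hypothetical key column identifies each record, and the change codes are the readable strings "insert", "edit", and "delete" rather than whatever encoding the real stages write. It illustrates the idea only; the real stages have many options that are not modeled here.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def change_capture(before: List[Record], after: List[Record], key: str) -> List[Record]:
    """Emit one change record per difference between before and after (unchanged rows are omitted)."""
    before_by_key = {r[key]: r for r in before}
    after_by_key = {r[key]: r for r in after}
    changes: List[Record] = []
    for k, row in after_by_key.items():
        if k not in before_by_key:
            changes.append({**row, "change_code": "insert"})
        elif row != before_by_key[k]:
            changes.append({**row, "change_code": "edit"})
    for k, row in before_by_key.items():
        if k not in after_by_key:
            changes.append({**row, "change_code": "delete"})
    return changes


def change_apply(before: List[Record], changes: List[Record], key: str) -> List[Record]:
    """Apply change records to the before data set to rebuild the after data set."""
    result = {r[key]: dict(r) for r in before}
    for change in changes:
        row = {c: v for c, v in change.items() if c != "change_code"}
        if change["change_code"] == "delete":
            result.pop(change[key], None)
        else:  # insert or edit: store the new version of the row
            result[change[key]] = row
    return list(result.values())


before = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}, {"id": 3, "qty": 9}]
after = [{"id": 1, "qty": 5}, {"id": 2, "qty": 4}, {"id": 4, "qty": 1}]
changes = change_capture(before, after, "id")
rebuilt = change_apply(before, changes, "id")
```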
