Head stage in DataStage
The Head Stage selects the first N rows from each partition of an input data set and copies the selected rows to an output data set. You can sample data using this stage.
The Head stage is a development and debug stage. It can have a single input link and a single output link. It is one of several stages that IBM® DataStage® provides to help you sample data. See also:
- Tail stage in DataStage
- Sample stage in DataStage
- Peek stage in DataStage
You determine which rows are copied by setting properties that specify:
- The number of rows to copy
- The partition from which the rows are copied
- The location of the rows to copy
- The number of rows to skip before the copying operation begins
This stage is helpful in testing and debugging applications with large data sets. For example, the Partition property lets you view data from a single partition to check whether the data is being partitioned as you intended. The Skip property lets you bypass the leading rows of a partition so that you can reach a specific portion of a data set.
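The row selection described above can be illustrated with a minimal Python sketch. It is not DataStage code and does not use any DataStage API. The function name `head` and the parameters `n_rows`, `skip`, and `partition` are hypothetical names chosen for illustration, and the sketch assumes that skipped rows are discarded before the first N rows are counted. It covers only the row count, partition choice, and skip behavior.

```python
from typing import Dict, List, Optional, Sequence


def head(
    partitions: Sequence[Sequence[object]],
    n_rows: int,
    skip: int = 0,
    partition: Optional[int] = None,
) -> Dict[int, List[object]]:
    """Return the first n_rows of each partition, after skipping `skip` rows.

    All names here are hypothetical; this only models the behavior
    described in the text, not the stage's actual implementation.
    """
    selected: Dict[int, List[object]] = {}
    for index, rows in enumerate(partitions):
        # When a single partition is requested, ignore all the others.
        if partition is not None and index != partition:
            continue
        # Skip the leading rows, then copy at most n_rows.
        selected[index] = list(rows[skip:skip + n_rows])
    return selected


# Three partitions of unequal size (made-up sample data).
data = [[1, 2, 3, 4, 5], [10, 20, 30], [100]]

all_parts = head(data, n_rows=2)
skipped = head(data, n_rows=2, skip=1)
one_part = head(data, n_rows=2, partition=1)
```

Because the selection is applied per partition, the total number of rows copied depends on how many partitions there are and how many rows each one holds, which is why a short partition can contribute fewer than N rows.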
When you double-click the stage, the properties panel opens. The properties panel has three tabs:
- Stage. This is always present and is used to specify general information about the stage.
- Input. This is where you specify the details about the single input set from which you are selecting records.
- Output. This is where you specify details about the processed data being output from the stage.
Input tab
The Columns section specifies the column definitions of incoming data. The Advanced section allows you to change the default buffering settings for the input link.
Output tab
The Head stage can have only one output link.
The Columns section specifies the column definitions of the data. The Maps from input column section, which appears when you click Edit in the Columns section, allows you to specify the relationship between the columns being input to the Head stage and the output columns. The Advanced section allows you to change the default buffering settings for the output link.
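As a rough illustration of what a mapping between input and output columns expresses, the following Python sketch applies a mapping to a single row. It is not DataStage code. The function `map_columns` and the column names are hypothetical, and the sketch assumes that each output column is fed by exactly one input column.

```python
from typing import Dict


def map_columns(row: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Build an output row from an input row.

    `mapping` goes from output column name to input column name.
    Input columns that are not mapped are dropped from the output.
    """
    return {out_col: row[in_col] for out_col, in_col in mapping.items()}


# Hypothetical column names, for illustration only.
input_row = {"id": 1, "name": "a", "extra": True}
mapping = {"customer_id": "id", "name": "name"}

output_row = map_columns(input_row, mapping)
```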