Sample stage in DataStage

Sample stage

The Sample stage samples an input data set.

The Sample stage can have a single input link and any number of output links when operating in percent mode, or a single input link and a single output link when operating in period mode. It is one of a number of stages that IBM DataStage provides to help you sample data. See also:

Head stage
Tail stage
Peek stage

The Sample stage is a debug stage. It operates in two modes. In Percent mode, it extracts rows, selecting them by means of a random number generator, and writes a given percentage of these to each output data set. You specify the number of output data sets, the percentage written to each, and a seed value to start the random number generator. You can reproduce a given distribution by reusing the same number of outputs, the same percentages, and the same seed value.
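The following Python sketch illustrates the idea behind Percent mode. It is not DataStage code and does not reflect the stage's internal algorithm. The function name percent_sample, its parameters, the rule that each row goes to at most one output, and the check that the percentages total no more than 100 are all assumptions made for illustration.

```python
import random


def percent_sample(rows, percents, seed):
    """Illustrative only: route rows to outputs by a seeded random draw.

    Assumption: each row is written to at most one output, and the
    percentages together do not exceed 100.
    """
    if sum(percents) > 100:
        raise ValueError("percentages must not total more than 100")
    rng = random.Random(seed)          # the seed fixes the sequence of draws
    outputs = [[] for _ in percents]   # one list per output data set
    for row in rows:
        draw = rng.random() * 100      # value in [0, 100)
        upper = 0.0
        for i, pct in enumerate(percents):
            upper += pct
            if draw < upper:           # draw falls in this output's band
                outputs[i].append(row)
                break
    return outputs


# Same number of outputs, same percentages, same seed: same distribution.
first = percent_sample(range(1000), [10, 20], seed=42)
second = percent_sample(range(1000), [10, 20], seed=42)
```

Because the generator is seeded, the two calls above return identical outputs, which mirrors how repeating the settings reproduces a distribution.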

In Period mode, it extracts every Nth row from each partition, where N is the period that you supply. In this mode all sampled rows are output to a single data set, so the stage can have only a single output link.

For both modes you can specify the maximum number of rows that you want to sample from each partition.
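As a rough illustration of Period mode combined with a per-partition row limit, the Python sketch below takes every Nth row from each partition and stops once the limit is reached. It is not DataStage code. The function name period_sample, its parameters, and the choice to start counting so that rows N, 2N, 3N, and so on are selected are assumptions made for illustration.

```python
from itertools import islice


def period_sample(partitions, period, max_rows=None):
    """Illustrative only: take every Nth row from each partition.

    Assumption: rows N, 2N, 3N, ... of each partition are selected.
    max_rows, if given, caps the rows sampled from each partition.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    output = []                        # single output data set
    for partition in partitions:
        every_nth = islice(partition, period - 1, None, period)
        output.extend(islice(every_nth, max_rows))
    return output


partitions = [list(range(1, 11)), list(range(11, 17))]
print(period_sample(partitions, period=3))              # [3, 6, 9, 13, 16]
print(period_sample(partitions, period=3, max_rows=2))  # [3, 6, 13, 16]
```

Each partition is sampled independently, and the results are collected into one list to reflect the single output link in this mode.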

When you double-click the Sample stage, the properties panel opens. The properties panel has three tabs:

Stage. This is always present and is used to specify general information about the stage.
Input. This is where you specify details about the data set being sampled.
Output. This is where you specify details about the sampled data being output from the stage.