Sample stage: Stage tab (DataStage)

The Sample stage properties tab enables you to control aspects of the Sample stage.

Double-click the stage to open the stage properties panel. The Properties section lets you specify what the stage does. The Advanced section lets you specify how the stage executes. You can also specify an optional description of the stage.

Properties

You can specify the following properties:

Table 1. Properties
Category/Property	Values	Default	Mandatory?	Repeats?	Dependent of
Options/Sample mode	percent/period	percent	Y	N	N/A
Options/percent	number	N/A	Y (if Sample Mode = Percent)	Y	N/A
Options/Output link number	number	N/A	Y	N	Percent
Options/seed	number	N/A	N	N	N/A
Options/period per partition	number	N/A	Y (if Sample Mode = Period)	N	N/A
Options/max rows per partition	number	N/A	N	N	N/A

Sample mode

Specifies the type of sample operation. You can sample a percentage of the input rows (percent), or you can sample every Nth row of each partition (period).

Percent

Specifies the sampling percentage for each output data set when you use a Sample Mode of Percent. You can repeat this property to specify a different percentage for each output data set. The sum of the percentages specified for all output data sets cannot exceed 100%. You can specify a job parameter if required.

Percent has a dependent property:

Output Link Number
This specifies the output link to which the percentage corresponds. Select Edit to open the Output Link Number section. You can specify a job parameter if required.
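
The relationship between Percent and Output Link Number can be illustrated with a small Python sketch. This is not how DataStage implements the stage; it is a minimal model that assumes each input row is routed to at most one output link, with the chance of reaching a link equal to that link's percentage. The function name percent_sample and the dictionary layout are hypothetical and used only for illustration.

```python
import random


def percent_sample(rows, percents, seed=None):
    """Route each row to at most one output link.

    percents maps an output link number to a percentage (0-100).
    Illustrative model only, not the DataStage implementation.
    """
    if sum(percents.values()) > 100:
        raise ValueError("percentages across all output links cannot exceed 100")

    rng = random.Random(seed)
    outputs = {link: [] for link in percents}

    for row in rows:
        draw = rng.random() * 100  # uniform value in [0, 100)
        upper = 0.0
        for link, pct in percents.items():
            upper += pct
            if draw < upper:
                outputs[link].append(row)
                break  # a row goes to one link at most
        # Rows whose draw falls above the summed percentages are dropped.

    return outputs


# Roughly 10% of rows go to link 0 and roughly 20% to link 1.
result = percent_sample(list(range(1000)), {0: 10, 1: 20}, seed=42)
```

In this model, percentages that sum to more than 100 are rejected before any rows are read, which mirrors the constraint stated for the Percent property.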

Seed

This is the number used to initialize the random number generator. You can specify a job parameter if required. This property is only available if Sample Mode is set to percent.
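
The general effect of seeding a random number generator can be shown with a short Python sketch. It uses Python's own generator, not the one in DataStage, and the helper name sample_with_seed is hypothetical. The sketch assumes that a fixed seed is meant to make a percent sample repeatable over the same input.

```python
import random


def sample_with_seed(rows, percent, seed):
    """Keep each row with probability percent/100, using a seeded generator."""
    rng = random.Random(seed)  # same seed -> same sequence of draws
    return [row for row in rows if rng.random() * 100 < percent]


first = sample_with_seed(range(100), 25, seed=7)
second = sample_with_seed(range(100), 25, seed=7)

# Identical seed and identical input give the identical sample.
print(first == second)
```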

Period (per partition)

Specifies the period when you use a Sample Mode of Period. In this mode, every Nth row of each partition is sampled, where N is the period.
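
Period sampling on a single partition can be sketched in Python as follows. The function name period_sample is hypothetical, and the sketch assumes that counting starts at 1, so that rows N, 2N, 3N, and so on are selected. The row at which the stage actually starts counting is not stated here.

```python
def period_sample(partition, period):
    """Return every Nth row of one partition, where N is the period.

    Assumes 1-based counting: rows N, 2N, 3N, ... are selected.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    return [row for i, row in enumerate(partition, start=1) if i % period == 0]


# One partition holding the rows 1..10, sampled with a period of 3.
print(period_sample(list(range(1, 11)), 3))  # prints [3, 6, 9]
```

Because the period applies per partition, a job that runs on several partitions would apply this selection to each partition independently.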

Max rows per partition

This specifies the maximum number of rows that will be sampled from each partition.
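
The per-partition nature of this limit can be sketched in Python. The function name sample_partition is hypothetical, and the sketch assumes that the cap is applied to a period sample and that sampling stops once the cap is reached. It is a model of the behavior described above, not the stage's implementation.

```python
from itertools import islice


def sample_partition(partition, period, max_rows=None):
    """Take every Nth row of one partition, stopping after max_rows rows.

    max_rows=None means no cap. Assumes 1-based counting of rows.
    """
    picked = (row for i, row in enumerate(partition, start=1) if i % period == 0)
    return list(islice(picked, max_rows))


# Two partitions of different sizes; the cap applies to each one separately.
partitions = [list(range(1, 21)), list(range(101, 111))]
capped = [sample_partition(p, period=2, max_rows=3) for p in partitions]
print(capped)  # prints [[2, 4, 6], [102, 104, 106]]
```

With two partitions and a cap of 3, the job as a whole can still produce up to 6 sampled rows, because the limit is counted per partition rather than across the whole data set.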

Advanced

The Advanced section on the Stage tab allows you to specify the following options:

Execution mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the input data is processed by the available nodes as specified in the configuration file, and by any node constraints specified in the Advanced section. In sequential mode, the entire data set is processed by the conductor node.
Combinability mode. This is Auto by default, which allows IBM® DataStage® to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
Preserve partitioning. This is Propagate by default, which adopts the Set or Clear setting from the previous stage. You can also explicitly select Set or Clear. Select Set to request that the next stage in the job attempt to maintain the partitioning.