Sample stage: Stage tab (DataStage)

The Stage tab lets you set the properties of the Sample stage and control how the stage executes.

Double-click the stage to open the stage properties panel. The Properties section lets you specify what the stage does, and the Advanced section lets you specify how the stage executes. You can also enter an optional description of the stage.

Properties

You can specify the following properties:
Table 1. Properties
Category/Property | Values | Default | Mandatory? | Repeats? | Dependent of
Options/Sample mode | percent/period | percent | Y | N | N/A
Options/Percent | number | N/A | Y (if Sample Mode = Percent) | Y | N/A
Options/Output link number | number | N/A | Y | N | Percent
Options/Seed | number | N/A | N | N | N/A
Options/Period (per partition) | number | N/A | Y (if Sample Mode = Period) | N | N/A
Options/Max rows per partition | number | N/A | N | N | N/A

Sample mode

Specifies the type of sample operation. You can sample a percentage of the input rows (percent), or you can sample every Nth row of each partition (period).

Percent

Specifies the sampling percentage for each output data set when using a Sample Mode of Percent. You can repeat this property to specify a different percentage for each output data set. The sum of the percentages specified for all output data sets cannot exceed 100%. You can specify a job parameter if required.

Percent has a dependent property:

  • Output Link Number

    This specifies the output link to which the percentage corresponds. Select Edit to open the Output Link Number section. You can specify a job parameter if required.
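To make the Percent and Output Link Number relationship concrete, here is a minimal Python sketch of one way Percent mode could be modeled. It assumes that each input row gets one random draw and is routed to at most one output link, using cumulative percentage bands in link-number order. The function name `percent_sample` and the band-based routing are hypothetical illustrations, not DataStage's internal algorithm.

```python
import random
from typing import Dict, Iterable, List, Optional


def percent_sample(
    rows: Iterable[object],
    percents: Dict[int, float],
    seed: Optional[int] = None,
) -> Dict[int, List[object]]:
    """Hypothetical model of Percent mode.

    `percents` maps an output link number to its sampling percentage,
    mirroring the repeatable Percent property and its dependent
    Output Link Number property.
    """
    total = sum(percents.values())
    if total > 100:
        # Same rule as the stage: the percentages cannot exceed 100% in total.
        raise ValueError(f"percentages sum to {total}, which exceeds 100")
    rng = random.Random(seed)  # seeded generator (assumption: one draw per row)
    outputs: Dict[int, List[object]] = {link: [] for link in percents}
    links = sorted(percents)  # fixed order so the cumulative bands are stable
    for row in rows:
        r = rng.random() * 100  # uniform value in [0, 100)
        upper = 0.0
        for link in links:
            upper += percents[link]
            if r < upper:
                outputs[link].append(row)
                break
        # A row whose draw falls past the last band is not sampled at all.
    return outputs


out = percent_sample(range(10_000), {0: 10, 1: 25}, seed=42)
print(len(out[0]), len(out[1]))  # roughly 1,000 and 2,500 rows
```

In this model, percentages that sum to less than 100% mean that the remaining rows (about 65% in the example) are written to no output link, and no row can appear on two links.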

Seed

This is the number used to initialize the random number generator. You can specify a job parameter if required. This property is only available if Sample Mode is set to percent.
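The practical effect of a fixed seed is that Percent mode selects the same rows each time the job runs on the same input. The short Python sketch below illustrates this with the standard `random.Random` generator; the helper `sample_percent` is hypothetical and only models keeping each row with probability percent/100, not the stage's actual generator.

```python
import random
from typing import List, Optional


def sample_percent(rows: List[int], percent: float, seed: Optional[int]) -> List[int]:
    """Hypothetical helper: keep each row with probability percent/100."""
    rng = random.Random(seed)  # a dedicated generator initialized from the seed
    return [row for row in rows if rng.random() * 100 < percent]


rows = list(range(1000))
first = sample_percent(rows, 10, seed=7)
second = sample_percent(rows, 10, seed=7)
print(first == second)  # True: the same seed reproduces the same sample
```

Changing the seed (or omitting it, which in this sketch seeds from system entropy) usually produces a different selection of rows.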

Period (per partition)

Specifies the period N when using a Sample Mode of Period. The stage samples every Nth row of each partition.

Max rows per partition

This specifies the maximum number of rows that will be sampled from each partition.
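Because both Period and Max rows per partition apply to each partition separately, a per-partition function is a natural way to sketch them. The Python example below assumes that a period of N keeps the rows at 1-based positions N, 2N, 3N, and so on, and that the cap keeps the first sampled rows; both the selection rule and the name `period_sample` are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def period_sample(
    partition: Sequence[object],
    period: int,
    max_rows: Optional[int] = None,
) -> List[object]:
    """Hypothetical model: keep every `period`-th row of one partition."""
    if period < 1:
        raise ValueError("period must be a positive integer")
    # Rows at 1-based positions period, 2*period, ... (assumed selection rule).
    sampled = list(partition[period - 1 :: period])
    if max_rows is not None:
        sampled = sampled[:max_rows]  # apply the per-partition cap
    return sampled


partitions = [list(range(0, 20)), list(range(100, 112))]
print([period_sample(p, period=5, max_rows=3) for p in partitions])
# [[4, 9, 14], [104, 109]]
```

The first partition would yield four rows but is capped at three, while the second partition is too short to reach the cap.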

Advanced

The Advanced section on the Stage tab allows you to specify the following options:
  • Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the input data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced tab. In sequential mode, the entire data set is processed by the conductor node.
  • Combinability mode. This is Auto by default, which allows IBM® DataStage® to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
  • Preserve partitioning. This is Propagate by default, which means the stage adopts the Set or Clear setting of the previous stage. You can explicitly select Set or Clear. Select Set to request that the next stage in the job attempt to maintain the partitioning.
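Since the Period and Max rows per partition properties work per partition, the execution mode can change how many rows the stage emits. The Python sketch below compares one partition (as in sequential mode) with four partitions. It assumes round-robin partitioning and the every-Nth-row rule; both helpers are hypothetical, and real partitioning depends on the partitioning method and the configuration file.

```python
from typing import List


def round_robin(rows: List[int], num_partitions: int) -> List[List[int]]:
    """Hypothetical partitioner: deal rows across partitions in turn."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def sample_every_nth_capped(partition: List[int], period: int, max_rows: int) -> List[int]:
    """Hypothetical per-partition period sampling with a per-partition cap."""
    return partition[period - 1 :: period][:max_rows]


rows = list(range(100))
for num_partitions in (1, 4):  # 1 ~ sequential mode, 4 ~ a four-node parallel run
    parts = round_robin(rows, num_partitions)
    total = sum(len(sample_every_nth_capped(p, period=10, max_rows=5)) for p in parts)
    print(num_partitions, total)
# 1 5
# 4 8
```

With one partition, the cap of 5 limits the output; with four partitions of 25 rows each, every partition yields only 2 rows, so the total and the selected rows both differ.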