Specifying partitioning or collecting methods (DataStage)

You can specify how the data is collected or partitioned before it is processed.

Partitioning data

About this task

If the stage is running in parallel mode, it processes the data in partitions. By default, the partitioning method is set to Auto. You can override the default behavior.

Procedure

Open the Partitioning tab of the Input page.

Select a partitioning method from the list:

Option	Description
(Auto)	IBM® DataStage® attempts to work out the best partitioning method depending on the execution modes of the current and preceding stages and how many nodes are specified in the configuration file. This is the default partitioning method for most stages.
Db2	Replicates the Db2 partitioning method of a specific Db2 table. Requires extra properties to be set. Access these properties by clicking the properties button.
Entire	Each file that is written to receives the entire data set.
Hash	The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
Modulus	The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag fields.
Random	The records are partitioned randomly, based on the output of a random number generator.
Round Robin	The records are partitioned on a round robin basis as they enter the stage.
Same	Preserves the partitioning already in place.
Range	Divides a data set into partitions of approximately equal size based on one or more partitioning keys. Range partitioning is often a preprocessing step for performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.

If you selected the Hash or Modulus partitioning method, specify a key by clicking one or more of the columns in the Available list. The selected column or columns appear in the Selected list.
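
The following Python sketch illustrates the general idea behind the Hash, Modulus, Round Robin, and Range methods described above. It is not DataStage code and does not reproduce the internal algorithms of the product. The function names, the use of `zlib.crc32` as the hash function, and the `boundaries` list for range partitioning are all assumptions made for illustration only.

```python
import zlib
from bisect import bisect_right


def hash_partition(records, key, num_partitions):
    """Place each record in a partition chosen from a hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is an illustrative, deterministic hash; it is an assumption,
        # not the hash function that DataStage uses.
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        partitions[h % num_partitions].append(rec)
    return partitions


def modulus_partition(records, key, num_partitions):
    """Place each record in partition (key value mod number of partitions)."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # This sketch assumes the key column holds integers.
        partitions[rec[key] % num_partitions].append(rec)
    return partitions


def round_robin_partition(records, num_partitions):
    """Deal records out to the partitions in turn, in arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions


def range_partition(records, key, boundaries):
    """Place each record by comparing its key value with sorted split points.

    `boundaries` is a hypothetical, pre-computed list of split points;
    it yields len(boundaries) + 1 partitions.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        partitions[bisect_right(boundaries, rec[key])].append(rec)
    return partitions


if __name__ == "__main__":
    rows = [{"id": i} for i in range(10)]
    for part in modulus_partition(rows, "id", 3):
        print([r["id"] for r in part])
```

In this sketch, records with the same key value always land in the same partition under the hash and modulus functions, whereas round robin ignores the record content and simply balances the row counts.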

Collecting data

You can specify a collecting method.

About this task

If the stage runs sequentially, and the previous stage in the job runs in parallel, then the data is collected before being written. By default, the collecting method is set to Auto. You can override the default behavior.

Procedure

Open the Partitioning tab of the Input page.

Select a collecting method from the list:

Option	Description
(Auto)	This is the default collecting method for the Sequential File stage. Normally, when you are using Auto mode, IBM DataStage reads any row from any input partition as it becomes available.
Ordered	Reads all rows from the first partition, then all rows from the second partition, and so on.
Round Robin	Reads a row from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
Sort Merge	Reads rows in an order based on one or more columns of the row. This requires you to select a collecting key column from the Available list.

If you selected the Sort Merge collecting method, specify a collecting key by clicking one or more of the columns in the Available list. The selected column or columns appear in the Selected list.
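
The difference between the Ordered, Round Robin, and Sort Merge collecting methods can be sketched in Python as follows. This is an illustration of the row order that each method produces, not DataStage code. The function names are hypothetical, and the Sort Merge sketch assumes that each input partition is already sorted on the collecting key.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marker for partitions that have run out of rows


def ordered_collect(partitions):
    """Read all rows from the first partition, then the second, and so on."""
    return list(chain.from_iterable(partitions))


def round_robin_collect(partitions):
    """Read one row from each partition in turn, then start over."""
    collected = []
    for row_group in zip_longest(*partitions, fillvalue=_MISSING):
        # Skip partitions that are already exhausted.
        collected.extend(r for r in row_group if r is not _MISSING)
    return collected


def sort_merge_collect(partitions, key):
    """Merge partitions into one stream ordered on the collecting key.

    Assumes every partition is already sorted on `key`.
    """
    return list(heapq.merge(*partitions, key=lambda r: r[key]))


if __name__ == "__main__":
    parts = [
        [{"k": 1}, {"k": 6}],
        [{"k": 2}, {"k": 3}],
        [{"k": 4}],
    ]
    print([r["k"] for r in ordered_collect(parts)])
    print([r["k"] for r in round_robin_collect(parts)])
    print([r["k"] for r in sort_merge_collect(parts, "k")])
```

Only the Sort Merge sketch inspects the row contents; the other two depend purely on which partition a row arrives from.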