0 / 0
Funnel stage: Partitioning on input links

Funnel stage: Partitioning on input links

The Partitioning tab allows you to specify details about how the data on each of the incoming links is partitioned or collected before it is funneled.

It also allows you to specify that the data should be sorted before being operated on.

By default the stage partitions in Auto mode. This attempts to work out the best partitioning method depending on execution modes of current and preceding stages and how many nodes are specified in the Configuration file.

If the Funnel stage is operating in sequential mode, it will first collect the data before writing it to the file using the default Auto collection method.

The Partitioning tab allows you to override this default behavior. The exact operation of this tab depends on:

  • Whether the Funnel stage is set to execute in parallel or sequential mode.
  • Whether the preceding stage in the job is set to execute in parallel or sequential mode.

If the Funnel stage is set to execute in parallel, then you can set a partitioning method by selecting from the Partition type drop-down list. This will override any current partitioning.

If you are using the Sort Funnel method, and haven't partitioned the data in a previous stage, you should key partition it by choosing the Hash or modulus partition method on this tab.

If the Funnel stage is set to execute in sequential mode, but the preceding stage is executing in parallel, then you can set a collection method from the Collector type drop-down list. This will override the default collection method.

The following partitioning methods are available:

  • (Auto). InfoSphere® DataStage® attempts to work out the best partitioning method depending on execution modes of current and preceding stages and how many nodes are specified in the Configuration file. This is the default partitioning method for the Funnel stage.
  • Entire. Each file written to receives the entire data set.
  • Hash. The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
  • Modulus. The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag fields.
  • Random. The records are partitioned randomly, based on the output of a random number generator.
  • Round Robin. The records are partitioned on a round robin basis as they enter the stage.
  • Same. Preserves the partitioning already in place.
  • Db2®. Replicates the Db2 partitioning method of a specific Db2 table. Requires extra properties to be set. Access these properties by clicking the properties button.
  • Range. Divides a data set into approximately equal size partitions based on one or more partitioning keys. Range partitioning is often a preprocessing step to performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.

The following Collection methods are available:

  • (Auto). This is the default collection method for Funnel stages. Normally, when you are using Auto mode, InfoSphere DataStage will eagerly read any row from any input partition as it becomes available.
  • Ordered. Reads all records from the first partition, then all records from the second partition, and so on.
  • Round Robin. Reads a record from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
  • Sort Merge. Reads records in an order based on one or more columns of the record. This requires you to select a collecting key column from the Available list.

The Partitioning tab also allows you to specify that data arriving on the input link should be sorted before being funneled. The sort is always carried out within data partitions. If the stage is partitioning incoming data the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection.

If you are using the Sort Funnel method, and haven't sorted the data in a previous stage, you should sort it here using the same keys that the data is hash partitioned on and funneled on. The availability of sorting depends on the partitioning or collecting method chosen (it is not available for the default auto methods).

Select the check boxes as follows:

  • Perform Sort. Select this to specify that data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
  • Stable. Select this if you want to preserve previously sorted data sets. This is the default.
  • Unique. Select this to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.

If NLS is enabled an additional button opens a dialog box allowing you to select a locale specifying the collate convention for the sort.

You can also specify sort direction, case sensitivity, whether sorted as ASCII or EBCDIC, and whether null columns will appear first or last for each column. Where you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to invoke the shortcut menu.

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more