0 / 0
Sort stage: Stage tab (DataStage)

Sort stage: Stage tab (DataStage)

You can specify aspects of the Sort stage by double-clicking the stage and in the stage editor clicking on the Stage tab.

The Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.

Properties

You can specify the following properties:
Table 1. Properties
Category/Property Values Default Mandatory? Repeats? Dependent of
Sorting Keys/Key Input Column N/A Y Y N/A
Sorting Keys/Sort order Ascending/Descending Ascending Y N Key
Sorting Keys/nulls position (only available for Sort Utility = DataStage®) First/Last First N N Key
Sorting Keys/Sort as EBCDIC True/False False N N Key
Sorting Keys/Case sensitive True/False True N N Key
Sorting Keys/Sort key mode (only available for Sort Utility = DataStage) Sort/Don't Sort (Previously Grouped)/Don't Sort (Previously Sorted) Sort Y N Key
Options/Sort utility DataStage/ UNIX DataStage Y N N/A
Options/Stable sort True/False True for Sort Utility = DataStage, False otherwise Y N N/A
Options/Allow duplicates (not available for Sort Utility = UNIX) True/False True Y N N/A
Options/Output statistics True/False False Y N N/A
Options/Create cluster key change column (only available for Sort Utility = DataStage) True/False False N N N/A
Options/Create key change column True/False False N N N/A
Options/Restrict memory usage number MB 20 N N N/A
Options/Workspace string N/A N N N/A

Key

Specifies the key column for sorting. This property can be repeated to specify multiple key columns. You can use the Column Selection dialog box to select several keys at once if required. Key has dependent properties depending on the Sort Utility chosen:

  • Sort Order

    All sort types. Choose Ascending or Descending. The default is Ascending.

  • Nulls position

    This property appears for sort type DataStage and is optional. By default columns containing null values appear first in the sorted data set. To override this default so that columns containing null values appear last in the sorted data set, select Last.

  • Sort as EBCDIC

    To sort as in the EBCDIC character set, choose True.

  • Case Sensitive

    All sort types. This property is optional. Use this to specify whether each group key is case sensitive or not, this is set to True by default, that is, the values "CASE" and "case" would not be judged equivalent.

  • Sort Key Mode

    This property appears for sort type DataStage. It is set to Sort by default and this sorts on all the specified key columns.

    Set to Don't Sort (Previously Sorted) to specify that input records are already sorted by this column. The Sort stage will then sort on secondary key columns, if any. This option can increase the speed of the sort and reduce the amount of temporary disk space when your records are already sorted by the primary key column(s) because you only need to sort your data on the secondary key column(s).

    Set to Don't Sort (Previously Grouped) to specify that input records are already grouped by this column, but not sorted. The operator will then sort on any secondary key columns. This option is useful when your records are already grouped by the primary key column(s), but not necessarily sorted, and you want to sort your data only on the secondary key column(s) within each group

Sort utility

The type of sort the stage will carry out. Choose from:

  • DataStage. The default. This uses the built-in IBM® DataStage sorter, you do not require any additional software to use this option.
  • UNIX. This specifies that the UNIX sort command is used to perform the sort.

Stable sort

Applies to a Sort Utility type of DataStage, the default is True. It is set to True to guarantee that this sort operation will not rearrange records that are already in a properly sorted data set. If set to False no prior ordering of records is guaranteed to be preserved by the sorting operation.

Allow duplicates

Set to True by default. If False, specifies that, if multiple records have identical sorting key values, only one record is retained. If Stable Sort is True, then the first record is retained. This property is not available for the UNIX sort type.

Output statistics

Set False by default. If True it causes the sort operation to output statistics. This property is not available for the UNIX sort type.

Create cluster key change column

This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column clusterKeyChange in each output record. The clusterKeyChange column is set to 1 for the first record in each group where groups are defined by using a Sort Key Mode of Don't Sort (Previously Sorted) or Don't Sort (Previously Grouped). Subsequent records in the group have the clusterKeyChange column set to 0.

Create key change column

This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column KeyChange in each output record. The KeyChange column is set to 1 for the first record in each group where the value of the sort key changes. Subsequent records in the group have the KeyChange column set to 0.

Restrict memory usage

This is set to 20 by default. It causes the Sort stage to restrict itself to the specified number of megabytes of virtual memory on a processing node.

The number of megabytes specified should be smaller than the amount of physical memory on a processing node. For Windows systems, the value for Restrict Memory Usage should not exceed 500.

Workspace

This property appears for sort type UNIX only. Optionally specifies the workspace used by the stage.

Advanced

This section allows you to specify the following:

  • Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the input data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced section. In Sequential mode the entire data set is processed by the conductor node.
  • Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
  • Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request the next stage should attempt to maintain the partitioning.
Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more