Sort stage: Stage tab (DataStage)
You can specify aspects of the Sort stage by double-clicking the stage and in the stage editor clicking on the Stage tab.
The Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.
Properties
Category/Property | Values | Default | Mandatory? | Repeats? | Dependent of |
---|---|---|---|---|---|
Sorting Keys/Key | Input Column | N/A | Y | Y | N/A |
Sorting Keys/Sort order | Ascending/Descending | Ascending | Y | N | Key |
Sorting Keys/nulls position (only available for Sort Utility = DataStage®) | First/Last | First | N | N | Key |
Sorting Keys/Sort as EBCDIC | True/False | False | N | N | Key |
Sorting Keys/Case sensitive | True/False | True | N | N | Key |
Sorting Keys/Sort key mode (only available for Sort Utility = DataStage) | Sort/Don't Sort (Previously Grouped)/Don't Sort (Previously Sorted) | Sort | Y | N | Key |
Options/Sort utility | DataStage/ UNIX | DataStage | Y | N | N/A |
Options/Stable sort | True/False | True for Sort Utility = DataStage, False otherwise | Y | N | N/A |
Options/Allow duplicates (not available for Sort Utility = UNIX) | True/False | True | Y | N | N/A |
Options/Output statistics | True/False | False | Y | N | N/A |
Options/Create cluster key change column (only available for Sort Utility = DataStage) | True/False | False | N | N | N/A |
Options/Create key change column | True/False | False | N | N | N/A |
Options/Restrict memory usage | number MB | 20 | N | N | N/A |
Options/Workspace | string | N/A | N | N | N/A |
Key
Specifies the key column for sorting. This property can be repeated to specify multiple key columns. You can use the Column Selection dialog box to select several keys at once if required. Key has dependent properties depending on the Sort Utility chosen:
- Sort Order
All sort types. Choose Ascending or Descending. The default is Ascending.
- Nulls position
This property appears for sort type DataStage and is optional. By default columns containing null values appear first in the sorted data set. To override this default so that columns containing null values appear last in the sorted data set, select Last.
- Sort as EBCDIC
To sort as in the EBCDIC character set, choose True.
- Case Sensitive
All sort types. This property is optional. Use this to specify whether each group key is case sensitive or not, this is set to True by default, that is, the values "CASE" and "case" would not be judged equivalent.
- Sort Key Mode
This property appears for sort type DataStage. It is set to Sort by default and this sorts on all the specified key columns.
Set to Don't Sort (Previously Sorted) to specify that input records are already sorted by this column. The Sort stage will then sort on secondary key columns, if any. This option can increase the speed of the sort and reduce the amount of temporary disk space when your records are already sorted by the primary key column(s) because you only need to sort your data on the secondary key column(s).
Set to Don't Sort (Previously Grouped) to specify that input records are already grouped by this column, but not sorted. The operator will then sort on any secondary key columns. This option is useful when your records are already grouped by the primary key column(s), but not necessarily sorted, and you want to sort your data only on the secondary key column(s) within each group
Sort utility
The type of sort the stage will carry out. Choose from:
- DataStage. The default. This uses the built-in IBM® DataStage sorter, you do not require any additional software to use this option.
- UNIX. This specifies that the UNIX sort command is used to perform the sort.
Stable sort
Applies to a Sort Utility type of DataStage, the default is True. It is set to True to guarantee that this sort operation will not rearrange records that are already in a properly sorted data set. If set to False no prior ordering of records is guaranteed to be preserved by the sorting operation.
Allow duplicates
Set to True by default. If False, specifies that, if multiple records have identical sorting key values, only one record is retained. If Stable Sort is True, then the first record is retained. This property is not available for the UNIX sort type.
Output statistics
Set False by default. If True it causes the sort operation to output statistics. This property is not available for the UNIX sort type.
Create cluster key change column
This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column clusterKeyChange in each output record. The clusterKeyChange column is set to 1 for the first record in each group where groups are defined by using a Sort Key Mode of Don't Sort (Previously Sorted) or Don't Sort (Previously Grouped). Subsequent records in the group have the clusterKeyChange column set to 0.
Create key change column
This property appears for sort type DataStage and is optional. It is set False by default. If set True it tells the Sort stage to create the column KeyChange in each output record. The KeyChange column is set to 1 for the first record in each group where the value of the sort key changes. Subsequent records in the group have the KeyChange column set to 0.
Restrict memory usage
This is set to 20 by default. It causes the Sort stage to restrict itself to the specified number of megabytes of virtual memory on a processing node.
The number of megabytes specified should be smaller than the amount of physical memory on a processing node. For Windows systems, the value for Restrict Memory Usage should not exceed 500.
Workspace
This property appears for sort type UNIX only. Optionally specifies the workspace used by the stage.
Advanced
This section allows you to specify the following:
- Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the input data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced section. In Sequential mode the entire data set is processed by the conductor node.
- Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
- Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request the next stage should attempt to maintain the partitioning.