Remove Duplicates stage: Stage tab (DataStage)
You can specify aspects of the Remove Duplicates stage by double-clicking the stage and updating settings on the Stage tab.
Double-click the Remove Duplicates stage to open the stage editor. On the Stage tab, the Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.
Properties
Category/Property | Values | Default | Mandatory? | Repeats? | Dependent on |
---|---|---|---|---|---|
Keys that Define Duplicates/Key | Input Column | N/A | Y | Y | N/A |
Keys that Define Duplicates/Case sensitive | True/False | True | N | N | Key |
Keys that Define Duplicates/Sort as EBCDIC | True/False | False | N | N | Key |
Options/Duplicate to retain | First/Last | First | Y | N | N/A |
Key
Specifies the key column for the operation. This property can be repeated to specify multiple key columns. You can use the Column Selection dialog box to select several keys at once if required. Key has dependent properties as follows:
- Case sensitive
Specifies whether each key is case sensitive. The default is True, which means that the values "CASE" and "case" are not treated as equivalent.
- Sort as EBCDIC
To sort as in the EBCDIC character set, choose True.
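The effect of an EBCDIC collating sequence can be illustrated with a small Python sketch. It is not how the stage is implemented. It assumes code page `cp037` as a representative EBCDIC encoding (the source does not say which code page the stage uses) and compares the encoded bytes of each value.

```python
# Illustrative only: compare ASCII/Unicode ordering with EBCDIC ordering.
# Assumption: cp037 stands in for "the EBCDIC character set".
values = ["B", "a", "1"]

# Default ordering: digits, then uppercase letters, then lowercase letters.
ascii_order = sorted(values)

# EBCDIC ordering: lowercase letters, then uppercase letters, then digits.
ebcdic_order = sorted(values, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['1', 'B', 'a']
print(ebcdic_order)  # ['a', 'B', '1']
```

Because the two orderings differ, the setting matters mainly when key values mix digits, uppercase letters, and lowercase letters.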
Duplicate to retain
Specifies which of the duplicate rows you want to retain. Choose First or Last. The default is First.
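The way the Key, Case sensitive, and Duplicate to retain properties interact can be sketched in Python. This is a minimal illustration of the semantics described above, not the stage's actual implementation. The function name `remove_duplicates` and its parameters are hypothetical, and for brevity a single case-sensitivity flag applies to all keys, whereas the stage sets it per key.

```python
def remove_duplicates(rows, keys, case_sensitive=True, retain="First"):
    """Keep one row per distinct key value (illustrative sketch only)."""
    if retain not in ("First", "Last"):
        raise ValueError("retain must be 'First' or 'Last'")
    kept = {}
    for row in rows:
        # Build the comparison key; fold case only for string values
        # when the comparison is case insensitive.
        key = tuple(
            v if case_sensitive or not isinstance(v, str) else v.lower()
            for v in (row[k] for k in keys)
        )
        # "First": store a row only the first time its key is seen.
        # "Last": overwrite, so the most recent row for the key wins.
        if retain == "Last" or key not in kept:
            kept[key] = row
    return list(kept.values())


rows = [
    {"id": 1, "code": "CASE"},
    {"id": 2, "code": "case"},
    {"id": 3, "code": "CASE"},
]

# Case sensitive (default): "CASE" and "case" are different keys.
first_cs = remove_duplicates(rows, ["code"])
# Case insensitive: all three rows share one key.
first_ci = remove_duplicates(rows, ["code"], case_sensitive=False)
last_ci = remove_duplicates(rows, ["code"], case_sensitive=False, retain="Last")

print([r["id"] for r in first_cs])  # [1, 2]
print([r["id"] for r in first_ci])  # [1]
print([r["id"] for r in last_ci])   # [3]
```

Passing several column names in `keys` corresponds to repeating the Key property.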
Advanced
This section allows you to specify the following:
- Execution mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the input data is processed by the available nodes as specified in the configuration file, and by any node constraints specified in the Advanced section. In sequential mode, the entire data set is processed by the conductor node.
- Combinability mode. This is Auto by default, which allows IBM® DataStage® to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
- Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request that the next stage attempt to maintain the partitioning.