Aggregator stage: Stage tab (DataStage)
The Aggregator stage tab enables you to control aspects of the Aggregator stage.
Doubleclick the stage to open the stage properties panel. The Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes. Specify an optional description of the stage.
Properties section
Use the Properties section to define what the stage actually does.
The following table lists the properties and their attributes.
Category/Property  Values  Default  Mandatory?  Repeats?  Dependent of 

Grouping Keys/Group  Input column  N/A  Y  Y  N/A 
Grouping Keys/Case Sensitive  True/ False  True  N  N  Group 
Aggregations/Aggregation Type  Calculation/ Recalculation/ Count rows  Calculation  Y  N  N/A 
Aggregations/Column for Calculation  Input column  N/A  Y (if Aggregation Type = Calculation)  Y  N/A 
Aggregations/Count Output Column  Output column  N/A  Y (if Aggregation Type = Count Rows)  Y  N/A 
Aggregations/ Summary Column for Recalculation  Input column  N/A  Y (if Aggregation Type = Recalculation)  Y  N/A 
Aggregations/ Default To Decimal Output  precision, scale  8,2  N  N  N/A 
Aggregations/Corrected Sum of Squares  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Maximum Value  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Mean Value  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Minimum Value  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Missing Value  Output column  N/A  N  Y  Column for Calculation 
Aggregations/Missing Values Count  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Nonmissing Values Count  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Percent Coefficient of Variation  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Range  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Standard Deviation  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Standard Error  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Sum of Weights  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Sum  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Summary  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Uncorrected Sum of Squares  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Variance  Output column  N/A  N  N  Column for Calculation & Summary Column for Recalculation 
Aggregations/Variance divisor  Default/ Nrecs  Default  N  N  Variance 
Aggregations/Calculation and Recalculation Dependent Properties  Input column  N/A  N  N  Column for Calculation or Count Output Column 
Aggregations/Decimal Output  precision, scale  8,2  N  N  Calculation or Recalculation method 
Options/Grouping keys  hash/sort  hash  Y  Y  N/A 
Options/Allow Null Outputs  True/ False  False  Y  N  N/A 
Grouping keys
Group
Repeat the property to select multiple columns as group keys. Click Edit to select several group keys at once if required. This property has a dependent property:
 Case Sensitive
Use this to specify whether each group key is case sensitive or not, this is set to True by default, that is, the values "CASE" and "case" in would end up in different groups.
Aggregation category
 Aggregation type
 Choose calculation (the default), recalculation, or count rows.
 Column for calculation
 Click Edit in the Aggregations section to select a column for calculation. The Calculate aggregate type allows you to summarize the contents of a particular column or columns in your input data set by applying one or more aggregate functions to it. Select the column to be aggregated, then select dependent properties to specify the operation to perform on it, and the output column to carry the result. You can use the Column Selection dialog box to select several columns for calculation at once if required).
 Count output column
 The Count Rows aggregate type performs a count of the number of records within each group. Specify the column on which the count is output.
 Summary column for recalculation
 This aggregate type allows you to apply aggregate functions to a column that has already been summarized. This is like calculate but performs the specified aggregate operation on a set of data that has already been summarized. In practice this means you should have performed a calculate (or recalculate) operation in a previous Aggregator stage with the Summary property set to produce a subrecord containing the summary data that is then included with the data set. Select the column to be aggregated, then select dependent properties to specify the operation to perform on it, and the output column to carry the result. You can use the Column Selection dialog box to select several columns for recalculation at once if required).
 Weighting column
 Configures the stage to increment the count for the group by the contents of the weight column
for each record in the group, instead of by 1. Not available for Summary Column for Recalculation.
Setting this option affects only the following options:
 Percent Coefficient of Variation
 Mean Value
 Sum
 Sum of Weights
 Uncorrected Sum of Squares
 Default to decimal output
 The output type of a calculation or recalculation column is double. Setting this property causes it to default to decimal. You can specify that individual columns have decimal output while others retain the default type of double. You can also set a default precision and scale.
Options
 Method

Your choice of mode depends primarily on the number of groupings in the input data set, taking into account the amount of memory available. You typically use hash mode for a relatively small number of groups; generally, fewer than about 1000 groups per megabyte of memory to be used.
When using hash mode, you should hash partition the input data set by one or more of the grouping key columns so that all the records in the same group are in the same partition. However, hash partitioning is not mandatory, you can use any partitioning method you choose if keeping groups together in a single partition is not important. For example, if you're summing records in each partition and later you'll add the sums across all partitions, you don't need all records in a group to be in the same partition to do this. Note, though, that there will be multiple output records for each group.
If the number of groups is large, which can happen if you specify many grouping keys, or if some grouping keys can take on many values, you would normally use sort mode. However, sort mode requires the input data set to have been partition sorted with all of the grouping keys specified as hashing and sorting keys (this happens automatically if auto is set in the Partitioning tab). Sorting requires a pregrouping operation: after sorting, all records in a given group in the same partition are consecutive.
The method property is set to hash by default.
You might want to try both modes with your particular data and application to determine which gives the better performance. You might find that when calculating statistics on large numbers of groups, sort mode performs better than hash mode, assuming the input data set can be efficiently sorted before it is passed to group.
 Allow null outputs
 Set this to True to indicate that null is a valid output value when calculating minimum value, maximum value, mean value, standard deviation, standard error, sum, sum of weights, and variance. If False, the null value will have 0 substituted when all input values for the calculation column are null. It is False by default.
Advanced
 Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the input data set is processed by the available nodes as specified in the Configuration file, and by any node constraints specified in the Advanced section. In Sequential mode the entire data set is processed by the conductor node.
 Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
 Preserve partitioning. This is Set by default. You can select Set or Clear. If you select Set the stage will request that the next stage in the job attempt to maintain the partitioning.