Aggregator stage in DataStage: Calculation and recalculation dependent properties

Some properties are dependents of both Column for Calculation and Summary Column for Recalculation.

These specify the various aggregate functions and the output columns to carry the results.

  • Corrected Sum of Squares

    Produces a corrected sum of squares for data in the aggregate column and outputs it to the specified output column.

  • Maximum Value

    Gives the maximum value in the aggregate column and outputs it to the specified output column.

  • Mean Value

    Gives the mean value in the aggregate column and outputs it to the specified output column.

  • Minimum Value

    Gives the minimum value in the aggregate column and outputs it to the specified output column.

  • Missing Value

    Specifies what constitutes a "missing" value, for example -1 or NULL. Enter the value as a floating point number. Not available for Summary Column for Recalculation.

  • Missing Values Count

    Counts the number of records in which the aggregate column has a missing value and outputs the count to the specified output column. Not available for Summary Column for Recalculation.

  • Non-missing Values Count

    Counts the number of records in which the aggregate column has a non-missing value and outputs the count to the specified output column.

  • Percent Coefficient of Variation

    Calculates the percent coefficient of variation for the aggregate column and outputs it to the specified output column.

  • Range

    Calculates the range of values in the aggregate column and outputs it to the specified output column.

  • Standard Deviation

    Calculates the standard deviation of values in the aggregate column and outputs it to the specified output column.

  • Standard Error

    Calculates the standard error of values in the aggregate column and outputs it to the specified output column.

  • Sum of Weights

    Calculates the sum of values in the weight column specified by the Weight column property and outputs it to the specified output column.

  • Sum

    Sums the values in the aggregate column and outputs the sum to the specified output column.

  • Summary

    Specifies a subrecord to write the results of the calculate or recalculate operation to.

  • Uncorrected Sum of Squares

    Produces an uncorrected sum of squares for data in the aggregate column and outputs it to the specified output column.

  • Variance

    Calculates the variance for the aggregate column and outputs it to the specified output column. This has a dependent property:

    • Variance divisor

      Specifies the divisor that is used to calculate the variance. By default, the divisor is the number of records in the group, minus the number of records with missing values, minus 1. This corresponds to a vardiv setting of Default. If you specify NRecs, IBM DataStage instead uses the number of records in the group minus the number of records with missing values.
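
The following Python sketch illustrates how several of the aggregate functions listed above relate to one another. It uses the usual textbook definitions, which are assumed here rather than taken from the DataStage implementation. The function name `aggregate`, its parameters, and the result keys are hypothetical names chosen for this illustration. Sum of Weights and Summary are not covered.

```python
import math


def aggregate(column, missing_value=None, vardiv="Default"):
    """Compute textbook versions of several Aggregator-style statistics.

    All names here are hypothetical; they are not DataStage identifiers.
    Assumes at least two non-missing values and a non-zero mean.
    """
    # Treat None and the configured missing value as "missing".
    present = [x for x in column if x is not None and x != missing_value]
    n = len(present)

    total = sum(present)
    mean = total / n

    # Uncorrected sum of squares: squares of the raw values.
    uss = sum(x * x for x in present)
    # Corrected sum of squares: squares of deviations from the mean.
    css = sum((x - mean) ** 2 for x in present)

    # Variance divisor: non-missing count minus 1 by default,
    # or the non-missing count itself when vardiv is "NRecs".
    divisor = n if vardiv == "NRecs" else n - 1
    variance = css / divisor
    std_dev = math.sqrt(variance)

    return {
        "missing_count": len(column) - n,
        "nonmissing_count": n,
        "sum": total,
        "mean": mean,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "uss": uss,
        "css": css,
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),
        "percent_cv": 100 * std_dev / mean,
    }
```

For the column `[2, 4, 4, 4, 5, 5, 7, 9, -1]` with `missing_value=-1` and `vardiv="NRecs"`, this sketch gives a mean of 5, a corrected sum of squares of 32, and a variance of 4. With the default divisor the variance is 32/7 instead.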

Each of the properties listed above has the following dependent property:

  • Decimal Output

    By default all calculation or recalculation columns have an output type of double. This property allows you to specify that columns have an output type of decimal.

    When you specify the decimal output, you can also specify precision and scale. Precision is the number of digits in a number. Scale is the number of digits to the right of the decimal point in a number. The default is 8,2.

    In cases where the required output scale is low, set the precision and scale to at least p+4, s+4 to get accurate results, where p and s are the precision and scale of the column. For example, if a column has a precision and scale of 4,1, set the precision and scale of the decimal output to 9,5.

    For example, consider a column that has the values " 004.0", " 010.0", " 004.0", " 006.0", " 010.0", " 008.0", " 009.0", " 007.0", " 010.0", " 007.0", " 010.0", " 007.0", " 010.0". The precision of the column is 4 and the scale is 1. The mean of these 13 values is 102/13, or approximately 7.846. If the precision and scale are set to 9,5, the output is calculated as 7.8. If the precision and scale are set to 4,1, the output is 7.9. The more accurate result is 7.8.

    You can use the decimal type for intermediate calculations of the different reduce options. Set the decimal precision and scale large enough to avoid rounding of intermediate calculations. For example, if you are calculating the mean value of a decimal column with precision 8 and scale 2, set the intermediate decimal size to at least precision 10 and scale 4.
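
The effect of a low intermediate scale can be illustrated with Python's `decimal` module, using the 13 values from the example above. This is only a sketch of one way such a discrepancy can arise (rounding an intermediate result before rounding the final one); it is not a description of how DataStage computes the mean internally. The function name `mean_rounded` and the intermediate scale of 2 are assumptions made for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# The 13 column values from the example above (precision 4, scale 1).
values = [Decimal(v) for v in (
    "4.0", "10.0", "4.0", "6.0", "10.0", "8.0", "9.0",
    "7.0", "10.0", "7.0", "10.0", "7.0", "10.0",
)]


def mean_rounded(values, intermediate_scale, output_scale=1):
    """Mean rounded first to an intermediate scale, then to the output scale.

    Hypothetical helper: it only demonstrates double rounding.
    """
    exact = sum(values, Decimal(0)) / len(values)  # 102.0 / 13
    # First rounding: keep `intermediate_scale` digits after the point.
    step = exact.quantize(Decimal(1).scaleb(-intermediate_scale),
                          rounding=ROUND_HALF_UP)
    # Second rounding: reduce to the scale of the output column.
    return step.quantize(Decimal(1).scaleb(-output_scale),
                         rounding=ROUND_HALF_UP)


wide = mean_rounded(values, intermediate_scale=5)    # 7.84615 -> 7.8
narrow = mean_rounded(values, intermediate_scale=2)  # 7.85 -> 7.9
```

With five digits after the decimal point, the intermediate value 7.84615 rounds to 7.8. With only two, it is first rounded to 7.85 and then to 7.9, which shows why a generous intermediate precision and scale give the more accurate result.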