Aggregator stage in DataStage

Aggregator stage

The Aggregator stage classifies data rows from a single input link into groups and computes totals or other aggregate functions for each group. The aggregated results for each group are output from the stage through an output link.

When you double-click the Aggregator stage, the properties panel opens. The properties panel has three tabs:

Stage. This is always present and is used to specify general information about the stage.
Input. This is where you specify details about the data being grouped or aggregated.
Output. This is where you specify details about the groups being output from the stage.

The Aggregator stage gives you access to grouping and summary operations. One of the easiest ways to expose patterns in a collection of records is to group records with similar characteristics, then compute statistics on all records in each group. You can then use these statistics to compare the groups. For example, records containing cash register transactions might be grouped by the day of the week to see which day had the largest number of transactions, the largest amount of revenue, and so on.

Records can be grouped by one or more characteristics, where record characteristics correspond to column values. In other words, a group is a set of records with the same value for one or more columns. For example, transaction records might be grouped by both day of the week and by month. These groupings might show that the busiest day of the week varies by season.
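The grouping described above can be sketched outside DataStage in a few lines of Python. This is only an illustration of the concept, not how the Aggregator stage is implemented; the sample transactions, the column names (day, month, amount), and the aggregate function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cash register transactions; column names are illustrative only.
transactions = [
    {"day": "Mon", "month": "Jan", "amount": 20.0},
    {"day": "Mon", "month": "Jan", "amount": 5.0},
    {"day": "Sat", "month": "Jan", "amount": 40.0},
    {"day": "Mon", "month": "Jul", "amount": 12.5},
    {"day": "Sat", "month": "Jul", "amount": 7.5},
    {"day": "Sat", "month": "Jul", "amount": 30.0},
]

def aggregate(rows, key_columns, value_column):
    """Group rows by one or more key columns; return count and total per group."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        # A group is identified by the tuple of its key column values.
        key = tuple(row[column] for column in key_columns)
        groups[key]["count"] += 1
        groups[key]["total"] += row[value_column]
    return dict(groups)

# Group by a single column, then by two columns.
by_day = aggregate(transactions, ["day"], "amount")
by_day_and_month = aggregate(transactions, ["day", "month"], "amount")

for key, stats in sorted(by_day_and_month.items()):
    print(key, stats)
```

In this sample data, grouping by day alone gives Monday and Saturday the same number of transactions, while grouping by day and month shows that Monday is busier in January and Saturday is busier in July.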

In addition to revealing patterns in your data, grouping can also reduce the volume of data by summarizing the records in each group, making it easier to manage. If you group a large volume of data on the basis of one or more characteristics of the data, the resulting data set is generally much smaller than the original and is therefore easier to analyze using standard tools.

As you create the new stage, consider whether the job also needs Sort stages or additional Aggregator stages.

For a job with the Aggregator stage to run correctly, make sure each input column is mapped to an output column of the correct type. The Nullable values should also match: an input column with a Nullable value of Yes should be mapped to an output column with a Nullable value of Yes, and likewise for No.
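The Nullable rule can be checked mechanically before a job runs. The following Python sketch assumes you have exported the column definitions yourself; the dictionaries, column names, and helper function are hypothetical and do not correspond to any DataStage API.

```python
# Hypothetical column metadata: column name -> Nullable setting
# (True stands for Yes, False stands for No).
input_nullable = {"day": False, "amount": True}
output_nullable = {"day": False, "total_amount": False}

# Hypothetical mapping from input columns to output columns.
column_mapping = {"day": "day", "amount": "total_amount"}

def nullable_mismatches(inputs, outputs, mapping):
    """Return (input, output) column pairs whose Nullable settings differ."""
    return [
        (source, target)
        for source, target in mapping.items()
        if inputs[source] != outputs[target]
    ]

mismatches = nullable_mismatches(input_nullable, output_nullable, column_mapping)
for source, target in mismatches:
    print(f"Nullable mismatch: {source} -> {target}")
```

Here the amount column is nullable but its output column total_amount is not, so the pair is reported as a mismatch.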
