Aggregating data involves summarizing data into a smaller quantity of data. It can be used to create new data sets. One or more variables are used to define how the aggregation is performed.
Aggregating involves summarizing data into a smaller quantity of data
Aggregation is another name for tabulation: it does the same thing as summary tables and crosstabs.
The table on the right replaces the five cases on the left with two cases. The first case summarizes the three cases with a Zip Code of 2000 in the left table, and the second case summarizes the two cases with a Zip Code of 2001.
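As a minimal sketch of this kind of aggregation in Python: the Zip Code values below follow the example, but the `income` variable, its values, and the helper `aggregate_by` are hypothetical and chosen only for illustration. Each group of cases sharing a Zip Code is replaced by one case holding a count and a mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: the Zip Codes follow the example above, but the
# "income" variable and its values are invented for illustration.
cases = [
    {"zip_code": 2000, "income": 50_000},
    {"zip_code": 2000, "income": 60_000},
    {"zip_code": 2000, "income": 70_000},
    {"zip_code": 2001, "income": 40_000},
    {"zip_code": 2001, "income": 50_000},
]


def aggregate_by(rows, key, value):
    """Return one summary case per distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [
        {key: group, "n_cases": len(values), f"mean_{value}": mean(values)}
        for group, values in sorted(groups.items())
    ]


aggregated = aggregate_by(cases, "zip_code", "income")
for case in aggregated:
    print(case)
```

The mean is only one possible summary. Sums, counts, medians, or percentages could be computed in the same loop, depending on what the new data set needs to contain.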
Aggregation can be used to create new data sets
The results of aggregation can be used to create new data sets. For example, many publicly available data sets have been created by aggregation (e.g., data files containing statistics by state within a country).
The types and properties of the data are fundamentally altered when aggregation occurs. For example, the original file may contain categorical variables, whereas the aggregated file will often need to contain numeric summaries of these variables (e.g., a categorical variable indicating age category in the original file may be replaced by multiple variables in the new file, each representing the percentage of people in one age category).
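The change from a categorical variable to several numeric ones can be sketched as follows. The data, the variable names `state` and `age_category`, and the helper `category_percentages` are all hypothetical. The single categorical variable in the original cases becomes one percentage variable per category in the aggregated cases.

```python
from collections import Counter, defaultdict

# Hypothetical person-level data with one categorical variable.
people = [
    {"state": "A", "age_category": "18-34"},
    {"state": "A", "age_category": "18-34"},
    {"state": "A", "age_category": "35-54"},
    {"state": "A", "age_category": "55+"},
    {"state": "B", "age_category": "35-54"},
    {"state": "B", "age_category": "55+"},
]


def category_percentages(rows, group_key, category_key):
    """Replace a categorical variable with one percentage variable per category."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[category_key]] += 1
    # Use every category seen anywhere, so all output cases share the same variables.
    categories = sorted({row[category_key] for row in rows})
    result = []
    for group, counter in sorted(counts.items()):
        total = sum(counter.values())
        record = {group_key: group}
        for category in categories:
            record[f"pct_{category}"] = 100 * counter[category] / total
        result.append(record)
    return result


by_state = category_percentages(people, "state", "age_category")
for record in by_state:
    print(record)
```

Note that a category with no cases in a group still gets a variable, with a value of 0, so that every aggregated case has the same set of variables.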
The variable or variables used to perform the aggregation
In the example above, the data was aggregated by Zip Code. Any variable can be used to perform aggregation, but common ones are:
- Time. For example, converting daily data into annual data.
- Geography. For example, converting person-level data into state-level data.
- Product category. For example, converting SKUs into brands.
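The time case from the list above can be sketched as follows, assuming hypothetical daily sales figures. The aggregation variable is not stored in the data directly but derived from each date (its year), and the summary used here is a sum.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily data: (date, units sold).
daily_sales = [
    (date(2022, 12, 30), 10),
    (date(2022, 12, 31), 20),
    (date(2023, 1, 1), 5),
    (date(2023, 6, 15), 15),
]

# Aggregate by year: the grouping value is derived from each date.
totals = defaultdict(int)
for day, units in daily_sales:
    totals[day.year] += units

annual_sales = dict(totals)
print(annual_sales)
```

The geography and product category cases follow the same pattern, with a lookup from person to state or from SKU to brand taking the place of `day.year`.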