There is an optimal shape for a data set. When the data is not in this shape, it is often appropriate to reshape it. The main ways of doing this are:
- Stacking
- Widening
- Aggregating
- Merging
- Splitting
The optimal shape for data
The optimal shape of a data file is a neat rectangle, where each row is a case and each column a variable. For more information, see The Desired Shape of Data: The Rectangle.
Stacking
Stacking a data set replaces one rectangular data set with another, where the new data set has more rows and fewer columns. This is done by creating variables (columns) in the new data set that contain multiple columns from the original data set, stacked on top of each other. For more information, see Stacking.
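As a rough illustration of the idea, the sketch below stacks three columns into one using plain Python. The data, the column names (`score_2020` and so on), and the `stack` helper are all hypothetical and chosen only for this example; they are not taken from any particular tool.

```python
# Hypothetical wide data: one row per case, one column per year.
wide = [
    {"id": 1, "score_2020": 7, "score_2021": 8, "score_2022": 9},
    {"id": 2, "score_2020": 5, "score_2021": 6, "score_2022": 4},
]


def stack(rows, id_col, value_cols, var_name, value_name):
    """Stack value_cols on top of each other into a single column."""
    stacked = []
    for col in value_cols:  # one block of rows per original column
        for row in rows:
            stacked.append(
                {id_col: row[id_col], var_name: col, value_name: row[col]}
            )
    return stacked


stacked = stack(
    wide, "id", ["score_2020", "score_2021", "score_2022"], "variable", "score"
)

for row in stacked:
    print(row)
```

Here the original data set has 2 rows and 4 columns, while the stacked data set has 6 rows and 3 columns: more rows, fewer columns.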
Widening
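Widening a data set is the exact opposite of stacking: the new data set has fewer rows and more columns, because values that were stacked in a single column are spread across multiple columns. See the previous section.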
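A minimal sketch of widening in plain Python might look like the following. The stacked input and the `widen` helper are hypothetical, and the sketch assumes each combination of case and variable appears only once.

```python
# Hypothetical stacked data: one row per case per variable.
stacked = [
    {"id": 1, "variable": "score_2020", "score": 7},
    {"id": 2, "variable": "score_2020", "score": 5},
    {"id": 1, "variable": "score_2021", "score": 8},
    {"id": 2, "variable": "score_2021", "score": 6},
]


def widen(rows, id_col, var_name, value_name):
    """Spread the values of one stacked column across several columns."""
    cases = {}
    for row in rows:
        # Create the output row for this case the first time it is seen.
        case = cases.setdefault(row[id_col], {id_col: row[id_col]})
        case[row[var_name]] = row[value_name]
    return list(cases.values())


widened = widen(stacked, "id", "variable", "score")

for row in widened:
    print(row)
```

The four stacked rows collapse into two rows, one per case, with one column per year.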
Merging
Two or more data sets can be merged (joined) to create a new data set. See Merging.
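One common form of merging matches rows from two data sets on a shared key. The sketch below is an assumption-laden example rather than a general implementation: the data and the `merge` helper are hypothetical, the key is assumed to be unique in the second data set, and only cases present in both data sets are kept (an inner join).

```python
# Two hypothetical data sets that share an "id" column.
ages = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": 28},
]
states = [
    {"id": 1, "state": "NSW"},
    {"id": 2, "state": "VIC"},
]


def merge(left, right, key):
    """Inner join: keep rows whose key appears in both data sets."""
    lookup = {row[key]: row for row in right}  # assumes key is unique in right
    return [
        {**row, **lookup[row[key]]} for row in left if row[key] in lookup
    ]


merged = merge(ages, states, "id")

for row in merged:
    print(row)
```

Case 3 has no match in the second data set, so it is dropped. Other kinds of merge would keep it and leave the state missing.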
Splitting
Data can be split into multiple smaller data sets. For example, a file containing data for 10 years can be split into smaller annual files.
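The annual example can be sketched in plain Python as follows. The records and the `split_by` helper are hypothetical. The result is held in memory as one list per year, and writing each list to its own file is left out.

```python
from collections import defaultdict

# Hypothetical data covering several years.
records = [
    {"year": 2020, "sales": 10},
    {"year": 2021, "sales": 12},
    {"year": 2020, "sales": 7},
    {"year": 2022, "sales": 9},
]


def split_by(rows, key):
    """Split one data set into several, one per distinct value of key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


annual = split_by(records, "year")

for year, rows in annual.items():
    print(year, rows)
```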
Aggregating
Aggregating involves creating a new file that contains summary information from the original file. For example, a file that has the ages of 1,000 people can be replaced with a file showing the average age in each state. See Aggregating Data.
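The average-age example could be sketched like this, using a handful of made-up people rather than 1,000. The data, the state codes, and the `average_by` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: one row per person.
people = [
    {"state": "NSW", "age": 30},
    {"state": "NSW", "age": 40},
    {"state": "VIC", "age": 50},
]


def average_by(rows, group_col, value_col):
    """Return one summary row per group, holding the mean of value_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return [
        {group_col: group, "mean_" + value_col: mean(values)}
        for group, values in groups.items()
    ]


summary = average_by(people, "state", "age")

for row in summary:
    print(row)
```

The aggregated data set has one row per state instead of one row per person.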