Data should be shaped like a rectangle, because this greatly simplifies data analysis. There are several common ways in which data is misshapen.
The desired rectangular shape
As discussed in Raw Data, efficient data analysis requires that the raw data has a rectangular shape, with rows representing cases and columns representing variables. This article reviews the most common ways that data can be misshapen, making data analysis problematic.
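As a minimal sketch of what this shape looks like in practice, the example below builds a small rectangular table with pandas. The survey data and the column names (respondent_id, age, gender, satisfaction) are invented for illustration and do not come from any real data file.

```python
import pandas as pd

# Hypothetical survey data: each row is one case (a respondent) and
# each column is one variable measured for that case.
survey = pd.DataFrame(
    {
        "respondent_id": [1, 2, 3],
        "age": [34, 51, 27],
        "gender": ["F", "M", "F"],
        "satisfaction": [4, 2, 5],
    }
)

# Because the data is rectangular, standard calculations are one-liners.
n_cases, n_variables = survey.shape
mean_age = survey["age"].mean()
satisfaction_by_gender = survey.groupby("gender")["satisfaction"].mean()
```

Every cell holds exactly one value for one case on one variable, which is the layout most analysis routines expect.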
The rectangular shape greatly simplifies data analysis
Data analysis software is a collection of algorithms for analyzing data. These algorithms assume that the data is in a specific format. When the data is not in the shape that the algorithms assume, one or more of the following happens:
- The desired calculations cannot be performed until the data is placed into the correct format.
- Specialist algorithms need to be found or created that can perform the calculations with the data.
- Manual calculations are required to glue together results.
- The wrong results are calculated.
Common ways in which data is misshapen
Some common mistakes when setting up data files are:
- Messy Rectangles, where there are gaps in the rectangle.
- Too Wide Data File, where the rectangle is wider and shorter than it should be.
- Multiple Variables in a Single Column, where the rectangle is taller and narrower than it should be.
- Multiple Tables, Rather Than a Single Table, where data that belongs in one rectangle is split across several.
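The sketch below shows one plausible way to repair three of these shapes with pandas. The tables, column names, and the reading of each problem (repeated measures spread across columns, a single column mixing different measures, and one table per year) are assumptions made for illustration. Real files may need different handling.

```python
import pandas as pd

# Too wide (assumed reading): one row per store, one column per month.
too_wide = pd.DataFrame(
    {"store": ["A", "B"], "sales_jan": [10, 20], "sales_feb": [15, 25]}
)
# melt stacks the month columns so that each row is one store-month case.
long_form = too_wide.melt(id_vars="store", var_name="month", value_name="sales")
long_form["month"] = long_form["month"].str.replace("sales_", "", regex=False)

# Multiple variables in a single column (assumed reading): the "value"
# column mixes heights and weights, making the table too tall and narrow.
too_tall = pd.DataFrame(
    {
        "person": [1, 1, 2, 2],
        "measure": ["height", "weight", "height", "weight"],
        "value": [170, 65, 182, 80],
    }
)
# pivot gives each measure its own column, with one row per person.
one_row_per_person = too_tall.pivot(
    index="person", columns="measure", values="value"
).reset_index()

# Multiple tables (assumed reading): the same variables stored per year.
table_2022 = pd.DataFrame({"person": [1, 2], "income": [50, 60]})
table_2023 = pd.DataFrame({"person": [3, 4], "income": [55, 65]})
# Tag each table with its year, then stack them into a single rectangle.
combined = pd.concat(
    [table_2022.assign(year=2022), table_2023.assign(year=2023)],
    ignore_index=True,
)
```

In each case the goal is the same: one row per case and one column per variable. Which unit counts as the case (a store, a store-month, a person) depends on the analysis being done.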