A key part of data tidying is merging categories. Common approaches to merging categories are:
- Combining the smallest categories together.
- Combining adjacent categories.
- Applying standard merges (e.g., top 2 boxes).
- Combining categories that are similar with respect to other data.
The way that data is merged varies by software.
Combining the smallest categories together
The table below on the left shows providers of different cell phones. The table on the right shows the smaller providers merged into an Other category.
Sometimes small categories are merged automatically, with every brand whose share falls below a threshold (say, 5% or 2%) combined into a single category.
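The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: the function name `merge_small_categories`, the `min_share` parameter, and the provider data are all hypothetical.

```python
from collections import Counter

def merge_small_categories(values, min_share=0.05, other_label="Other"):
    """Relabel every category whose share of responses is below min_share."""
    counts = Counter(values)
    total = sum(counts.values())
    # Categories that fall under the threshold (hypothetical default: 5%).
    small = {cat for cat, n in counts.items() if n / total < min_share}
    return [other_label if v in small else v for v in values]

# Hypothetical provider data, invented for illustration only.
providers = ["A"] * 50 + ["B"] * 40 + ["C"] * 4 + ["D"] * 3 + ["E"] * 3
merged = merge_small_categories(providers, min_share=0.05)
print(Counter(merged))
```

Here providers C, D, and E each account for less than 5% of responses, so they end up in the Other category, while A and B are left alone.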
Combining adjacent categories
Respondents to a survey were shown the description above and asked to rate how well it fits with the Apple brand. The results are shown below on the left. There are so many small categories that the data is a bit overwhelming. A simple solution is to merge adjacent categories, as shown on the right.
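As a rough sketch of merging adjacent categories, the Python below maps each rating to a band of neighboring scale points. The 0 to 10 scale, the band boundaries, and the function name `merge_adjacent` are assumptions made for illustration; the original survey's scale may differ.

```python
def merge_adjacent(ratings, bands):
    """Map each rating to the label of the inclusive band that contains it."""
    merged = []
    for rating in ratings:
        for label, (low, high) in bands.items():
            if low <= rating <= high:
                merged.append(label)
                break
        else:
            # No band matched: the bands do not cover the whole scale.
            raise ValueError(f"rating {rating} is not covered by any band")
    return merged

# Hypothetical bands over an assumed 0-10 scale.
bands = {"0-3": (0, 3), "4-6": (4, 6), "7-10": (7, 10)}
ratings = [0, 2, 3, 4, 5, 7, 8, 10, 10, 6]
merged = merge_adjacent(ratings, bands)
print(merged)
```

Because each band covers a contiguous range, the merged categories keep the ordering of the original scale.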
Applying standard merges (e.g., top 2 boxes)
In many fields, there are standard ways of combining categories of categorical variables. For example, in the United Kingdom, statisticians combine occupations into the following socio-economic grades.
|Social Grade||Description|
|C2||Skilled manual occupations|
|DE||Semi-skilled & unskilled manual occupations; unemployed and lowest grade occupations|
In customer feedback, it is routine to ask how likely people are to recommend a product on an 11-point scale and merge the responses into the following three categories.
On a scale of 0 to 10, how likely are you to recommend INSERT NAME OF BRAND to your friends and colleagues?
0 Extremely unlikely
10 Extremely likely
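A three-way merge of this 0 to 10 scale might look like the sketch below. It assumes the widely used Net Promoter grouping (0 to 6 as Detractors, 7 to 8 as Passives, 9 to 10 as Promoters); the cut points and the function name `recommend_group` are assumptions, so substitute whatever grouping your own standard prescribes.

```python
from collections import Counter

def recommend_group(score):
    """Assign a 0-10 likelihood-to-recommend score to one of three groups.

    The cut points follow the conventional Net Promoter grouping,
    which is an assumption here rather than something the text specifies.
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score {score} is outside the 0-10 scale")
    if score <= 6:
        return "Detractors"
    if score <= 8:
        return "Passives"
    return "Promoters"

# Hypothetical responses, invented for illustration.
scores = [0, 3, 6, 7, 8, 9, 10, 10]
groups = [recommend_group(s) for s in scores]
print(Counter(groups))
```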
Most commonly, 5-point scales are merged into two categories, and the percentage of responses in the top two categories is called the top 2 box score. Similarly, with 7-point scales it is common to report top 3 boxes, and some people also like to analyze bottom 2 boxes, etc.
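The box scores described above are simple percentages. The sketch below assumes a scale coded 1 to `scale_max`, with higher values being more favorable; the function names and the ratings are hypothetical.

```python
def top_box_score(ratings, scale_max, boxes=2):
    """Percentage of ratings in the top `boxes` points of a 1..scale_max scale."""
    threshold = scale_max - boxes + 1
    in_top = sum(1 for r in ratings if r >= threshold)
    return 100.0 * in_top / len(ratings)

def bottom_box_score(ratings, boxes=2):
    """Percentage of ratings in the bottom `boxes` points of a scale starting at 1."""
    in_bottom = sum(1 for r in ratings if r <= boxes)
    return 100.0 * in_bottom / len(ratings)

# Hypothetical 5-point ratings, invented for illustration.
ratings = [5, 4, 4, 3, 2, 5, 1, 4, 3, 5]
top2 = top_box_score(ratings, scale_max=5, boxes=2)
bottom2 = bottom_box_score(ratings, boxes=2)
print(top2, bottom2)
```

For a 7-point scale, the same function gives the top 3 box score with `scale_max=7, boxes=3`.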
Combining categories that are similar with respect to other data
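Categories can also be merged so that the merged variable best shows how it relates to some other data (e.g., using techniques like CHAID), typically by combining categories that have similar distributions on the other variable.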
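To give a feel for this idea, the sketch below performs a single merge-selection step in the spirit of CHAID. It is not the full algorithm, which repeats the step and uses significance tests to decide when to stop. For every pair of categories it computes a Pearson chi-square statistic against a binary outcome and picks the pair with the smallest statistic, that is, the two categories that behave most alike. The age groups, counts, and function names are hypothetical.

```python
from itertools import combinations

def chi_square_2x2(a_yes, a_no, b_yes, b_no):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a_yes + a_no + b_yes + b_no
    row_a, row_b = a_yes + a_no, b_yes + b_no
    col_yes, col_no = a_yes + b_yes, a_no + b_no
    cells = [
        (a_yes, row_a, col_yes), (a_no, row_a, col_no),
        (b_yes, row_b, col_yes), (b_no, row_b, col_no),
    ]
    stat = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat

def most_similar_pair(table):
    """table maps category -> (yes_count, no_count); return the most similar pair."""
    return min(
        combinations(sorted(table), 2),
        key=lambda pair: chi_square_2x2(*table[pair[0]], *table[pair[1]]),
    )

# Hypothetical counts of (would recommend, would not recommend) by age group.
table = {"18-24": (30, 70), "25-34": (32, 68), "35-54": (55, 45), "55+": (70, 30)}
pair = most_similar_pair(table)
print(pair)
```

In this invented data the two youngest groups have nearly identical outcome rates, so they are the natural candidates to merge first. With an ordinal variable you would usually restrict the candidates to adjacent categories.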
How merging varies by software
In some software, such as SPSS Statistics and R, categories are merged by recoding the values in variables. For example, to merge categories 0, 1, 2, 3, 4, and 5, you recode the values so that all six categories share a single value (e.g., replacing values of 1, 2, 3, 4, and 5 with a value of 0).
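The recoding approach can be illustrated in Python (used here only as a neutral language; it is not SPSS or R syntax). The data and the name `recode_map` are hypothetical. Note that the mean changes once the values are overwritten.

```python
from statistics import mean

# Hypothetical responses on a 0-10 scale, one per scale point.
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Merge categories 0-5 by replacing 1, 2, 3, 4, and 5 with 0.
recode_map = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
recoded = [recode_map.get(v, v) for v in values]

print(recoded)
print(mean(values), mean(recoded))  # the recoded mean is lower than the original
```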
Other software, such as Q or Displayr, permits categories to be merged without the need for the data to be recoded. This means that merging categories does not change numeric summaries (e.g., means).
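One way to picture merging without recoding is to keep the underlying values untouched and store the merge as a separate mapping from values to display groups. The sketch below is a generic illustration under that assumption, with hypothetical names and data; it is not a description of how Q or Displayr work internally.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses on a 0-10 scale; these values are never modified.
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The merge lives in a separate mapping from each value to a display group.
group_of = {v: ("0-5" if v <= 5 else str(v)) for v in range(11)}

# Frequency tables use the merged groups...
table = Counter(group_of[v] for v in values)
# ...while numeric summaries are still computed from the original values.
average = mean(values)

print(table)
print(average)
```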