Reverse Coding – The Data Story Guide

Reverse coding involves reversing the order of the values associated with categories in a variable. It is used to simplify data interpretation.

The values that are assigned to categories in variables are often determined automatically, based on the order with which options are presented. Consider a variable storing how much people like a politician, with values and labels of:

Love
Like
Neither like nor dislike
Dislike
Hate

The lower the average of such a variable, the more the politician is liked. It is easy for analyses from such data to be inadvertently misinterpreted. A simple fix is to recode the values as follows:

Hate
Dislike
Neither like nor dislike
Like
Love

Worked example

The table below shows how likely people said they would be to buy the iLock. The first row shows the answers when people were not given any pricing information. The second row shows the reactions when it was priced. Looking at the first column of the table, we can see that there is no difference in the top box scores of 13%. The category I would probably buy it score is higher for the unpriced option, but the difference is not statistically significant. However, if you look at the other categories, it is clear that the overall sentiment is lower for the priced option.

Having to show the percentages of people choosing any of the options is a fairly verbose summary of the data. It is often useful to calculate the average response, rather than the percentage of people who chose each category. The Average column shows the average purchase intent for No price and $199. The way that the average has been computed is that in the raw data, I would definitely buy it has been set to a 1, I would probably buy it to 2, I am not sure to 3, probably not to 4, and I would definitely not buy it to 5.

The average for No price is lower than it is for $199. This means that on average people are more likely to have said they would buy the unpriced product than the priced one because the categories that represent higher purchase intention have been assigned lower values. It is counterintuitive that a low value represents more purchase intent. Consequently, it is common to reverse code the data, changing the values of the data so that categories that equate to higher purchase intent are given higher values. The reverse scale values are shown in the third column in the table below.

Purchase intent category	Original coding	Reverse coding
I would definitely buy it	1	5
I would probably buy it	2	4
I am not sure where I would buy it or not	3	3
I would probably not buy it	4	2
I would definitely not buy it	5	1

The table below contains the averages after the data has been reverse coded. It is now easier to correctly interpret, with a higher average corresponding to higher purchase intent.