Recoding a variable involves replacing some values of a variable with other values. Variables are recoded to:
- Correct problems in the data (data cleaning).
- Simplify analysis (data tidying).
Often data is recoded into a new variable. This is also known as transforming data.
Correct problems in the data (data cleaning)
Consider the example shown in Use Histograms to Understand Numeric Variables, where some people in the data are impossibly tall. We can recode the cases with implausible values, giving them a new code. For example, we may:
- View the data in a data editor and delete the implausible values, which is equivalent to changing them to whatever the missing value code is in the software (e.g., SYSMIS in SPSS, NA in Displayr and Q).
- Replace each implausible value with our best estimate of that person's height based on other data. This form of recoding is known as imputation.
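The two options above can be sketched in Python. This is a minimal illustration, not a recommended procedure: the heights, the 250 cm cutoff, and the variable names are all made up for the example, `None` stands in for the software's missing value code, and filling with the median is only one very crude form of imputation.

```python
from statistics import median

# Hypothetical heights in centimetres; 999.0 and 1750.0 are implausible entries.
heights_cm = [172.0, 165.5, 999.0, 180.2, 1750.0, 158.9]

# Assumed plausibility cutoff; pick one that suits your own data.
MAX_PLAUSIBLE_CM = 250.0

# Option 1: recode implausible values as missing (None plays the role of NA).
cleaned = [h if h <= MAX_PLAUSIBLE_CM else None for h in heights_cm]

# Option 2: a crude imputation that fills each missing value with the
# median of the plausible observed heights.
observed = [h for h in cleaned if h is not None]
fill_value = median(observed)
imputed = [h if h is not None else fill_value for h in cleaned]

print(cleaned)
print(imputed)
```

A better estimate would usually draw on other variables for the same person, which is what "based on other data" refers to; the median is used here only to keep the sketch short.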
Simplify analysis (data tidying)
Consider a variable storing how much people like a politician, with values and labels of:
- Neither like nor dislike
A common way of analyzing this data is to calculate the "top 2 box score", which is the proportion of people that said Like or Love. The easiest way of doing this is to calculate the proportion that said Like and the proportion that said Love, and then sum up these two proportions.
However, it is often necessary to perform such calculations many times, which makes a more efficient approach advisable. One way of achieving this is to:
- Replace all the values of 1, 2, and 3 with 0 (i.e., recode them).
- Recode the 4s and 5s as 1s.
- Calculate the average of the resulting variable. This average is then the proportion of people that said Like or Love.
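As a rough sketch of these steps in Python, assuming a 1 to 5 scale in which 4 means Like and 5 means Love (the responses and variable names below are invented for illustration):

```python
# Hypothetical responses on the 1-5 scale (4 = Like, 5 = Love).
ratings = [5, 3, 4, 1, 2, 4, 5, 3, 2, 4]

# Recode 1, 2, and 3 as 0, and 4 and 5 as 1.
recode_map = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
top2 = [recode_map[r] for r in ratings]

# The average of the 0/1 variable is the top 2 box score.
top2_box_score = sum(top2) / len(top2)

# Cross-check against the longer route: proportion Like plus proportion Love.
prop_like = ratings.count(4) / len(ratings)
prop_love = ratings.count(5) / len(ratings)

print(top2_box_score)
print(prop_like + prop_love)
```

Both routes give the same proportion. The recoded version is convenient because the 0/1 variable can be reused in any later analysis that computes an average.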
If using, for example, SPSS or R, it is routine to recode data in this way. If using Q or Displayr, by contrast, there are much more efficient ways of doing such analysis (see Creating New Variables by Duplicating and Modifying Variable Sets).
Refer to Data Cleaning and Tidying for more common ways of transforming data.
Recoding into a new variable
Recoding can be done either by replacing the values in the existing variable or by creating a new variable that contains the recoded values.
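The difference between the two approaches can be illustrated with a small Python sketch. The data set is represented as a plain dictionary of columns, and the names `liking` and `liking_top2` are hypothetical:

```python
# Recode 1, 2, and 3 as 0, and 4 and 5 as 1.
recode_map = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

# Approach 1: recode into a new variable. The original values are kept.
data = {"liking": [5, 3, 4, 1, 2]}
data["liking_top2"] = [recode_map[v] for v in data["liking"]]

# Approach 2: recode in place. The original values are overwritten.
in_place = {"liking": [5, 3, 4, 1, 2]}
in_place["liking"] = [recode_map[v] for v in in_place["liking"]]

print(data)
print(in_place)
```

Recoding into a new variable keeps the original responses available, so a mistake in the recoding can be undone or a different recoding tried later. Recoding in place discards them.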