Aligning Values With Labels – The Data Story Guide

Sometimes values assigned to categories are not very informative. The values can be replaced with values that better represent the labels, which allows averages and other statistics to be computed. Provided that values are ordered correctly, typically the choice of value has little impact. Values can be optimized to solve specific problems.

Sometimes values assigned to categories are not very informative

The values that are assigned to categories in variables are often determined automatically, with a 1 assigned to the first category. For example, consider the following survey question:

On a scale of 0 to 10, how likely are you to recommend your bank to friends and colleagues?

0 Not at all likely
1
2
3
4
5
6
7
8
9
10 Extremely likely

Most data collection software will store this as follows, with the values starting at 1 rather than 0:

Value	Label
1	0 Not at all likely
2	1
3	2
4	3
5	4
6	5
7	6
8	7
9	8
10	9
11	10 Extremely likely

This can cause two different types of problems:

When the average is computed, it will be misleading, overstating the true average by 1.
If the variable is used as an input to any code, the wrong logic may be used, with people inadvertently assuming that the labels and values match.

The simple fix is to recode the values so that the stored values align with the category labels.
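The recode described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method of any particular survey tool: the `stored` responses are made-up data, and the names `recode_map`, `stored_mean`, and `recoded_mean` are hypothetical.

```python
from statistics import mean

# Hypothetical stored values for the 0-10 recommendation question.
# The software stored 1 for "0 Not at all likely" up to 11 for "10 Extremely likely".
stored = [1, 5, 8, 11, 10, 9, 7]

# Map each stored value (1..11) to the number shown in its label (0..10).
recode_map = {value: value - 1 for value in range(1, 12)}

recoded = [recode_map[v] for v in stored]

stored_mean = mean(stored)    # average of the misaligned values
recoded_mean = mean(recoded)  # average after aligning values with labels

# The misaligned average overstates the aligned one by 1.
print(f"stored mean:  {stored_mean:.2f}")
print(f"recoded mean: {recoded_mean:.2f}")
```

Using an explicit mapping rather than subtracting 1 in place keeps the sketch usable when the required recode is not a simple shift.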

Provided that values are ordered correctly, typically the choice of value has little impact

People new to converting labels to values sometimes voice the concern that the choice of value is subjective, and that such subjectivity makes any resulting analysis dubious. This concern is generally unfounded, provided that:

Values are ordered consistently with respect to the labels. When they are, the choice of value generally has little impact. There are exceptions, but they tend to occur when people are deliberately trying to cause one, as discussed in the next section.
Categories that are intrinsically not ordered, such as Don't know and Refuse to answer, are assigned missing values.

The default choice of values is to assign whole numbers, starting at 1. Where the questions have an obvious midpoint (e.g., Strongly disagree, Disagree, Neither, Agree, Strongly agree), values with a midpoint of 0 can be used (e.g., -2, -1, 0, 1, 2).
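One way to check how much the choice of value matters is to score the same responses under several ordered codings and compare the conclusions. The sketch below uses made-up responses for two hypothetical groups and three codings: the default 1 to 5, the centered -2 to 2, and a deliberately unequal but still ordered set. Don't know is treated as missing. All names and data here are assumptions for illustration.

```python
from statistics import mean

ORDERED_LABELS = ["Strongly disagree", "Disagree", "Neither", "Agree", "Strongly agree"]

# Three codings that all preserve the order of the labels.
codings = {
    "default": [1, 2, 3, 4, 5],
    "centered": [-2, -1, 0, 1, 2],
    "unequal": [0, 1, 2, 5, 10],
}

# Hypothetical responses from two groups of respondents.
group_a = ["Agree", "Strongly agree", "Neither", "Agree", "Don't know", "Disagree"]
group_b = ["Disagree", "Neither", "Strongly disagree", "Agree", "Disagree", "Don't know"]

def score(responses, values):
    """Convert labels to values; labels outside the ordered scale become None (missing)."""
    lookup = dict(zip(ORDERED_LABELS, values))
    return [lookup.get(r) for r in responses]

def mean_ignoring_missing(scores):
    """Average the non-missing scores."""
    return mean([s for s in scores if s is not None])

# For each coding, store the (group A mean, group B mean) pair.
results = {
    name: (
        mean_ignoring_missing(score(group_a, values)),
        mean_ignoring_missing(score(group_b, values)),
    )
    for name, values in codings.items()
}

for name, (a, b) in results.items():
    print(f"{name:>8}: group A = {a:.2f}, group B = {b:.2f}, A higher = {a > b}")
```

With this data, group A averages higher than group B under all three codings. The centered coding is a shift of the default coding by 3, so it changes the reported averages but not the differences between groups.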

Values can be optimized to solve specific problems

A number of approaches have been developed to improve the alignment of labels with values, including: