Probability Coding Intentions Data – The Data Story Guide

Surveys often collect data on how likely people are to do something (i.e., their intentions). Such data can be analyzed by converting the response categories to numeric values, choosing an appropriate value for each category, and calculating averages. There are pros and cons to this approach.

Worked example

The table below shows purchase intention for a product, with the average shown in the final column. As discussed in Reverse Coding, the average is difficult to interpret because higher purchase intent is assigned lower values (see the Original values column in the table below).

One way of improving the interpretation of the average is to replace the original values with a probability coding, which reflects the probability that people in each of the purchase intent categories shown will buy. Such probabilities can be estimated from historic data (e.g., the percentage of people who said "I would definitely buy it" and then went on to buy) or set by judgment.

Purchase intent category	Original values	Probability coding (%)
I would definitely buy it	1	100
I would probably buy it	2	75
I am not sure whether I would buy it or not	3	20
I would probably not buy it	4	0
I would definitely not buy it	5	0
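
The recoding in the table above can be sketched in a few lines of Python. This is a minimal illustration rather than the guide's own method: the category keys are shorthand for the labels in the table, and the respondent counts are invented purely to show the arithmetic.

```python
# Probability coding (in percent) from the table above.
# Keys are shorthand for the full category labels.
PROBABILITY_CODING = {
    "definitely": 100,
    "probably": 75,
    "not sure": 20,
    "probably not": 0,
    "definitely not": 0,
}


def average_purchase_intent(counts, coding=PROBABILITY_CODING):
    """Return the count-weighted mean of the coded values."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to average")
    return sum(coding[category] * n for category, n in counts.items()) / total


# Hypothetical respondent counts, invented for illustration only.
example_counts = {
    "definitely": 10,
    "probably": 20,
    "not sure": 30,
    "probably not": 25,
    "definitely not": 15,
}

print(average_purchase_intent(example_counts))  # 31.0
```

With these made-up counts, the average reads as a stated probability of buying of 31%.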

The resulting average purchase intent then becomes:

Pros and cons of probability coding intentions data

A disadvantage of probability coding is that, where judgment is used to set the values, the result is highly subjective. Why, for example, should the "not sure" category be assigned 20% rather than some other value? However, on the plus side:

The choice of values used in recoding tends not to make a large difference to the relativities, and, in general with survey data analysis, it is the relativities that we should focus on (see The Delta Principle of Data Analysis).
It allows us to compute an average for a variable that is not, originally, numeric. This is a more efficient summary of the data, making it easier to see patterns.
We are more likely to see significant differences when comparing the average of a variable than when comparing percentages (because the analysis uses more data). This is certainly the case with this example: the averages show that the priced product has lower purchase intent than the unpriced product.
We can more readily communicate the meaning in the data. The probability recoding allows us to say that after showing the price, the stated probability of buying drops from 32% to 24%.
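
The first point, that the choice of values tends not to change the relativities much, can be checked with a small sensitivity sketch. Everything here is an assumption made for illustration: the respondent counts for the unpriced and priced versions are invented, and the alternative coding is just one other plausible set of judgment-based values.

```python
CATEGORIES = ["definitely", "probably", "not sure", "probably not", "definitely not"]

# Coding from the table in this guide, plus a hypothetical alternative.
TABLE_CODING = dict(zip(CATEGORIES, [100, 75, 20, 0, 0]))
ALTERNATIVE_CODING = dict(zip(CATEGORIES, [90, 60, 30, 10, 0]))

# Hypothetical respondent counts for the two versions of the product.
UNPRICED = dict(zip(CATEGORIES, [12, 22, 30, 21, 15]))
PRICED = dict(zip(CATEGORIES, [8, 16, 28, 26, 22]))


def coded_mean(counts, coding):
    """Count-weighted mean of the coded values for one group."""
    total = sum(counts.values())
    return sum(coding[category] * n for category, n in counts.items()) / total


for name, coding in [("table", TABLE_CODING), ("alternative", ALTERNATIVE_CODING)]:
    unpriced = coded_mean(UNPRICED, coding)
    priced = coded_mean(PRICED, coding)
    print(f"{name}: unpriced={unpriced:.1f} priced={priced:.1f} drop={unpriced - priced:.1f}")
```

With these invented counts, both codings show the priced version scoring lower, although the size of the drop differs somewhat between them.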
