Rebasing Data – The Data Story Guide

Rebasing involves changing the base, that is, the sample or population used as the denominator in a calculation. Rebasing can be done by simple math, filtering, or recoding.

Worked examples of rebasing

Example 1: Attitudes

Consider the table below. Five people have said DON’T KNOW. It is reasonable for a person to say they don’t know something. But, when the number who have said it is small, as in this case, it is not interesting data. Consequently, it is traditionally regarded as being “dirt” and the fix is to replace it with a missing value.

A missing value is a special value that is typically understood by analysis software as being an instruction to automatically filter the table and recompute the values with this data excluded. The table here shows the summary of the same data as above, but the DON’T KNOW value has been set as a missing value (so it disappears from the table). Note that the actual results have changed (e.g., Disagree a little is now 13% rather than 12%). The act of recoding data as missing to change the computed results is known as rebasing.
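The effect of treating a category as missing can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular analysis package: the counts are invented for the example (they are not the figures from the table above), and the percentages function and its missing parameter are hypothetical names.

```python
def percentages(counts, missing=()):
    """Return rounded percentages, leaving categories marked as missing out of the base."""
    base = sum(n for label, n in counts.items() if label not in missing)
    return {
        label: round(100 * n / base)
        for label, n in counts.items()
        if label not in missing
    }

# Hypothetical counts for 100 respondents, five of whom said Don't know.
counts = {
    "Strongly disagree": 8,
    "Disagree a little": 12,
    "Neither": 20,
    "Agree a little": 35,
    "Strongly agree": 20,
    "Don't know": 5,
}

before = percentages(counts)                        # base is all 100 respondents
after = percentages(counts, missing={"Don't know"})  # base shrinks to 95

print(before["Disagree a little"], after["Disagree a little"])
```

With these invented counts the base falls from 100 to 95, so every remaining percentage is recomputed against the smaller base.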

Example 2: Political polling

Consider the result of a survey that asked people which party they were likely to next vote for, with the answers recorded being:

33% Republican
39% Democrat
28% Don't know

On its own, this data is not very useful for predicting the outcome of an election, as Don't know is not an option in an election.

The standard solution is to rebase the data. The base used in the percentages above is the total sample. A better base is the people who chose either Republican or Democrat.

In this example, the resulting data then becomes:

Republican: 33% / (33% + 39%) = 46%
Democrat: 39% / (33% + 39%) = 54%
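The arithmetic above can be sketched in Python as follows. The rebase function and its exclude parameter are hypothetical names chosen for this illustration; the input percentages are the ones quoted in the poll.

```python
def rebase(shares, exclude):
    """Recompute percentages using only the categories that are not excluded."""
    base = sum(value for label, value in shares.items() if label not in exclude)
    return {
        label: 100 * value / base
        for label, value in shares.items()
        if label not in exclude
    }

# Poll results as percentages of the total sample.
poll = {"Republican": 33, "Democrat": 39, "Don't know": 28}

rebased = rebase(poll, exclude={"Don't know"})

for label, value in rebased.items():
    print(f"{label}: {value:.0f}%")
```

The new base is 33 + 39 = 72, so each remaining share is divided by 72% rather than by 100%.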

When we rebase data we are making an assumption, namely that the people excluded from the base are the same as those who remain, and this assumption may be incorrect. For example, if some people were planning to vote Republican, but were embarrassed to admit this and said Don't know instead, then the assumption is clearly wrong, as the people with missing data are more likely to vote Republican than the people without missing data.

Where it is unlikely that people with missing data are the same as people without missing data, it becomes extremely difficult to perform valid analyses. For this reason, while introductory books on market research often say things like "always allow people to say don't know" and "let people choose whether they answer a question or not", experienced researchers tend to make all questions compulsory and only offer Don't know options when they are sure that the resulting data will be useful.

Example 3: Brand fit

Surveys are conducted to give insight into populations. In the example below, we have data on the extent to which a new product is seen as being consistent with the Apple brand. The table below on the left could be summarized as "31% of people believe that iLock fits with the Apple brand." Such a summary would be misleading, as readers would incorrectly infer that most people (100% - 31% = 69%) do not believe that iLock fits with the brand, when the truth is that only 16% think this.

The Missing data and Don't know categories are uninformative about the brand fit of the iLock. The standard fix in such a situation is to rebase the analysis and instead show the table at the right, which leads to a better summary of the data: 65% of people believe that the iLock fits with the Apple brand.

Example 4: Rebasing purchase intent

Consider a survey conducted among iPhone buyers which finds that 30% of respondents say they will upgrade to a new model in the next 12 months. If we know that 40% of people have an iPhone, we can then calculate the proportion of the total population who will upgrade an iPhone.

In this case, we rebase by multiplying our two numbers, which tells us that 30% * 40% = 12% of the total population will upgrade an iPhone.
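As a minimal sketch, the same multiplication can be written in Python. The function name rebase_to_population is hypothetical. The calculation also relies on an assumption that the text leaves implicit: people who do not have an iPhone are not counted among those who will upgrade one.

```python
def rebase_to_population(share_within_group, group_share_of_population):
    """Convert a share measured within a subgroup into a share of the whole population.

    Assumes people outside the subgroup contribute nothing to the result.
    """
    return share_within_group * group_share_of_population

upgrade_among_owners = 0.30   # 30% of iPhone buyers say they will upgrade
owners_in_population = 0.40   # 40% of people have an iPhone

upgrade_in_population = rebase_to_population(upgrade_among_owners, owners_in_population)
print(f"{upgrade_in_population:.0%} of the total population")

# Going the other way recovers the share among owners.
recovered = upgrade_in_population / owners_in_population
```

Dividing the population figure by the ownership rate returns the original 30%, which is the direction of calculation used in the polling example.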

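The logic here is exactly the same as in the earlier examples, but the calculation has been done in reverse: we multiply by a proportion to move to a larger base, rather than divide by one to move to a smaller base.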

How to rebase

There are three ways of rebasing:

Simple math. This is illustrated in the examples above.
Filtering. This approach involves filtering any analyses to only include the relevant sample.
Recoding. This approach involves recoding any data that you wish to exclude from the base as missing values. Where the rebasing is rectifying a data integrity problem, this approach is best as it locks in the correction of the data. Modern software, such as Displayr and Q, allows you to rebase by right-clicking on categories in tables (Remove in Q and Delete in Displayr).
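The filtering and recoding approaches can be compared on respondent-level data with a short Python sketch. The responses are invented for illustration, None stands in for a missing value, and percent_table is a hypothetical helper rather than a function from Displayr, Q, or any other package.

```python
from collections import Counter

def percent_table(values):
    """Percentages over non-missing values; None is treated as missing."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return {label: 100 * n / len(valid) for label, n in counts.items()}

# Hypothetical answers from eight respondents.
responses = [
    "Republican", "Democrat", "Don't know", "Democrat",
    "Republican", "Democrat", "Don't know", "Democrat",
]

# Filtering: analyze only the respondents who belong in the base.
filtered = [v for v in responses if v != "Don't know"]
by_filter = percent_table(filtered)

# Recoding: keep every respondent, but store the unwanted answer as missing.
recoded = [None if v == "Don't know" else v for v in responses]
by_recode = percent_table(recoded)

print(by_filter == by_recode)
```

Both routes give the same percentages here. The difference is that the recoded data carries the correction with it, so any later analysis of that data uses the new base without needing the filter to be reapplied.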