Simple Worked Example of Creating and Applying a Sampling Weight – The Data Story Guide

Consider the following data showing 10 people’s favorite celebrity:

Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Tiger Woods
Tiger Woods

In this sample of ten, 80% of people have nominated Brad Pitt as their favorite celebrity. When we conduct surveys, we want to be able to draw conclusions about the entire population, rather than just our sample.

So instead of saying “80% of the sample nominate Brad Pitt as their favorite celebrity” we should aim to be conducting research that allows us to confidently state “Brad Pitt is the most popular celebrity with 80% of people nominating him as their favorite.”

Now consider the impact of some additional information about the gender of our ten respondents:

Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Brad Pitt                       Female
Tiger Woods                 Male
Tiger Woods                 Male

From this data, we can see that the sample is unrepresentative in terms of gender, with eight of ten respondents being female (80%) and two being male (20%). A big difference when we consider that true representation in the world is about 50/50.

Furthermore, gender seems to be the sole determinant of preference. As the sample is not representative in terms of gender, and gender is correlated with our measure of favorite celebrity, it seems that any estimate of people’s favorite celebrity will only be valid if we consider the over-representation of women in the sample.

We can improve this estimate by weighting. A weight is computed for every respondent in a sample by dividing the correct proportion by the observed proportion. The correct proportion of males in our population is 50% and the observed proportion is 20%, so the weight for each male is 50%/20%=2.5 and the weight for each female is 50%/80%=0.625. Thus, our data becomes:

Favorite

Celebrity Weight

Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Brad Pitt           0.625
Tiger Woods     2.5
Tiger Woods     2.5

Now we can compute our estimate of the proportion of people to list Brad Pitt as their favorite celebrity by summing up the weights of each of the respondents to prefer Brad Pitt and dividing this by the sum of all respondents’ weights:

The approach described here for computing a weight is a relatively simple case, but the basic idea can be extended to deal with much more complicated cases.

In this second example, our weighting is a sampling weight, and its goal is to correct for over- and under-representation of critical groups in a study. With volumetric weights, we know the weights as they come from other data that we know is pertinent. With sampling weights, we have to create weights such that when we perform weighted calculations, the survey results align with known facts about the market.

Sampling weights are used to bring results into line with some known characteristics of the population. For example, if a sample contains 40% males and the population includes 49% males, weighting can adjust for this discrepancy.