How to Weight a Survey – The Data Story Guide

This article provides an overview of the process for creating sampling weights. If you are new to weighting, please first read Simple Worked Example of Creating and Applying a Sampling Weight.

The article follows the steps in the process of weighting a survey:

Decide if it is necessary to weight the sample
Selecting adjustment variables and targets
Calculating the initial weight variable
Refining the weight
Applying the weight

Decide if it is necessary to weight the sample

Only weight a survey if there's good reason to believe that it will improve the quality of conclusions obtained from the study. For longitudinal or tracking studies, weighting is almost always appropriate. For one-off studies, weighting needs a specific justification. See Deciding Whether or Not to Weight a Sample.

Selecting adjustment variables and targets

In the example that looked at the popularity of Brad Pitt and Tiger Woods, the data was weighted by comparing the gender of people in the sample with that of people in the population. Gender is known as the adjustment variable, and the values that the data was weighted to match are the targets.
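
The relationship between an adjustment variable and its targets can be sketched in a few lines of Python. This is a minimal illustration rather than the worked example itself: the sample composition (6 women, 4 men) and the 50/50 population targets are hypothetical numbers chosen for the sketch, and the names sample_gender, targets, and category_weight are made up for it.

```python
from collections import Counter

# Hypothetical sample: gender is the adjustment variable.
sample_gender = ["female"] * 6 + ["male"] * 4

# Hypothetical targets: the population shares the weighted sample should match.
targets = {"female": 0.5, "male": 0.5}

counts = Counter(sample_gender)
n = len(sample_gender)

# Weight for each category = target share / observed sample share.
category_weight = {g: targets[g] / (counts[g] / n) for g in targets}

# Every respondent receives the weight of the category they belong to.
weights = [category_weight[g] for g in sample_gender]

for gender, weight in category_weight.items():
    print(f"{gender}: {weight:.3f}")
```

Over-represented categories receive weights below 1 and under-represented categories receive weights above 1, so the weighted sample matches the targets.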

The selection of adjustment variables and targets is most of the work in creating sampling weights. For more information, see:

Calculating the initial weight variable

The outcome of the process of creating weights is ideally a single variable called the weight, weights, weight variable, or weighting variable. Each observation in the data set should be assigned a value for this weight variable. An example of such a variable was provided in the second column in the worked example at the beginning of this chapter.

Typically, weight variables are created to have an average value of 1.0 and no values below 0. A value of 0 means that the observation is automatically excluded from any analyses performed using the weights.
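
As a rough sketch of this convention, the following Python function rescales a set of raw weights so that they average 1.0 and rejects negative values. The function name normalize_weights and the raw values are hypothetical and used only for illustration.

```python
def normalize_weights(raw_weights):
    """Rescale weights so that they average 1.0, rejecting negative values."""
    if any(w < 0 for w in raw_weights):
        raise ValueError("weights must not be negative")
    total = sum(raw_weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Multiplying by n / total makes the weights sum to n, i.e. average 1.0.
    scale = len(raw_weights) / total
    return [w * scale for w in raw_weights]


# Hypothetical raw weights; the 0.0 marks an observation excluded from analysis.
raw = [2.0, 4.0, 0.0, 2.0]
weights = normalize_weights(raw)
print(weights)
print(sum(weights) / len(weights))
```

Rescaling does not change the relative sizes of the weights, so weighted percentages and averages are unaffected by it.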

Some weighting software can produce negative weights, but their use is not sensible in real-world applications.

If using software like Q or Displayr, the algorithm is chosen automatically. If using R or SPSS, you will need to manually choose the algorithm used to create the weight:

If we have a single compound adjustment variable (e.g., age-by-gender-by-geography), we should calculate Cell Weights.
If we have multiple categorical variables (e.g., age, gender, and geography, each as a separate variable), we should calculate Rim Weights (Raking) or, if we have a lot of variables, perhaps Propensity Weights (Propensity Score Adjustment).
If we have a numeric variable, or need to put constraints on the weight, we should use Calibration Weights.
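
To give a feel for the second case, here is a minimal sketch of raking in plain Python. It is an illustration under stated assumptions, not the implementation used by any particular package: the function name rake, the respondent data, and the target shares are all hypothetical, every target share is assumed to be positive, and every category in the data is assumed to appear in the targets.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Adjust weights until each variable's weighted shares match its targets.

    rows: list of dicts, one per respondent, e.g. {"gender": "f", "age": "young"}
    targets: dict mapping variable name -> {category: target share}
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this variable.
            current = {cat: 0.0 for cat in shares}
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Scale each respondent so this variable's shares hit the targets.
            for i, row in enumerate(rows):
                cat = row[var]
                factor = shares[cat] * total / current[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables barely changes the weights.
        if max_change < tol:
            break
    return weights


def weighted_share(rows, weights, var, cat):
    """Weighted proportion of respondents whose `var` equals `cat`."""
    matching = sum(w for row, w in zip(rows, weights) if row[var] == cat)
    return matching / sum(weights)


# Hypothetical sample of 10 respondents.
rows = (
    [{"gender": "f", "age": "young"}] * 4
    + [{"gender": "f", "age": "old"}] * 2
    + [{"gender": "m", "age": "young"}] * 1
    + [{"gender": "m", "age": "old"}] * 3
)

# Hypothetical targets, one set of shares per adjustment variable.
targets = {
    "gender": {"f": 0.5, "m": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}

weights = rake(rows, targets)
print(round(weighted_share(rows, weights, "gender", "f"), 4))
print(round(weighted_share(rows, weights, "age", "young"), 4))
```

With a single compound adjustment variable, only one pass of this adjustment is needed, which is what makes Cell Weights the simpler case. Raking repeats the adjustment for each variable in turn because fixing one variable's shares can disturb another's.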

Refining the weight

The initial weight needs to be checked. See Checking Sampling Weights.

If issues are identified, the weight needs to be refined. See How to Improve a Weight.

Applying the weight

Once a weight has been created, it needs to be applied. This is usually done by just turning it on. In some software, such as SPSS, this is a global setting. In other software, such as R and Q, it is turned on or off for specific calculations.
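
What applying a weight means for a simple statistic can be sketched as follows. This is an illustrative example only: the function weighted_mean, the responses, and the weights are hypothetical, and real software handles many more cases (missing data, filters, and so on).

```python
def weighted_mean(values, weights):
    """Mean of `values` in which each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical responses: 1 = approves, 0 = does not approve.
approves = [1, 1, 1, 0, 0, 1]
# Hypothetical weight variable with an average value of 1.0.
weights = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0]

unweighted = sum(approves) / len(approves)
weighted = weighted_mean(approves, weights)

print(f"unweighted: {unweighted:.3f}")
print(f"weighted:   {weighted:.3f}")
```

Turning the weight off corresponds to giving every observation a weight of 1, which reduces the calculation to the ordinary mean.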

Please note that some software, most notably R and SPSS Statistics, assumes by default that any weights are not sampling weights, and will produce the wrong result unless the correct options are used. See Calculating Statistical Significance With Sampling Weights for more information.