Deciding Whether or Not to Weight a Sample – The Data Story Guide

We should weight a survey when:

There are discrepancies between survey results and facts. For example:
  - Fact: 12% of US adults live in California
  - Result: 25% of respondents live in California
The discrepancy is believed to be caused by the survey having interviewed too few or too many people in one or more groups in the population. For example, too many respondents say they live in California because we interviewed too many Californians. The main alternative explanation is measurement error (e.g., people who live in Texas but say they live in California for some reason).
If the discrepancies are not addressed, the key results of the study will be wrong. For example, suppose our study is focused on understanding attitudes to the environment, we have too many Californians, and we know that Californians differ from the average American in their attitudes to the environment. In that case, we know our results will be biased unless we correct the imbalance via weighting.
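To make the California example concrete, here is a minimal sketch of how a simple weight could be computed when the first two conditions hold. It assumes a basic post-stratification approach (population share divided by sample share); the function name and the "Rest of US" group are illustrative assumptions, not something prescribed by this guide.

```python
def poststratification_weights(population_share, sample_share):
    """Return one weight per group: population share divided by sample share."""
    return {
        group: population_share[group] / sample_share[group]
        for group in population_share
    }


# Shares from the California example; "Rest of US" is simply the remainder.
population_share = {"California": 0.12, "Rest of US": 0.88}
sample_share = {"California": 0.25, "Rest of US": 0.75}

weights = poststratification_weights(population_share, sample_share)

for group, weight in weights.items():
    # Weighted share = sample share * weight, which recovers the population share.
    weighted_share = sample_share[group] * weight
    print(f"{group}: weight = {weight:.3f}, weighted share = {weighted_share:.2f}")
```

Californians receive a weight below 1 because they are over-represented, and everyone else receives a weight above 1. This only corrects the problem if the discrepancy comes from who was interviewed, not from measurement error.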

Ideally, all three of these conditions should be in place before we weight data. However, it is commonplace for people to weight when none of them are in place. There are various reasons for this:

Having weighted data is often seen by less technical clients as being a sign of data hygiene.
It can take time and skill to verify the third condition.
The only common consequence of weighting unnecessarily is that fewer results will be statistically significant than otherwise.
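The loss of statistical significance can be approximated with the effective sample size. The sketch below uses Kish's approximation, (sum of weights) squared divided by the sum of squared weights, which is a standard formula but not one this guide names. The sample of 1,000 respondents is a hypothetical illustration built from the California shares.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# Hypothetical sample of 1,000: 250 Californians (25%) and 750 others (75%),
# weighted so that Californians count for 12% and others for 88%.
weights = [0.12 / 0.25] * 250 + [0.88 / 0.75] * 750

n_eff = effective_sample_size(weights)
print(f"Actual sample size:    {len(weights)}")
print(f"Effective sample size: {n_eff:.0f}")
```

With equal weights the effective sample size equals the actual sample size. The more the weights vary, the smaller the effective sample size becomes, and so the wider the confidence intervals.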

However, inappropriate weighting reduces the chance of obtaining insights from the data, simply because results are less likely to be statistically significant. Consequently, the following rule of thumb can be useful:

Always weight longitudinal/tracking studies. Such studies focus on change over time, and differences in sample composition between waves are guaranteed to produce some apparent change over time. Weighting reduces this effect.
Only weight other studies when you know you must weight due to an important discrepancy in the data.
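The reasoning behind the tracking-study rule can be illustrated with a small sketch. All numbers are hypothetical: two waves in which attitudes within each group do not change at all (80% support among Californians, 50% among everyone else), but the second wave happens to contain too many Californians. The function name is an assumption for illustration.

```python
def mean_support(sample_share, support_by_group, weights=None):
    """Weighted average of group-level support; unweighted if weights is None."""
    if weights is None:
        weights = {group: 1.0 for group in sample_share}
    numerator = sum(
        sample_share[g] * weights[g] * support_by_group[g] for g in sample_share
    )
    denominator = sum(sample_share[g] * weights[g] for g in sample_share)
    return numerator / denominator


# Hypothetical, unchanging attitudes within each group.
support = {"California": 0.80, "Rest of US": 0.50}
population_share = {"California": 0.12, "Rest of US": 0.88}

# Wave 1 matches the population; wave 2 over-samples Californians.
wave1_share = {"California": 0.12, "Rest of US": 0.88}
wave2_share = {"California": 0.25, "Rest of US": 0.75}

# Weight wave 2 back to the population shares.
wave2_weights = {g: population_share[g] / wave2_share[g] for g in wave2_share}

wave1 = mean_support(wave1_share, support)
wave2_unweighted = mean_support(wave2_share, support)
wave2_weighted = mean_support(wave2_share, support, wave2_weights)

print(f"Wave 1:              {wave1:.3f}")
print(f"Wave 2 (unweighted): {wave2_unweighted:.3f}")
print(f"Wave 2 (weighted):   {wave2_weighted:.3f}")
```

Unweighted, support appears to rise between waves even though nobody changed their mind. After weighting, the second wave matches the first, so the apparent change is shown to be an artifact of sample composition.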