Propensity Weights (Propensity Score Adjustment) – The Data Story Guide

It is possible to use statistical and machine learning techniques to create weights and/or composite categorical adjustment variables.

This approach is useful in situations where there are many potential adjustment variables (which is rarely the case in practice). The basic process is:

1. Obtain or create a data file that contains all the categorical adjustment variables and is considered representative of the population of interest (e.g., the proportion of 18 to 24-year-old men living in Hawaii in this data set should match the proportion in the population). This is called the reference data set.
2. Stack the reference data set and the survey data set on top of each other, so that each adjustment variable occupies a single column across both, and add one extra variable denoting whether each observation in the combined data set comes from the survey data set or the reference data set.
3. Estimate a model on the stacked data set that predicts which observations are in the survey data set, using the adjustment variables as predictors. Propensity weights can then be computed in a number of ways, including:
- As the inverse of the predicted probability of an observation being in the survey data set.
- Computing cell weights, where the categorical adjustment variable is created by either:
  - Grouping respondents according to quintiles of their predicted values in the survey data set (i.e., the 20% of respondents with the highest predicted probability, the next 20%, etc.). Logistic regression and random forests can be used for such a model. A particular benefit of a regression model is that statistical tests can be used to work out which variables to include and to test for interactions.
  - Using the nodes created by tree-based methods, such as CART and CHAID.
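
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the toy data, the variable names (`young`, `male`, `in_survey`) and the small hand-written logistic regression are all hypothetical, and in practice a statistics package would normally fit the model. The sketch computes the inverse-probability weight described above and, for comparison, an odds-based variant `(1 - p) / p` that is not named in the text.

```python
import math

def predict(beta, x):
    """Predicted probability of being in the survey file for one row x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(beta, xi) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: each row is (young, male), both 0/1 indicators.
# Step 1: the reference file is assumed representative (30% young).
reference = [(1, 1)] * 30 + [(1, 0)] * 30 + [(0, 1)] * 70 + [(0, 0)] * 70
# The survey over-represents young respondents (60% young).
survey = [(1, 1)] * 30 + [(1, 0)] * 30 + [(0, 1)] * 20 + [(0, 0)] * 20

# Step 2: stack the two files and flag which rows come from the survey.
X = survey + reference
in_survey = [1] * len(survey) + [0] * len(reference)

# Step 3: model survey membership from the adjustment variables.
beta = fit_logistic(X, in_survey)
p = [predict(beta, x) for x in survey]  # propensity for each respondent

# Weight as described in the text: inverse of the predicted probability.
inverse_weights = [1.0 / pi for pi in p]
# Assumed variant for comparison: odds of being in the reference file.
odds_weights = [(1.0 - pi) / pi for pi in p]

def weighted_share_young(rows, weights):
    """Weighted proportion of rows with young == 1."""
    young_total = sum(w for (young, _), w in zip(rows, weights) if young)
    return young_total / sum(weights)

share_inverse = weighted_share_young(survey, inverse_weights)
share_odds = weighted_share_young(survey, odds_weights)
print(round(share_inverse, 2), round(share_odds, 2))
```

In this toy example the unweighted survey is 60% young. The inverse-probability weights pull that share to roughly 40%, which is the share in the stacked file, while the odds-based variant pulls it to roughly 30%, the share in the reference file alone. Which target is appropriate depends on the variant of the method being used.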

There are further variants of this approach. For more information, please see these sources and the references within:

Sunghee Lee and Richard Valliant (2009), Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment, Sociological Methods & Research, 37(3), 319-343.
The Pew Research Center (2018), For Weighting Online Opt-In Samples, What Matters Most?