It is possible to use statistical and machine learning techniques to create weights and/or composite categorical adjustment variables.
This approach is useful in situations where there are many potential adjustment variables (which is rarely the case in practice). The basic process is:
- Obtain or create a data file that contains all the categorical adjustment variables and where the data is considered to be representative of the population of interest (e.g., the proportion of 18 to 24-year-old men living in Hawaii in this data set should match the proportion in the population). This is called the reference data set.
- Combine the reference data set with the survey data set by stacking them on top of each other (appending rows), so that each adjustment variable occupies a single column and one extra variable records whether each observation in the combined data set comes from the survey data set or the reference data set.
- Estimate a model using the merged data set, predicting which observations are in the survey data set using the adjustment variables as predictors. Propensity weights can then be computed in a number of ways, including:
  - As the inverse of the predicted probability of an observation being in the survey data set.
  - As cell weights, where the cells (a new categorical adjustment variable) are created by either:
    - Grouping respondents into quintiles of their predicted probabilities in the survey data set (i.e., the 20% of respondents with the highest predicted probability, the next 20%, and so on). Logistic regression and random forests can both be used to produce these predictions. A particular benefit of logistic regression is that statistical tests can be used to work out which variables to include and to test for interactions.
    - Using the terminal nodes created by tree-based methods, such as CART and CHAID.
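The stacking, modelling and inverse-probability steps can be sketched in Python with NumPy only. This is a minimal sketch under stated assumptions: the adjustment variables (`age_group`, `region`), the simulated data, and the hand-written `fit_logistic` helper (a plain Newton-Raphson logistic regression) are all hypothetical stand-ins for your own files and a standard logistic regression routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two categorical adjustment variables coded as integers.
# The survey over-represents age_group 0 and region 0 relative to the reference.
n_survey, n_reference = 500, 5000
levels = {"age_group": 3, "region": 2}
survey = {
    "age_group": rng.choice(3, size=n_survey, p=[0.6, 0.3, 0.1]),
    "region": rng.choice(2, size=n_survey, p=[0.7, 0.3]),
}
reference = {
    "age_group": rng.choice(3, size=n_reference, p=[0.3, 0.4, 0.3]),
    "region": rng.choice(2, size=n_reference, p=[0.5, 0.5]),
}

# Stack the data sets (survey rows first) and add the indicator: 1 = survey, 0 = reference.
stacked = {v: np.concatenate([survey[v], reference[v]]) for v in levels}
in_survey = np.concatenate([np.ones(n_survey), np.zeros(n_reference)])

def design_matrix(data):
    """Intercept plus one dummy column per non-baseline level (level 0 is the baseline)."""
    n = len(next(iter(data.values())))
    cols = [np.ones(n)]
    for v, k in levels.items():
        for level in range(1, k):
            cols.append((data[v] == level).astype(float))
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() / (1 - y.mean()))  # start at the overall log-odds
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

X = design_matrix(stacked)
beta = fit_logistic(X, in_survey)

# Predicted probability of being in the survey, for the survey respondents only.
p_survey = 1.0 / (1.0 + np.exp(-X[:n_survey] @ beta))

# Inverse-probability weights, rescaled to average 1 (relative weights are unchanged).
weights = 1.0 / p_survey
weights /= weights.mean()

ref_share = np.mean(reference["age_group"] == 0)
unweighted_share = np.mean(survey["age_group"] == 0)
weighted_share = np.sum(weights * (survey["age_group"] == 0)) / weights.sum()
print(f"age_group 0: reference {ref_share:.3f}, "
      f"survey {unweighted_share:.3f}, weighted {weighted_share:.3f}")
```

Weighting by 1/p makes the survey resemble the stacked (survey plus reference) data rather than the reference data alone; the two are close when, as in this simulation, the reference data set is much larger than the survey.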
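The cell-weight variant can be sketched in the same way. In this hypothetical sketch, `quintile_cells` and `cell_weights` are illustrative helpers, and the predicted probabilities are simulated with `rng.beta` rather than taken from a fitted model. Survey respondents' scores are cut into quintiles, reference rows are assigned to the same cells, and each respondent receives the ratio of the reference share to the survey share of their cell.

```python
import numpy as np

def quintile_cells(survey_p, reference_p):
    """Cut points are the 20th/40th/60th/80th percentiles of the survey respondents'
    predicted probabilities; every row is labelled 0 (lowest) to 4 (highest)."""
    cuts = np.quantile(survey_p, [0.2, 0.4, 0.6, 0.8])
    survey_cells = np.searchsorted(cuts, survey_p, side="right")
    reference_cells = np.searchsorted(cuts, reference_p, side="right")
    return survey_cells, reference_cells

def cell_weights(survey_cells, reference_cells):
    """Weight = (share of reference rows in the cell) / (share of survey rows in the cell).
    Accepts any integer cell labels (e.g. terminal-node ids from a tree); reference
    rows in cells containing no survey respondents are not represented."""
    weights = np.zeros(len(survey_cells))
    for c in np.unique(survey_cells):
        in_cell = survey_cells == c
        reference_share = np.mean(reference_cells == c)
        survey_share = np.mean(in_cell)
        weights[in_cell] = reference_share / survey_share
    return weights

# Hypothetical predicted probabilities: survey respondents tend to score higher
# than reference rows (in practice these come from the fitted model).
rng = np.random.default_rng(1)
survey_p = rng.beta(3, 2, size=400)
reference_p = rng.beta(2, 3, size=4000)

survey_cells, reference_cells = quintile_cells(survey_p, reference_p)
weights = cell_weights(survey_cells, reference_cells)
for c in range(5):
    print(f"cell {c}: weight {weights[survey_cells == c][0]:.3f}")
```

Respondents in the highest-propensity quintile (the groups the survey over-represents) receive the smallest weights. Because `cell_weights` only needs integer labels, the same function could in principle be applied to terminal-node ids from a CART or CHAID tree.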
There are further variants of this approach. For more information, please see these sources and the references therein:
- Sunghee Lee and Richard Valliant (2009), Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment, Sociological Methods & Research, 37(3), 319-343.
- The Pew Research Center (2018), For Weighting Online Opt-In Samples, What Matters Most?