How to Select Adjustment Variables for Weighting – The Data Story Guide

The following considerations are relevant when selecting adjustment variables:

The chosen adjustment variables should cause key results to change
The adjustment variable(s) should be strongly related to key data collected in the study
Adjustment variables cannot include missing values
Targets for an adjustment variable need to be known and consistent with other targets
Use strata as an adjustment variable if you have non-proportional stratification
Use variables that are known to relate to biases in data collection
Avoid using variables with a high level of measurement error
The category definitions of adjustment variables should be the same in the survey as in the data used to calculate the targets
Don't use highly correlated variables

The chosen adjustment variables should cause key results to change

As discussed in Introduction to Weighting, the criteria for weighting are:

There are discrepancies between survey results and facts. For example:
The discrepancy is believed to be caused by the survey having interviewed too few or too many people in one or more groups in the population.
If the discrepancies are not addressed, the key results of the study will be wrong.

A good adjustment variable is then a variable that quantifies such a discrepancy.
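As a minimal sketch of how such a discrepancy might be quantified, the Python below compares the share of each category in a sample with a known target. The age bands, respondent counts, target shares, and function names are all hypothetical values chosen for illustration.

```python
from collections import Counter


def sample_shares(values):
    """Return the proportion of respondents in each category."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}


def discrepancies(values, targets):
    """Sample share minus target share for every target category."""
    shares = sample_shares(values)
    return {
        category: shares.get(category, 0.0) - target
        for category, target in targets.items()
    }


# Hypothetical survey of 100 respondents and hypothetical population targets.
respondents_age = ["18-34"] * 20 + ["35-54"] * 30 + ["55+"] * 50
age_targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = discrepancies(respondents_age, age_targets)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.2f}")
```

A positive gap means the group is over-represented in the sample, and a negative gap means it is under-represented.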

The adjustment variable(s) should be strongly related to key data collected in the study

The stronger the relationship, the more appropriate the adjustment variable. For example, if you are conducting a study looking at attitudes to social policies, voting history and education are appropriate adjustment variables. In contrast, they are likely to be less relevant in a study examining preferences for brands of tomato sauce.

Note that if this criterion is not met, the previous criterion cannot be met either: weighting by a variable that is unrelated to the key results will not change them.
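The link between these two criteria can be illustrated with a small sketch. It assumes a hypothetical sample with 60 respondents in group A and 40 in group B, population targets of 50% each, and weights computed as target share divided by sample share. The two outcome variables are invented: one has the same rate in both groups, the other differs sharply between them.

```python
def weighted_mean(values, weights):
    """Weighted average of a list of numeric values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 60 people in group A, 40 in group B; targets are 50/50.
groups = ["A"] * 60 + ["B"] * 40
weight_by_group = {"A": 0.5 / 0.6, "B": 0.5 / 0.4}
weights = [weight_by_group[g] for g in groups]

# Outcome unrelated to the group: 50% say "yes" (1) in both groups.
unrelated = [1] * 30 + [0] * 30 + [1] * 20 + [0] * 20
# Outcome related to the group: 80% "yes" in A, 20% "yes" in B.
related = [1] * 48 + [0] * 12 + [1] * 8 + [0] * 32

unrelated_unweighted = sum(unrelated) / len(unrelated)
unrelated_weighted = weighted_mean(unrelated, weights)
related_unweighted = sum(related) / len(related)
related_weighted = weighted_mean(related, weights)

print(unrelated_unweighted, round(unrelated_weighted, 4))
print(related_unweighted, round(related_weighted, 4))
```

In this sketch the unrelated outcome is the same before and after weighting, while the related outcome moves from 56% to about 50%.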

Adjustment variables cannot include missing values

A categorical adjustment variable must contain mutually exclusive and exhaustive categories. If the variable you want to use contains missing values, the options are to:

Delete any cases with missing values from the data set.
Add the missing values as a category and work out an appropriate target.
Impute the missing values.
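The three options above can be sketched as follows. The education data is hypothetical, missing values are represented by None, and the imputation step uses the most common category only because it is the simplest possible stand-in. A real study would normally use a more careful imputation method.

```python
from collections import Counter

# Hypothetical adjustment variable; None marks a missing value.
education = [
    "degree", "no degree", None, "no degree",
    "degree", "no degree", None, "no degree",
]

# Option 1: delete cases with missing values.
complete_cases = [value for value in education if value is not None]

# Option 2: treat missing as its own category (it then needs its own target).
with_missing_category = [
    value if value is not None else "missing" for value in education
]

# Option 3: impute. Here the most common observed category is used,
# which is a deliberately crude placeholder for a real imputation model.
most_common_category = Counter(complete_cases).most_common(1)[0][0]
imputed = [
    value if value is not None else most_common_category
    for value in education
]

print(len(complete_cases), Counter(with_missing_category), Counter(imputed))
```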

Targets for an adjustment variable need to be known and consistent with other targets

For example, if you do not know market share, then a brand’s market share cannot be used as an adjustment variable. For this reason, it is most common to weight surveys based on age, gender, and geography, as high-quality data is available on these in most countries.

Refer to Common Mistakes when Creating Weights for more about this point.
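One simple consistency check, offered here as an assumption about what "consistent" can mean in practice, is that targets expressed as population counts add up to the same total for every adjustment variable. The sketch below uses made-up counts and hypothetical function names.

```python
import math


def target_totals(targets_by_variable):
    """Sum the target counts for each adjustment variable."""
    return {
        variable: sum(targets.values())
        for variable, targets in targets_by_variable.items()
    }


def targets_are_consistent(targets_by_variable, rel_tol=1e-9):
    """True if every variable's targets sum to the same total."""
    totals = list(target_totals(targets_by_variable).values())
    return all(math.isclose(t, totals[0], rel_tol=rel_tol) for t in totals)


# Hypothetical targets: the gender counts imply a smaller population
# than the age counts, so the two sets of targets disagree.
inconsistent_targets = {
    "age": {"18-34": 3_000_000, "35-54": 3_500_000, "55+": 3_500_000},
    "gender": {"female": 5_100_000, "male": 4_700_000},
}
consistent_targets = {
    "age": {"18-34": 3_000_000, "35-54": 3_500_000, "55+": 3_500_000},
    "gender": {"female": 5_100_000, "male": 4_900_000},
}

print(target_totals(inconsistent_targets))
print(targets_are_consistent(inconsistent_targets))
print(targets_are_consistent(consistent_targets))
```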

Use strata as an adjustment variable if you have non-proportional stratification

For example, if the data was collected using a booster sample of people in a group (e.g., buyers of the client’s brands), then the variable used to perform this stratification needs to be used as an adjustment variable.
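As a rough illustration, suppose buyers make up 10% of the population but a booster sample makes them 50% of respondents. A minimal sketch, assuming weights are computed as population share divided by sample share and using invented figures, looks like this:

```python
def stratum_weights(population_shares, sample_counts):
    """Weight for each stratum: population share / sample share."""
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }


# Hypothetical figures: buyers are 10% of the population but were
# boosted to half of the 1,000 interviews.
population_shares = {"buyers": 0.10, "non-buyers": 0.90}
sample_counts = {"buyers": 500, "non-buyers": 500}

weights = stratum_weights(population_shares, sample_counts)

# After weighting, buyers should account for 10% of the weighted sample.
weighted_buyers = sample_counts["buyers"] * weights["buyers"]
weighted_total = sum(sample_counts[s] * weights[s] for s in sample_counts)
weighted_buyer_share = weighted_buyers / weighted_total

print(weights, weighted_buyer_share)
```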

Use variables that are known to relate to biases in data collection

For example, household size and structure are often related to how people are selected for studies (e.g., if conducting a study by phone, people in one-person households will typically be over-represented, because the bigger a household, the lower the chance that any one person in that household will be the respondent).
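The arithmetic behind this example can be sketched under simplifying assumptions: every household is equally likely to be contacted, exactly one person per household is interviewed, and the population consists of the hypothetical household counts below.

```python
# Hypothetical population: number of households by household size.
households_by_size = {1: 100, 4: 100}

# Share of PEOPLE living in each household size.
people_by_size = {
    size: size * count for size, count in households_by_size.items()
}
total_people = sum(people_by_size.values())
population_share = {
    size: people / total_people for size, people in people_by_size.items()
}

# With one interview per household, the expected share of RESPONDENTS
# from each household size equals the share of households of that size.
total_households = sum(households_by_size.values())
expected_sample_share = {
    size: count / total_households
    for size, count in households_by_size.items()
}

print(population_share)
print(expected_sample_share)
```

Under these assumptions, people in one-person households are 20% of the population but are expected to be 50% of the sample.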

Avoid using variables with a high level of measurement error

Consider the situation where a survey finds that the average household income is $92,000, but the Census says the correct figure is $62,000. It is possible that the survey over-represents high earners. However, it is more likely that there is a lot of measurement error, with people reporting a different income to the government than they report in the survey.

Weighting cannot fix measurement errors. Adjustment variables need to have little or no measurement error.

The category definitions of adjustment variables should be the same in the survey as in the data used to calculate the targets

Sometimes there is not a neat relationship between the categories used in a survey and the categories for which reliable targets exist. For example:

At the time of writing, the US Census collects sex as male and female, whereas Facebook has 58 gender categories (e.g., agender, androgyne, androgynous, bigender).
It is common to ask people who they voted for and give them an option of I’d prefer not to say, whereas, in an election, such an option does not appear.

The simplest solution is to exclude such data from the weighting process. However, this is not practical if the potential adjustment variable is correlated with key results. In that case, the options are:

If the survey has extra categories:
- Use judgment to group people into the most appropriate target categories.
- Treat them as missing values and impute the missing values.
- Add the extra categories as targets, and assume that their size is the same as observed in the survey.
If the target has extra categories, use judgment to apportion them across the survey question’s categories.
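The option of adding an extra survey category as a target can be sketched as below. The source only says to assume the extra category's size is the same as observed in the survey. Scaling the remaining targets down proportionally so that everything still sums to 100% is an assumption made for this example, and the party names and shares are invented.

```python
def add_extra_category(targets, extra_category, observed_share):
    """Add a survey-only category as a target at its observed share.

    The existing targets are scaled down proportionally (an assumption
    of this sketch) so that all targets still sum to 1.
    """
    scale = 1.0 - observed_share
    adjusted = {category: share * scale for category, share in targets.items()}
    adjusted[extra_category] = observed_share
    return adjusted


# Hypothetical election result and hypothetical survey finding that
# 10% of respondents chose "Prefer not to say".
election_targets = {"Party A": 0.55, "Party B": 0.45}
adjusted_targets = add_extra_category(
    election_targets, "Prefer not to say", observed_share=0.10
)

print(adjusted_targets)
```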

Don't use highly correlated variables

It's inadvisable to use highly correlated variables together as adjustment variables. For example, personal income and household income should not both be used.
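One way to screen candidate variables, shown here as a sketch and not as a prescribed method, is to compute Cramér's V between two categorical variables. It ranges from 0 (no association) to 1 (perfect association). The banded income data is hypothetical, and what counts as "too high" is a judgment call that this example does not settle.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V for two equal-length lists of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic from observed and expected counts.
    chi_squared = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((row, col), 0)
            chi_squared += (observed - expected) ** 2 / expected

    smaller_dimension = min(len(row_totals), len(col_totals)) - 1
    if smaller_dimension == 0:
        return 0.0  # A variable with one category carries no association.
    return sqrt(chi_squared / (n * smaller_dimension))


# Hypothetical banded incomes for 100 respondents.
personal_income = ["low"] * 40 + ["high"] * 60
household_income = ["low"] * 35 + ["high"] * 5 + ["high"] * 60

income_association = cramers_v(personal_income, household_income)

# Two unrelated hypothetical variables for comparison.
unrelated_association = cramers_v(["a", "a", "b", "b"], ["c", "d", "c", "d"])

print(round(income_association, 2), unrelated_association)
```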