How to Improve a Weight – The Data Story Guide

There are a number of strategies for improving a weight:

Choosing better adjustment variables
Merging categories of adjustment variables
Using rims instead of cells
Trimming (bounds on weights)
Advanced methods

When evaluating alternative ways of improving a weight, consider the bias versus error tradeoff.

Choosing better adjustment variables

See Adjustment Variables and Targets for Weighting

Merging categories of adjustment variables

Small effective sample sizes are often caused by categories of the categorical adjustment variables having small sample sizes, which makes the resulting weights more sensitive to sampling error. Merging such categories can fix this.
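As a rough illustration of merging, the sketch below recodes a made-up age variable so that two sparse categories are absorbed into their neighbors, and sums the corresponding targets. The data, the category labels, the target proportions, and the helper names merge_categories and merge_targets are all assumptions made for this example.

```python
from collections import Counter

def merge_categories(values, mapping):
    """Recode each value via mapping; values not in mapping are left unchanged."""
    return [mapping.get(v, v) for v in values]

def merge_targets(targets, mapping):
    """Sum the target proportions of categories that are merged together."""
    merged = {}
    for category, share in targets.items():
        new_category = mapping.get(category, category)
        merged[new_category] = merged.get(new_category, 0.0) + share
    return merged

# Hypothetical sample: "18-24" and "65+" have very few respondents.
ages = (["18-24"] * 3 + ["25-34"] * 40 + ["35-54"] * 50
        + ["55-64"] * 45 + ["65+"] * 4)
# Hypothetical population targets for the original categories.
targets = {"18-24": 0.12, "25-34": 0.18, "35-54": 0.35, "55-64": 0.15, "65+": 0.20}

# Merge each sparse category with its neighbor.
mapping = {"18-24": "18-34", "25-34": "18-34", "55-64": "55+", "65+": "55+"}

merged_ages = merge_categories(ages, mapping)
merged_targets = merge_targets(targets, mapping)

counts_before = Counter(ages)
counts_after = Counter(merged_ages)
print("Before:", dict(counts_before))
print("After: ", dict(counts_after))
print("Merged targets:", merged_targets)
```

The targets must be merged in the same way as the sample categories, otherwise the adjustment variable and its targets no longer line up.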

Using rims instead of cells

Rim weighting uses separate adjustment variables rather than a single composite adjustment variable.

In Checking Sampling Weights, we examined a case study that had a single categorical adjustment variable with 32 categories, representing all the combinations of age, gender, and region.

Rim weighting can be conducted by using each of age, gender, and region as separate adjustment variables. This requires only 10 targets rather than 32. When rim weighting is applied to the case study in Checking Sampling Weights, however, the resulting weight is grossly inferior to the original weight. In particular:

The effective sample size from the resulting weight is 1,613 (83%), lower than the one from the cell weights, meaning that the sampling error will be higher.
The effective sample size computed for the key analysis of interest (attitude to gay marriage) is 1,451, which is also lower.
Because this weight makes no attempt to control for differences in age and gender within each region, estimates computed using this weight will be biased.
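For readers unfamiliar with how rim weights are computed, the following is a minimal sketch of raking (iterative proportional fitting) on two adjustment variables. It is not the procedure used in the case study: the sample counts, the targets, and the function names rake and weighted_share are invented for illustration.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Rim weighting by iterative proportional fitting.

    rows: one dict per respondent, mapping variable name -> category.
    targets: variable name -> {category: target proportion}.
    Returns a list of weights rescaled to have a mean of 1.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand_total = sum(totals.values())
            # Scale weights so this variable's weighted shares hit its targets.
            for i, row in enumerate(rows):
                factor = target[row[var]] * grand_total / totals[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    mean = sum(weights) / n
    return [w / mean for w in weights]

def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents in the given category."""
    total = sum(weights)
    return sum(w for r, w in zip(rows, weights) if r[var] == category) / total

# Hypothetical sample counts for each age-by-gender combination.
sample_counts = {("young", "male"): 10, ("young", "female"): 30,
                 ("old", "male"): 25, ("old", "female"): 35}
rows = [{"age": a, "gender": g}
        for (a, g), count in sample_counts.items() for _ in range(count)]

# Hypothetical targets: one set per variable (the rims), not per combination.
targets = {"age": {"young": 0.5, "old": 0.5},
           "gender": {"male": 0.5, "female": 0.5}}

weights = rake(rows, targets)
for var, target in targets.items():
    for category in target:
        print(var, category, round(weighted_share(rows, weights, var, category), 4))
```

Only the marginal distributions of age and gender are matched here. The joint distribution (for example, the share of young men) is not controlled, which is the source of the bias described above.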

Trimming (bounds on weights)

Trimming a weight involves specifying the smallest (lower) and highest (upper) values that a weight may take.

There are three main approaches to trimming:

Simple trimming: replacing values below some specified value with the specified lower bound and values above the upper value with the upper bound.
Constrained calibration: algorithms that permit specification of targets and constraints regarding lower and upper bounds.
Repeated raking and trimming: computing weights using raking or calibration, trimming them, re-raking/calibrating, etc., until the weights stabilize.

Simple trimming is rarely the best solution. The resulting weight’s average typically changes from 1, and the weighted tables of the adjustment variables no longer exactly match the targets. The other two approaches can be hit and miss.
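The drawback of simple trimming can be seen in a small sketch. The weights and bounds below are made up, and trim is a hypothetical helper rather than a function from any particular package.

```python
def trim(weights, lower, upper):
    """Simple trimming: clamp every weight to the interval [lower, upper]."""
    return [min(max(w, lower), upper) for w in weights]

def mean(values):
    return sum(values) / len(values)

# Hypothetical weights with a mean of 1.
weights = [0.1, 0.4, 0.6, 0.8, 1.0, 1.1, 3.0]

trimmed = trim(weights, lower=0.5, upper=2.0)
print("Mean before trimming:", round(mean(weights), 4))
print("Mean after trimming: ", round(mean(trimmed), 4))

# Rescaling to restore a mean of 1 can push weights back outside the bounds.
trimmed_mean = mean(trimmed)
rescaled = [w / trimmed_mean for w in trimmed]
print("Largest weight after rescaling:", round(max(rescaled), 4))
```

In this example the trimmed weights no longer average 1, and rescaling them to fix that moves the largest weight back above the upper bound. This is one reason trimming is often repeated or built into the calibration itself.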

A few comments on the effects of setting lower and upper bounds for weights via trimming:

When setting lower and upper bounds for trimming, it is important to remember that they are related. If the maximum is 5, then the minimum should be around 0.2 (i.e., lower = 1/upper).
Sometimes reducing the upper bound can have a bigger impact than increasing the lower bound, and vice versa.
The narrower the permitted range, the higher the effective sample size computed using Kish’s formula, and typically the higher the effective sample size using other formulas. This is because the more the weights are trimmed, the closer they come to being equal, which is equivalent to applying no weight at all.
When cell weighting is used, the effect of trimming is always to increase the magnitude of the difference between the targets and the weighted results. This is not always the case with calibration and raking.
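The relationship between the trimming bounds and the effective sample size can be sketched as follows. Kish's formula is the squared sum of the weights divided by the sum of the squared weights. The weights and the sequence of upper bounds are invented for this example, the lower bound is set to 1/upper as suggested above, and simple trimming is used purely to keep the sketch short.

```python
def kish_effective_sample_size(weights):
    """Kish's formula: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def trim(weights, lower, upper):
    """Simple trimming: clamp every weight to the interval [lower, upper]."""
    return [min(max(w, lower), upper) for w in weights]

# Hypothetical weights with a mean of 1.
weights = [0.1, 0.4, 0.6, 0.8, 1.0, 1.1, 3.0]
untrimmed_ess = kish_effective_sample_size(weights)
print("No trimming: effective sample size =", round(untrimmed_ess, 2))

# Progressively narrower ranges, with lower = 1 / upper.
results = []
for upper in (5.0, 3.0, 2.0, 1.5):
    lower = 1.0 / upper
    ess = kish_effective_sample_size(trim(weights, lower, upper))
    results.append((lower, upper, ess))
    print(f"Bounds [{lower:.2f}, {upper:.2f}]: effective sample size = {ess:.2f}")

# With all weights equal (no weighting), the formula returns the sample size.
equal_ess = kish_effective_sample_size([1.0] * len(weights))
print("Equal weights: effective sample size =", equal_ess)
```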

Advanced methods

There are more advanced methods for improving weights, in particular various nonlinear and Bayesian models used with Propensity Weights (Propensity Score Adjustment).

It is extremely rare that such advanced methods are used in commercial studies.

The bias versus error tradeoff

The goal when improving a weight is to reduce the likely noise in the analyses (i.e., reduce the size of sampling error). This is most readily assessed using the effective sample size.

However, weights with lower effective sample sizes are likely to be less biased. This can be assessed by comparing the effect of different weights on key results of interest.
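One simple way to make such a comparison is to compute the same key result under each candidate weight and look at how far the estimates move. The sketch below assumes a hypothetical yes/no outcome and two made-up candidate weights; weighted_proportion is an illustrative helper.

```python
def weighted_proportion(outcome, weights):
    """Weighted share of respondents with outcome == 1."""
    return sum(w * y for y, w in zip(outcome, weights)) / sum(weights)

# Hypothetical key result: 1 = agrees, 0 = does not agree.
outcome = [1, 1, 0, 1, 0, 0, 1]

# Two hypothetical candidate weights for the same seven respondents.
candidates = {
    "original": [0.1, 0.4, 0.6, 0.8, 1.0, 1.1, 3.0],
    "trimmed": [0.5, 0.5, 0.6, 0.8, 1.0, 1.1, 2.0],
    "unweighted": [1.0] * 7,
}

estimates = {name: weighted_proportion(outcome, w) for name, w in candidates.items()}
for name, estimate in estimates.items():
    print(f"{name}: {estimate:.3f}")
```

A large shift in a key result when moving to a weight with a higher effective sample size suggests that the gain in precision may be coming at the cost of added bias.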

Unfortunately, whatever has the biggest effect in terms of increasing the effective sample size of the weight may also have the biggest impact in terms of increasing bias, so care needs to be taken.