Statistical inference with sampling weights, including statistical testing, differs from traditional statistical analysis in several ways:
- The default settings in most data analysis software are not designed for sampling weights
- There are correct approaches to statistical testing with weighted samples
- If you use the wrong method, you get the wrong result
- Hacks have been created, but none work well
The default settings in most data analysis software are not designed for sampling weights
It has long been known that the default settings in most data analysis software are not designed for sampling weights. This limitation is clearly stated in the documentation of all the main statistical packages. For example, the SPSS Statistics help states that:
The WEIGHT command in the SPSS Base is a frequency or replication weight. Even though it allows the specification of noninteger values, such values are not treated as sampling weights. In order to properly analyze data from complex samples, you need to use the SPSS Complex Samples module.
There are correct approaches to statistical testing with weighted samples
Where there are sampling weights, there are correct formulas for most standard problems. For more information, see Kirk Wolter, Introduction to Variance Estimation (2007).
As a simple example, consider this small data set of 10 cases. The sample contains only one male and nine females, so weighting can be used to correct this imbalance. A sampling weight, which sums to 100, has been computed.
The summary table of the approval data from above is shown below, where:
- The weighted proportion of people approving (Yes) is 27.78%.
- The standard error is 0.18. (The standard error is the standard measure used for quantifying sampling error.)
- The confidence interval for approval is -6.97% to 62.53% (i.e., its width is approximately four times the standard error).
Provided that the correct settings are chosen, you get these results whether using SPSS Complex Samples, R with the survey package, Displayr, or Q.
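As a cross-check, the correct standard error can be reproduced with the Taylor-series linearization estimator that these packages use. The 0/1 approval data below is an assumption consistent with the numbers above (the man and four women answering No, five women answering Yes), with the man carrying half the total weight:

```python
import math

# Hypothetical data consistent with the example:
# 1 man (answers No) and 9 women (5 Yes, 4 No); weights sum to 100.
y = [0] + [1] * 5 + [0] * 4       # 0 = No, 1 = Yes
w = [50.0] + [100.0 / 18] * 9     # the one man gets half the total weight

n = len(y)
total_w = sum(w)
p = sum(wi * yi for wi, yi in zip(w, y)) / total_w   # weighted proportion

# Taylor-series linearization variance for a weighted mean under
# with-replacement simple random sampling:
z = [wi * (yi - p) / total_w for wi, yi in zip(w, y)]
var = n / (n - 1) * sum(zi ** 2 for zi in z)
se = math.sqrt(var)

print(round(p, 4))   # 0.2778
print(round(se, 2))  # 0.18
```

This reproduces both the weighted proportion of 27.78% and the standard error of 0.18 quoted above.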
If you use the wrong method you get the wrong result
SPSS Statistics does not have a menu-based approach for calculating the standard error of a proportion, but the formula for the standard error of the mean is almost identical, so it is used instead (the difference between the two is discussed later in this article). The weighted analysis performed with SPSS's default settings concludes that:
- Approval is 27.78%. This is the same as the correct result.
- The standard error of the mean is calculated at 0.045 (4.5%). This is labeled as Std. Error Mean in the first table of the output below. This is approximately 1/4 of the correct result. That is, with this example, using the default settings in SPSS gives a massively wrong result, understating the true uncertainty by a factor of four.
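The mechanics of the error can be sketched as follows. Treating the sampling weights as frequency weights makes the software behave as if there were sum-of-weights = 100 observations, so the standard error is divided by √100 rather than reflecting the much smaller effective sample size. The 0/1 data is the same assumption as before, consistent with the example:

```python
import math

# Hypothetical data consistent with the example: 1 man (No), 9 women (5 Yes, 4 No)
y = [0] + [1] * 5 + [0] * 4
w = [50.0] + [100.0 / 18] * 9   # sampling weights, summing to 100

total_w = sum(w)                # 100: treated as the "sample size" by default
p = sum(wi * yi for wi, yi in zip(w, y)) / total_w

# Frequency-weighted variance (divisor is sum-of-weights minus 1),
# which is how a default WEIGHT-style analysis computes it:
var = sum(wi * (yi - p) ** 2 for wi, yi in zip(w, y)) / (total_w - 1)
sd = math.sqrt(var)

se_wrong = sd / math.sqrt(total_w)
print(round(se_wrong, 3))  # 0.045 -- about a quarter of the correct 0.18
```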
Hacks have been created, but none work well
Hacks have been developed to trick traditional algorithms into giving better results than their default settings produce. None of the hacks works well; all give the wrong answers. However, all are preferable to the default treatment of sampling weights. In declining order of quality, the most common hacks are:
- Resampling
- Weight calibration
- Modifying the weight to have an average weight of 1
- Using the observed sample size
- Using the weight as is
Resampling
Resampling creates a new synthetic data set by randomly selecting cases, with replacement, from the existing data set. Cases are selected with probability proportional to the weight. That is, a weighted bootstrap is used to create the data set.
This approach is always inferior to variance estimation because:
- Random selection adds noise.
- It ignores correlations between the weight and the variables in the analysis.
- It is easy to make mistakes when using the approach.
Weight calibration
Weight calibration involves modifying the weight so that it sums to the effective sample size. The resulting weight is sometimes known as the calibrated weight. This approach is used in Survey Reporter, Quantum, and many traditional crosstab packages.
In the example above, the effective sample size is 3.6. Intuitively, this makes sense, in that:
- As there is only 1 man in the sample, clearly the effective sample size has to be less than the actual sample size of 10.
- As there are 9 women in the sample, clearly the effective sample size must be more than 2 (i.e., we clearly gain some information by having so many women in the sample).
Thus, if we employ weight calibration, we end up with a weight of 1.8 for the man and 0.2 for each woman.
The table below shows SPSS's standard error calculation with weight calibration. As discussed above, the correct standard error for this problem is 0.18. However, weight calibration shows a much higher value of 0.28.
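Under the assumed 0/1 data consistent with the example, the effective sample size, the calibrated weights, and the resulting frequency-weighted standard error can be sketched as:

```python
import math

# Hypothetical data consistent with the example: 1 man (No), 9 women (5 Yes, 4 No)
y = [0] + [1] * 5 + [0] * 4
w = [50.0] + [100.0 / 18] * 9

# Kish effective sample size: (sum of weights)^2 / (sum of squared weights)
ess = sum(w) ** 2 / sum(wi ** 2 for wi in w)
print(round(ess, 1))  # 3.6

# Calibrated weights: rescale so the weights sum to the effective sample size
cal = [wi * ess / sum(w) for wi in w]
print(round(cal[0], 1), round(cal[1], 1))  # 1.8 0.2

# Frequency-weighted standard error, as computed with the calibrated weights
p = sum(c * yi for c, yi in zip(cal, y)) / ess
var = sum(c * (yi - p) ** 2 for c, yi in zip(cal, y)) / (ess - 1)
se_cal = math.sqrt(var) / math.sqrt(ess)
print(round(se_cal, 2))  # 0.28 -- above the correct 0.18
```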
Modifying the weight to have an average weight of 1
With this approach, the weight is modified so that it has an average of 1. Equivalently, the weights are forced to sum to the actual sample size, which in this example is 10.
The SPSS output is shown below. It is also wrong, showing a smaller standard error than is correct. In this specific example, it is closer to the correct value, but this is just a fluke (if forced to choose, calibrating to the effective sample size is likely superior).
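The same calculation with the weights rescaled to average 1 (summing to n = 10), again under the assumed 0/1 data, looks like:

```python
import math

# Hypothetical data consistent with the example: 1 man (No), 9 women (5 Yes, 4 No)
y = [0] + [1] * 5 + [0] * 4
w = [50.0] + [100.0 / 18] * 9
n = len(y)

# Rescale the weights to sum to the actual sample size (average weight of 1)
w1 = [wi * n / sum(w) for wi in w]

p = sum(wi * yi for wi, yi in zip(w1, y)) / n
var = sum(wi * (yi - p) ** 2 for wi, yi in zip(w1, y)) / (n - 1)
se_avg1 = math.sqrt(var) / math.sqrt(n)
print(round(se_avg1, 4))  # 0.1493 -- below the correct 0.18
```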
Using the actual sample size in formulas (e.g., using Excel)
This approach proceeds by using the standard formulas taught in introductory statistics texts, where the weighted statistics are used in the formulas (e.g., mean, proportion, standard deviation), except for the sample size, which is used unweighted. This approach is traditionally implemented using Excel spreadsheets.
From the above SPSS output, we know the weighted standard deviation and the sample size, so we can use the standard formula:
s / √n = 0.47213 / √10 = 0.14930
Note that this is the same (wrong) result as in the previous section.
Similarly, if we compute the standard error for the proportion, we have the same problem:
√[p (1 − p) / n] = √[0.2778 × (1 − 0.2778) / 10] = 0.14164
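Both textbook formulas above can be checked directly, taking the weighted proportion p, the weighted standard deviation s, and the unweighted sample size n from the example:

```python
import math

n = 10          # unweighted sample size
p = 0.2778      # weighted proportion approving
s = 0.47213     # weighted standard deviation, from the SPSS output

se_mean = s / math.sqrt(n)                 # standard error of the mean
se_prop = math.sqrt(p * (1 - p) / n)       # standard error of the proportion

print(round(se_mean, 5))  # 0.1493
print(round(se_prop, 5))  # 0.14164
```

Both values understate the correct standard error of 0.18, because the unweighted sample size n = 10 overstates the information in this heavily weighted sample.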
Use the weight as is
The final approach is to use the weight “as is” with the default settings. This is clearly much worse than any of the other hacks.