Effective Sample Size – The Data Story Guide

The effective sample size quantifies how much precision a survey loses when it is weighted. It is often approximated by the Kish formula, but this formula is overly simplistic.

Definition

The effective sample size is an estimate of the size of a simple random sample that would achieve the same level of precision as the weighted sample.

For example, if a survey of 1,000 people has an effective sample size for a statistic of 500, it means that the amount of sampling error is equivalent to that which would have been obtained by a study of 500 people that did not need to be weighted.

The Kish formula

The following formula, usually attributed to the statistician Leslie Kish, estimates the effective sample size as:

\[\begin{align} n_{\text{eff}} = \frac{\left(\sum_{i=1}^{n} w_i\right)^2}{\sum_{i=1}^{n} w_i^2}\end{align}\]

where \(w_i\) is the weight of the \(i\)th respondent and \(n\) is the number of respondents.

For example, if we have a sample of 10 people, where one of them has a weight of 5 and the other nine each have a weight of 5/9, the weights sum to 10 and their squares sum to 25 + 9 × 25/81 ≈ 27.8, so the effective sample size is 100 / 27.8 = 3.6.
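The arithmetic in this example can be checked with a minimal Python sketch. The function name `kish_effective_sample_size` is an illustrative choice, not something defined in this guide.

```python
def kish_effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total ** 2 / total_of_squares


# One respondent with a weight of 5, nine respondents with a weight of 5/9.
weights = [5.0] + [5.0 / 9.0] * 9
print(round(kish_effective_sample_size(weights), 1))  # 3.6
```

When every weight is equal, the same function returns the actual sample size, which is why an unweighted sample loses nothing under this approximation.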

The Kish formula is popular because it is simple. However, it is incorrect in two ways:

The formula implies that a sample has a single effective sample size. In reality, every result in a survey has its own effective sample size, and these often differ.
The formula takes only the weights into account. Calculating the correct effective sample size also requires an understanding of the correlation between the weight and the data that is being weighted.

Consider the situation where our weights above correspond to gender, and the weight of 5 is for one male, and the other nine cases are women:

If we filter the data to include only males, what does the effective sample size become? The answer is 1. This answer both comes from common sense and can be obtained by using the formula above.
If we did an analysis of only women, the effective sample size for that analysis would be 9 (i.e., higher than the 3.6 for all 10 people). This is because the purpose of the weight is to correct for over- or under-representation of a group, but this analysis contains only women, so there is nothing to correct.
If we wanted to do an analysis comparing men and women, the effective sample size would be 10, as the weight has no effect.
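The first two cases above can be reproduced by applying the Kish formula to each filtered group. This is a small sketch: the `respondents` records and the helper names are hypothetical, made up to match the example of one male and nine females.

```python
def kish_effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


# Hypothetical records: (gender, weight) for the 10 respondents in the example.
respondents = [("male", 5.0)] + [("female", 5.0 / 9.0)] * 9


def subgroup_effective_sample_size(records, gender):
    """Apply the Kish formula to only the respondents of one gender."""
    return kish_effective_sample_size([w for g, w in records if g == gender])


ess_all = kish_effective_sample_size([w for _, w in respondents])
ess_male = subgroup_effective_sample_size(respondents, "male")
ess_female = subgroup_effective_sample_size(respondents, "female")

print(round(ess_all, 1), round(ess_male, 1), round(ess_female, 1))  # 3.6 1.0 9.0
```

Because the weights are constant within each gender, each filtered group behaves like an unweighted sample, so its effective sample size equals the number of respondents in it.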

A better approach to calculating the effective sample size

A better approach to computing the effective sample size is to:

Identify all the key analyses of interest.
Compute the effective sample size for each, using Taylor Series Linearization or some other variance estimation method.
If necessary, present the median or some other summary of the effective sample sizes.
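The steps above can be sketched in Python for the simplest case. This is an illustration under stated assumptions rather than a full survey variance estimator: it assumes single-stage sampling with replacement, no strata or clusters, and no finite-sample correction, and it only handles weighted means and proportions. The function name and the `said_yes` and `is_male` variables are hypothetical.

```python
from statistics import median


def linearized_effective_sample_size(values, weights):
    """Effective sample size for the weighted mean of `values`.

    Divides a weighted estimate of the population variance by the
    Taylor-linearized variance of the weighted mean.
    Assumes no strata, no clusters, and sampling with replacement.
    """
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight
    # Weighted estimate of the variance of the values in the population.
    population_variance = (
        sum(w * (y - mean) ** 2 for y, w in zip(values, weights)) / total_weight
    )
    # First-order (linearized) contribution of each respondent to the mean.
    scores = [w * (y - mean) / total_weight for y, w in zip(values, weights)]
    variance_of_mean = sum(z * z for z in scores)
    if variance_of_mean == 0:
        raise ValueError("values are constant; effective sample size is undefined")
    return population_variance / variance_of_mean


# Same weights as the earlier example: one male (5) and nine females (5/9).
weights = [5.0] + [5.0 / 9.0] * 9
is_male = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
said_yes = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # made-up answers, for illustration

# Steps 1 and 2: one effective sample size per analysis of interest.
ess_values = [
    linearized_effective_sample_size(is_male, weights),   # about 3.6
    linearized_effective_sample_size(said_yes, weights),  # about 8.4
]
# Step 3: a single summary, here the median.
print(round(median(ess_values), 1))
```

The two results differ even though the weights are identical, because the calculation depends on how the weights relate to the data being analyzed. With equal weights the function returns the actual sample size.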

The table below, on the left, shows the effective sample size computed for the key question of interest in the article on checking a weight. The value of 1,837 shown for the NET corresponds to the initial Kish calculation. The more relevant numbers are those for Yes, No, and Don’t Know, which are all roughly the same. The effective sample size shown in the footer is a summary of the values in the table.

As the Don’t Know response tends to be removed when computing polling results, the more relevant effective sample size in this example is the one shown in the table to the right. The 1,672 shown in each cell and in the footer is more accurate than the Kish number because it takes into account the correlation between the weight and the data in the table, which the Kish number ignores.