Cluster Sampling

Oliver Harrison

December 02, 2023 22:22
Updated

A population is divided into mutually exclusive and exhaustive groups, which are called clusters. Probability sampling is then used to select a sample of these clusters, and probability sampling is then conducted within each of the selected clusters. Typically, the purpose of cluster sampling is to reduce the cost of data collection, which is achieved by defining clusters according to ease of access (e.g., a suburb may be a cluster for door-to-door sampling, or a household may be a cluster for phone interviewing).
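The two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: simple random sampling is used at both stages (other probability designs are equally valid), and the function name `two_stage_cluster_sample`, the suburb/household population, and the sample sizes are all hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Draw a two-stage cluster sample.

    clusters: dict mapping a cluster id to the list of units in that cluster.
    Returns a dict mapping each selected cluster id to its sampled units.
    """
    rng = random.Random(seed)

    # Stage 1: simple random sample of clusters (assumed design).
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Stage 2: simple random sample of units within each selected cluster.
    sample = {}
    for cluster_id in chosen:
        units = clusters[cluster_id]
        k = min(n_per_cluster, len(units))  # small clusters are taken whole
        sample[cluster_id] = rng.sample(units, k)
    return sample


# Hypothetical population: 20 suburbs, each containing 50 households.
population = {
    f"suburb_{i}": [f"s{i}_h{j}" for j in range(50)] for i in range(20)
}

sample = two_stage_cluster_sample(
    population, n_clusters=4, n_per_cluster=10, seed=1
)
```

Interviewers then visit only 4 suburbs rather than up to 20, which is where the cost saving comes from.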

A consequence of cluster sampling is that it generally increases the standard errors of the quantities estimated (i.e., it reduces the effective sample size); the greater the variation of the quantity between clusters, the greater the standard error relative to that obtained by simple random sampling. The vast majority of Tests of Statistical Significance assume that data is not collected by cluster sampling and give misleading conclusions when this assumption is not satisfied. Most modern analyses of cluster samples involve the use of mixed models (also known as multilevel models).
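The loss of effective sample size can be roughly quantified with the commonly used design effect approximation, deff = 1 + (m - 1) × ICC. The sketch below assumes equal cluster sizes; the intraclass correlation (ICC, the share of variance that lies between clusters) and all numeric values are illustrative assumptions, not figures from this article.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample giving roughly the same precision."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 400 interviews, 10 per cluster, ICC of 0.05.
deff = design_effect(cluster_size=10, icc=0.05)
n_eff = effective_sample_size(n=400, cluster_size=10, icc=0.05)

# Standard errors are inflated by the square root of the design effect.
se_inflation = math.sqrt(deff)

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
print(f"standard error inflation: {se_inflation:.2f}")
```

With these assumed inputs the 400 clustered interviews carry about as much information as 276 simple random interviews, and an ICC of zero (no variation between clusters) gives a design effect of 1, i.e., no loss.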