Confidence Interval – The Data Story Guide

A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true parameter value. A 95% confidence interval, for example, implies that were the estimation process repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to the properties of the interval and not to the parameter itself, which is not considered a random variable (B. S. Everitt (2002): The Cambridge Dictionary of Statistics, Second Edition, Cambridge).
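The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population (normal, with a true mean of 5.0 and standard deviation of 2.0), the sample size, the number of repetitions, and the function name coverage_rate are all assumptions chosen for the demonstration. It draws many samples, builds a normal-approximation 95% interval from each, and reports the share of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage_rate(true_mean=5.0, true_sd=2.0, n=50, repeats=2000,
                  level=0.95, seed=0):
    """Return the fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        estimate = mean(sample)
        standard_error = stdev(sample) / n ** 0.5
        lower = estimate - z * standard_error
        upper = estimate + z * standard_error
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The reported share should land close to 0.95. Because the sketch uses the normal critical value rather than a t value, it can fall slightly below the nominal level for small samples.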

Computation

The standard formula for computing a confidence interval takes as inputs:

The statistic that has been estimated in the sample.
The standard error of that statistic.
The desired confidence level. Typically a 95% confidence level is used.
An assumption about the distribution of the statistic (e.g., that it is normal).
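The four inputs above map directly onto a short function. This is a minimal sketch under the normality assumption, not a definitive implementation; the function name normal_confidence_interval and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, standard_error, level=0.95):
    """Return (lower, upper) for a two-sided interval, assuming normality."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    # Critical value of the standard normal for the requested level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical inputs: a sample proportion of 0.40 with standard error 0.02.
lower, upper = normal_confidence_interval(estimate=0.40, standard_error=0.02)
print(f"[{lower:.3f}, {upper:.3f}]")  # prints [0.361, 0.439]
```

Changing the distributional assumption (for example, to a t distribution for small samples) would only change how the critical value z is obtained; the rest of the calculation stays the same.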

The assumption of normality is almost always made, and it is often a sensible assumption in survey research due to the various generalizations of the Central Limit Theorem. The problematic aspect of calculating confidence intervals is the computation of the standard error, where many researchers routinely apply incorrect formulas. In particular, it is commonplace to assume a simple random sample even in situations where this is unambiguously not the case, such as when weighting has been applied.
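To illustrate why weighting matters, the sketch below uses Kish's effective sample size, a common rough approximation that this guide does not itself prescribe. The function names and the data are hypothetical, and a full treatment of complex survey designs would call for dedicated variance estimation methods rather than this shortcut.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


def weighted_mean_standard_error(values, weights):
    """Approximate standard error of a weighted mean using Kish's n_eff."""
    total = sum(weights)
    weighted_mean = sum(w * x for x, w in zip(values, weights)) / total
    # Weighted variance of the observations around the weighted mean.
    weighted_var = sum(
        w * (x - weighted_mean) ** 2 for x, w in zip(values, weights)
    ) / total
    n_eff = kish_effective_sample_size(weights)
    return (weighted_var / n_eff) ** 0.5


# Hypothetical data: six responses with unequal weights.
values = [4, 6, 5, 7, 8, 6]
weights = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]

n_eff = kish_effective_sample_size(weights)
se = weighted_mean_standard_error(values, weights)
print(f"Actual n = {len(values)}, effective n = {n_eff:.2f}, SE = {se:.3f}")
```

With unequal weights the effective sample size is smaller than the actual sample size, so a formula that assumes a simple random sample would understate the standard error and produce an interval that is too narrow.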

Example

If the average in a sample is 5.9, the standard error is 0.189763619, and a 95% level of confidence is sought, then the 95% confidence interval is:

95\%\,\text{CI} = 5.9 \pm 1.96 \times 0.189763619 \approx 5.9 \pm 0.37 = [5.53, 6.27]

where 1.96 is the critical value from the standard normal distribution (e.g., in Excel, where 0.95 represents the desired confidence level, the formula is =NORMSINV(1-(1-0.95)/2)).
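The same critical value can be obtained outside Excel. As a rough equivalent of the NORMSINV formula, the sketch below uses Python's standard library; the function name critical_value is an assumption made for illustration.

```python
from statistics import NormalDist


def critical_value(level):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_value(level):.3f}")

# Margin of error for the example, using the standard error given there.
margin = critical_value(0.95) * 0.189763619
print(f"margin = {margin:.3f}")
```

Higher confidence levels give larger critical values and therefore wider intervals for the same standard error.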

Also known as:

Confidence Band
Confidence Bounds
Confidence Limits