This chapter describes how to work out the sample size for a choice-based conjoint study. It presents a series of rules of thumb. None of these is known to be always valid, so the recommended approach is to use them all and then apply judgment. The approaches are:

The minimum sample size for credibility rule

Minimum subgroup size rules

The Sawtooth heuristic

Standard errors of 0.05 or less

Using simulations and prediction accuracy

Using simulations, scenario planning, and conducting all the intended analyses of interest

## The minimum sample size for credibility rule

In commercial market research, there are two magic numbers in widespread use for determining sample size. A sample needs to be at least 300 to be credible, and 1,000 if it is an “important” study. The choice of 300 is not quite as arbitrary as it seems. It originates in polling research, where a sample size of 300 means that the margin of error is 5.6% (that is, 95% of the time the true result will be within 5.6% of the observed result), which for many real-world problems sounds like a sufficiently small number. In a choice-based conjoint study, the goal is usually to predict preference shares, which are analogous to polling percentages, making this rule of thumb applicable to choice-based conjoint as well.
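The 5.6% figure can be verified with a back-of-envelope calculation. A minimal sketch (the function name is mine; the exact decimal depends on the z-value used, so 1.96 gives a figure a touch above the quoted 5.6%):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

# n = 300 gives a margin of error of about 5.7%, the ~5.6% quoted for polls.
print(round(100 * margin_of_error(300), 1))
```

Note that quadrupling the sample size only halves the margin of error, which is why samples much beyond 1,000 deliver rapidly diminishing returns.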

If you haven’t studied a lot of statistics, you may be perturbed by the lack of rigor in the expression “sounds like a sufficiently small number”. Surely there must be a better way? There is. The correct solution is to work out the range of possible outcomes and the distribution of uncertainty of each, compute the economic cost of each possible outcome, multiply each cost by its probability, and sum the results. The theory is simple. It even has a name: expected utility. In practice, working out the costs is completely impractical in most circumstances, so all that is left is to go with “sounds like a sufficiently small number”.
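The expected-utility calculation described above can be sketched for a toy case. All of the outcome names, probabilities, and costs below are invented purely for illustration; the point is only the mechanics of weighting each cost by its probability:

```python
# Toy expected-utility calculation: each possible outcome has a probability
# (its uncertainty) and an economic cost; the expected cost weights each
# cost by its probability and sums the results.
outcomes = {
    "underestimate demand": {"probability": 0.2, "cost": 500_000},
    "roughly correct":      {"probability": 0.7, "cost": 0},
    "overestimate demand":  {"probability": 0.1, "cost": 300_000},
}

expected_cost = sum(o["probability"] * o["cost"] for o in outcomes.values())
print(expected_cost)
```

The arithmetic is trivial; the impractical part, as the text notes, is obtaining credible probabilities and costs in the first place.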

## Minimum subgroup size rules

The next rule is to work out which subgroups need to be compared and ensure that you have a sufficient sample size in each of them. Some people are happy with subgroups as small as 2, others choose a minimum sample size of 25 or 30, some 100, and others suggest 200 (e.g., Brian K. Orme and Keith Chrzan (2017), *Becoming an Expert in Conjoint Analysis: Choice Modeling for Pros*, Sawtooth Software, Inc.). For example, if there are three key subgroups of interest, and the sample size required for each is 200, then a sample of 600 is required.
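The arithmetic of this rule can be sketched in a few lines (the subgroup names are hypothetical, and the sketch assumes the subgroups are mutually exclusive, as in the example above):

```python
def required_total(subgroup_minimums):
    """Total sample needed when every mutually exclusive subgroup
    must reach its own minimum sample size."""
    return sum(subgroup_minimums.values())

# Three hypothetical key subgroups, each needing the suggested 200.
subgroups = {"subgroup 1": 200, "subgroup 2": 200, "subgroup 3": 200}
print(required_total(subgroups))  # 600
```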

## The Sawtooth heuristic

Sawtooth Software (see Brian K. Orme and Keith Chrzan (2017), *Becoming an Expert in Conjoint Analysis: Choice Modeling for Pros*, Sawtooth Software, Inc.) suggests that the minimum sample size, *n*, should satisfy:

*n* ≥ 1,000*c* / (*qa*)

where:

*q* is the number of questions shown to each respondent (excluding the “none of these” alternative),

*a* is the number of alternatives per question, and

*c* is the maximum number of levels of any attribute.

For example, if the attribute with the most levels has 10 levels, there are 6 questions, and 3 alternatives, then the minimum sample size is 1,000 × 10 / (6 × 3) ≈ 556.
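A sketch of the heuristic as code, using the definitions of *q*, *a*, and *c* above and the 1,000 constant discussed later in this section (the function name is mine):

```python
import math

def sawtooth_minimum_n(q, a, c, constant=1000):
    """Sawtooth heuristic: choose n so that n * q * a / c >= constant."""
    return math.ceil(constant * c / (q * a))

# 10 levels on the biggest attribute, 6 questions, 3 alternatives.
print(sawtooth_minimum_n(q=6, a=3, c=10))  # 556
```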

Q and Displayr automatically compute the Sawtooth heuristic when creating experimental designs and display a warning when it is violated.

The Sawtooth heuristic has the advantage that it is very prescriptive. It also has a certain logic to it, derived from how sample size determines statistical significance in linear regression (i.e., if fitting a linear regression to conjoint data, *nqa*/*c* is roughly the number of observations per level of the largest attribute). Consequently, it is a very useful heuristic for understanding how decisions about the number of questions, alternatives, and attribute levels affect the required sample size.

However, its prescriptiveness and usefulness should not be confused with it being a safe way to work out the sample size. Its basis in linear regression is problematic: linear regression isn't used in modern conjoint models; the number 1,000 is simply a made-up figure; and degrees of freedom, rather than sample size, is the relevant determinant of statistical significance in linear regression, a difference that will be non-trivial in studies with many attributes and levels.

## Standard errors of 0.05 or less

Another heuristic is that the standard errors for the attribute levels should be at most 0.05. Why 0.05? It is a smallish number, and statisticians like smallish numbers around 0.05, presumably because it reminds them of the 0.05 level of significance.

Experimental design software (e.g., in Q and Displayr) will often automatically show the standard errors of the attribute levels in a table at the top of the experimental design output.

Increasing the assumed sample size reduces the standard errors. Similarly, all of the things that cause the required sample size to shrink (see the previous section) also cause the standard errors to shrink.
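The relationship between sample size and standard errors follows an inverse square root. A sketch (the 0.08 starting value and n = 300 are invented; real conjoint standard errors depend on the design, but the scaling with n is the same):

```python
import math

def scaled_se(se_at_n0, n0, n):
    """Standard errors shrink with the inverse square root of sample size."""
    return se_at_n0 * math.sqrt(n0 / n)

# If a design yields a standard error of 0.08 at n = 300, the sample needed
# to reach the 0.05 threshold solves n = n0 * (se0 / target)^2:
n_needed = 300 * (0.08 / 0.05) ** 2
print(round(n_needed))  # 768
```

This is why halving a standard error requires quadrupling, not doubling, the sample.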

This approach is involved, but it is not particularly scientific. The standard errors of attribute levels are not of direct interest in most conjoint studies. Instead, conjoint studies typically focus on comparing relativities (e.g., is one attribute more important than another?), linearities (e.g., is there a price point beyond which people become highly price sensitive?), or differences between share predictions, and the standard error of a coefficient does not approximate any of these quantities.

## Using simulations and prediction accuracy

All the heuristics described so far were invented many years ago, before more modern analysis methods like *Latent Class Logit* and *Hierarchical Bayes* became available. The last heuristic is designed to check that the sample size is large enough to use these more modern techniques. A basic process for determining whether the sample size is appropriate for these methods is to:

- Create an experimental design.
- Use the Sawtooth heuristic to choose an initial sample size.
- Generate synthetic data.
- Estimate a Hierarchical Bayes model.
- Modify the design and sample size until the prediction accuracy drops below 90%, as prediction accuracies above this value suggest that the model is grossly over-fitting the data (i.e., modeling noise rather than true differences between people). Why 90%? It is 1 − 0.05 × 2: another made-up number, based on experience and gut feel rather than rocket science.
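The synthetic-data step above can be sketched in a heavily simplified form. All names and numbers below are invented, and a known utility vector with Gumbel noise stands in for estimating a real Hierarchical Bayes model:

```python
import math
import random

random.seed(1)

def gumbel():
    """Standard Gumbel draw, the error term of a logit choice model."""
    return -math.log(-math.log(random.random()))

# Synthetic-data stand-in: true mean utilities for the three alternatives
# shown in each question (invented for illustration).
true_utilities = [1.0, 0.5, 0.0]

def simulate_choices(n_respondents, n_questions):
    choices = []
    for _ in range(n_respondents * n_questions):
        noisy = [u + gumbel() for u in true_utilities]
        choices.append(noisy.index(max(noisy)))
    return choices

# Even a model that knew the true utilities perfectly would only "hit"
# the chosen alternative about half the time here, because the choices
# contain genuine noise; an estimated model scoring far above this level
# is fitting that noise.
choices = simulate_choices(n_respondents=300, n_questions=6)
hit_rate = choices.count(0) / len(choices)
print(round(hit_rate, 2))
```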

For a worked example, see the blog post *How to Use Simulated Data to Check Choice Model Experimental Designs Using Displayr*.

## Using simulations, scenario planning, and conducting all the intended analyses of interest

There is arguably a theoretically pure approach to working out the sample size of a choice-based conjoint study. It is not practical, but it is presented here because it helps to explain why none of the above approaches is particularly strong:

- Work out all the analyses that you need to conduct, including working out the required confidence intervals for any results of interest.
- Work out all the possible results that could occur. For example, if focused on understanding pricing, this would involve identifying the realistic combinations of price elasticity and the shape of the price-response function (e.g., is it kinked, and if so, where?). This approach is known as *scenario planning*.
- Simulate data for all the possible scenarios and identify the sample size required to ensure that all the results of interest have confidence intervals of the desired size.
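These steps can be sketched for a single result of interest, here a preference share with a required confidence-interval width (the scenario names, shares, and the 6-point target are all invented for illustration):

```python
import math

def ci_width(share, n, z=1.96):
    """Full width of the 95% confidence interval for a preference share."""
    return 2 * z * math.sqrt(share * (1 - share) / n)

# Hypothetical scenarios: plausible preference shares for the key product
# under different price-response assumptions.
scenarios = {"kinked, highly elastic": 0.20, "smooth, mildly elastic": 0.45}
target_width = 0.06  # shares must be pinned down to within +/- 3 points

def min_n(share, target, step=50):
    """Smallest sample size (in steps of 50) meeting the target CI width."""
    n = step
    while ci_width(share, n) > target:
        n += step
    return n

# The required sample is set by the worst-case scenario.
required = max(min_n(s, target_width) for s in scenarios.values())
print(required)
```

Note that the binding scenario is the one with the share closest to 50%, since that is where a share's confidence interval is widest.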
