Hierarchical Bayes for Choice-Based Conjoint – The Data Story Guide

Most modern analysis of choice-based conjoint data uses a model known as Hierarchical Bayes. This article describes:

The problem with the multinomial logit model, the main choice model that preceded Hierarchical Bayes
The basics of Hierarchical Bayes (HB), which improves on the multinomial logit model
Why Hierarchical Bayes is almost always better than multinomial logit
Why Hierarchical Bayes is not the correct name for the technique (it is a misnomer)

The problem with the multinomial logit model

Hierarchical Bayes is best understood in terms of how it solves the problems of the main model that preceded it, the multinomial logit model.

The multinomial logit model has a terrible assumption hidden in its math. It assumes that if, say, the price of one alternative is dropped, then this alternative will take share from all the other alternatives in proportion to their current market shares.

This doesn’t sound so bad until you think about a real-world example. When modeling cabin choice on an international flight, the multinomial logit model assumes that if the price of first class fares is dropped, more people will switch from economy to first than from business to first, simply because there are more people in economy than in business class.

This problem is variously known as the independence of irrelevant alternatives (IIA) problem and the red bus/blue bus problem.
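The proportional-substitution property can be checked numerically. The following is a minimal sketch in plain Python; the utility values and the size of the price effect are made-up numbers chosen only for illustration, not estimates from any study.

```python
import math

def logit_shares(utilities):
    """Multinomial logit shares: exp(u_j) / sum over k of exp(u_k)."""
    exp_u = {name: math.exp(u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Hypothetical utilities for three cabin classes (illustrative values only).
before = logit_shares({"economy": 2.0, "business": 0.5, "first": -1.0})

# A price cut on first class raises its utility; the others are unchanged.
after = logit_shares({"economy": 2.0, "business": 0.5, "first": 0.0})

# Fraction of its own starting share that each competing cabin loses.
relative_loss = {
    name: (before[name] - after[name]) / before[name]
    for name in ("economy", "business")
}

# Absolute share lost by each competing cabin.
absolute_loss = {
    name: before[name] - after[name] for name in ("economy", "business")
}

print("relative loss:", relative_loss)
print("absolute loss:", absolute_loss)
```

Under these assumptions both cabins lose the same fraction of their share, so economy, being larger, gives up more travellers to first class in absolute terms, which is exactly the behavior described above.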

Since the multinomial logit model was introduced, a vast amount of effort has gone into circumventing this problem. In the early days, researchers would use interactions (in particular, cross-effects) as a way of limiting the problem. As computers got faster, attention turned to coming up with better models that didn't make this assumption.

One “fix” for the IIA problem is to estimate a separate multinomial logit model for every respondent. However, in practice this rarely works well: typically there is either insufficient data to do this, or the resulting models are inaccurate.
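To see why respondent-level utilities help, consider a sketch in which each respondent has their own (hypothetical) utilities and market shares are obtained by averaging individual choice probabilities. The two respondent types and all numbers below are assumptions invented for the illustration.

```python
import math

def logit_shares(utilities):
    """Logit choice probabilities for one respondent."""
    exp_u = [math.exp(u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

def aggregate_shares(respondents):
    """Average the individual choice probabilities across respondents."""
    per_person = [logit_shares(u) for u in respondents]
    n_alternatives = len(respondents[0])
    return [
        sum(p[j] for p in per_person) / len(per_person)
        for j in range(n_alternatives)
    ]

# Utilities are ordered (economy, business, first); values are hypothetical.
budget_traveller = (3.0, 0.0, -3.0)    # strongly prefers economy
premium_traveller = (-1.0, 1.5, 1.0)   # chooses between business and first
respondents = [budget_traveller] * 8 + [premium_traveller] * 2

def discount_first_class(respondents, boost=1.0):
    """Raise the utility of first class for everyone by the same amount."""
    return [(u[0], u[1], u[2] + boost) for u in respondents]

before = aggregate_shares(respondents)
after = aggregate_shares(discount_first_class(respondents))

relative_loss_economy = (before[0] - after[0]) / before[0]
relative_loss_business = (before[1] - after[1]) / before[1]

print("economy loses", round(relative_loss_economy, 3), "of its share")
print("business loses", round(relative_loss_business, 3), "of its share")
```

In this sketch each individual still follows a logit model, but the aggregate no longer shows proportional substitution: business class loses a much larger fraction of its share than economy does, because the people attracted to first class are mostly those who were already considering business.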

The basics of Hierarchical Bayes (HB)

Using some rather clever math and computational tricks, which are well beyond the scope of this article, Hierarchical Bayes analyzes the data from all respondents simultaneously and obtains multiple estimates of utility for each person. Each of these estimates is called a draw.

Why multiple utilities (draws) per person? There is typically insufficient data to obtain a reliable estimate for each person, so, instead, HB computes multiple estimates for each person, where the spread of these estimates expresses the uncertainty about each person’s utilities.

Most practitioners compute an average for each person, and then use that average in the simulator and reporting stages, although it is theoretically superior to use all of the estimates and thereby take uncertainty into account.
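The difference between the two approaches can be sketched for a single respondent. The draws below are simulated from a normal distribution purely for illustration (real draws would come from the HB estimation), and the number of draws, means, and spread are all assumptions.

```python
import math
import random

def logit_shares(utilities):
    """Logit choice probabilities for one set of utilities."""
    exp_u = [math.exp(u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

random.seed(0)

# Stand-in for one respondent's draws: 200 draws of utility for
# 3 alternatives. These are simulated, not output from a real HB run.
assumed_means = [1.0, 0.0, -0.5]
draws = [[random.gauss(m, 1.5) for m in assumed_means] for _ in range(200)]
n_alternatives = len(assumed_means)

# (a) Common practice: average the draws, then simulate once.
mean_utilities = [
    sum(d[j] for d in draws) / len(draws) for j in range(n_alternatives)
]
shares_from_mean = logit_shares(mean_utilities)

# (b) Use every draw: simulate once per draw, then average the predictions.
per_draw_shares = [logit_shares(d) for d in draws]
shares_from_draws = [
    sum(s[j] for s in per_draw_shares) / len(per_draw_shares)
    for j in range(n_alternatives)
]

print("from averaged utilities:", shares_from_mean)
print("from all draws:         ", shares_from_draws)
```

The two sets of predicted probabilities differ because the logit transformation is nonlinear, so averaging before simulating is not the same as simulating and then averaging. Approach (b) carries the uncertainty in the draws through to the prediction.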

Why Hierarchical Bayes is almost always better than multinomial logit

Hierarchical Bayes has some huge advantages over the multinomial logit model. In particular:

It explicitly allows for differences among people and automatically estimates such differences from the data. It is difficult to overstate how powerful this is. For example, it makes it possible to:
- Clean data, by identifying people who have answered randomly or irrationally.
- Segment people.
It reduces to the multinomial logit model. That is, in the extremely unlikely situation that data can be correctly described by the multinomial logit model, then the Hierarchical Bayes model automatically simplifies itself and will provide basically the same results as the multinomial logit model.
As is true of Bayesian models in general, the Hierarchical Bayes model provides shrinkage estimates for each respondent. That is, the estimated utilities for each respondent are a weighted average of the average utilities for the total sample and an estimate made using only the respondent's data, where the weight depends on how much data there is for the respondent. Where there is a lot of data for a respondent, the estimate is based almost entirely on that respondent's data. Where there is only a small amount of data for a respondent, the estimate ends up close to the estimate for the total sample. This ensures that the estimates for each respondent are not too noisy.
It explicitly quantifies uncertainty, which means that analysts can:
- Produce better predictions,
- Use the results to perform statistical tests.
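The shrinkage idea in the list above can be illustrated with a deliberately simplified sketch. It uses the textbook normal-normal case for a single utility rather than the actual computation inside a Hierarchical Bayes choice model, and the function name, parameter names, and numbers are all hypothetical.

```python
def shrinkage_estimate(individual_mean, n_obs, population_mean,
                       noise_var, between_var):
    """Precision-weighted average of an individual and a population estimate.

    Simplified normal-normal case (an assumption of this sketch):
      noise_var   = variance of a single observation for a respondent
      between_var = variance of true values across respondents
    Returns (estimate, weight placed on the individual's own data).
    """
    weight = n_obs * between_var / (n_obs * between_var + noise_var)
    estimate = weight * individual_mean + (1 - weight) * population_mean
    return estimate, weight

population_mean = 0.5   # average utility in the total sample (hypothetical)
individual_mean = 2.0   # estimate from one respondent's data alone

# Few observations: the estimate is pulled strongly toward the sample average.
few_estimate, few_weight = shrinkage_estimate(
    individual_mean, n_obs=2, population_mean=population_mean,
    noise_var=4.0, between_var=1.0)

# Many observations: the estimate stays close to the respondent's own data.
many_estimate, many_weight = shrinkage_estimate(
    individual_mean, n_obs=200, population_mean=population_mean,
    noise_var=4.0, between_var=1.0)

print(few_estimate, few_weight)
print(many_estimate, many_weight)
```

In this simplified setting the weight on the respondent's own data grows toward 1 as the number of observations increases and falls to 0 when there are none, which mirrors the behavior described for the respondent-level utilities.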

However, Hierarchical Bayes has some practical disadvantages:

It is slow to calculate. A multinomial logit model will take milliseconds to compute. Hierarchical Bayes can take minutes, hours, and sometimes, with a big enough data set, days.
It is hard to understand. Unless you have strong mathematical skills, a PhD in a related discipline, and many years of experience, it is very difficult to follow the underlying math of the model.

Hierarchical Bayes is not the correct name

The model that is called "Hierarchical Bayes" by market researchers is not known by this name outside of market research. Technically, Hierarchical Bayes is an approach to fitting statistical models to data, not a model.

A more correct name for the model that's called "Hierarchical Bayes" in market research is a Mixed Logit Model with Normal Mixing Distribution, estimated using hierarchical Bayes (Markov chain Monte Carlo).
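For readers who want to see what that name refers to, the standard textbook form of such a model can be written as below. The notation is an assumption of this sketch and does not come from the article: respondent i, choice task t, alternative j, attribute vector x, respondent-level utilities β_i, and population mean μ and covariance Σ.

```latex
P(y_{it} = j \mid \beta_i)
  = \frac{\exp\left(x_{itj}^{\top} \beta_i\right)}
         {\sum_{k} \exp\left(x_{itk}^{\top} \beta_i\right)},
\qquad
\beta_i \sim \mathcal{N}(\mu, \Sigma)
```

The first expression is the logit part (each respondent's choices follow a logit model given their own utilities), and the second is the normal mixing distribution (those utilities vary across respondents according to a multivariate normal).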