Using RLH to Remove Random Choosers from Choice-Based Conjoint Studies – The Data Story Guide

The root likelihood (RLH) is a measure of how well a model predicts each respondent's conjoint data. Its main use is to identify respondents who have provided random-like data, so that the data can be cleaned. This is done as follows:

Fit a choice model
Calculate the RLH for each respondent
Examine the distribution of the RLH statistic
Simulate random data under the assumption that respondents are randomly choosing
Remove respondents with low RLH statistics

Fit a choice model

Fit a model to the data. In general, the model should be Hierarchical Bayes for Choice-Based Conjoint.

It may be preferable to fit a choice model without an attribute for alternative (i.e., without estimating alternative-specific constants). The reason is that if a person has chosen option 3 every time, a model with an alternative attribute may predict their choices very well, giving them a high RLH even though they ignored the other attributes. (The method described in this article seems to work well even when this hasn't been done.)
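To see why, consider a hypothetical respondent who picks the same position in every question. If the only non-zero utility is that position's alternative-specific constant (called `asc` below, an illustrative name), then with four options the probability of each observed choice is e^asc / (e^asc + 3), so the RLH climbs from 0.25 toward 1 as the constant grows. A minimal sketch under those assumptions:

```python
import math

def rlh_same_position(asc, n_alternatives=4, n_questions=10):
    """RLH for a respondent who always picks the same position.

    Hypothetical setup: only that position has a non-zero utility (its
    alternative-specific constant, asc); every other utility is 0.
    """
    p_chosen = math.exp(asc) / (math.exp(asc) + n_alternatives - 1)
    likelihood = p_chosen ** n_questions     # same probability on every question
    return likelihood ** (1 / n_questions)   # so the RLH equals p_chosen

for asc in (0.0, 1.0, 3.0):
    print(asc, round(rlh_same_position(asc), 2))  # 0.25, then 0.48, then 0.87
```

The attributes never enter the calculation, which is the point: the constant alone is enough to make a position-based chooser look well predicted.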

Where the design includes a 'none of these' alternative, the trick is to merge together all levels of the alternative attribute other than 'none of these'. In a labeled choice experiment, skip this step and keep the attribute for alternatives in the model.
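In data terms, the merge is a simple recode of the alternative attribute. A minimal sketch, assuming the attribute is stored as one text label per alternative and that the 'none' level is labeled exactly "None of these" (both are assumptions; the function name is hypothetical):

```python
NONE_LABEL = "None of these"  # assumed label of the 'none' level

def merge_alternative_levels(labels, none_label=NONE_LABEL):
    """Collapse every alternative level except 'none' into one level.

    The model can then still estimate a constant for 'none of these',
    but no longer a separate constant for each option position.
    """
    return [label if label == none_label else "Option" for label in labels]

print(merge_alternative_levels(["Option 1", "Option 2", "Option 3", "None of these"]))
# ['Option', 'Option', 'Option', 'None of these']
```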

Calculate the RLH for each respondent

Displayr, Q, and Sawtooth all have functions for doing this automatically. It can be done manually as follows:

For each question and each draw, compute the probability that the person chooses the option they actually chose. That is, compute the choice probabilities implied by the person's draws, using the multinomial logit model.
Multiply the probabilities together. For example, in a study involving four questions, if a person chooses an option that the model predicted they had a 0.4 probability of choosing, and their choices in the remaining three questions had probabilities of 0.2, 0.4, and 0.3 respectively, then the overall probability is 0.0096. This is technically known as the person's likelihood.
Compute likelihood^(1/k), where k is the number of questions. In this example, the result is 0.31. This value is known as the root likelihood (RLH). It is better than just looking at the percentage of choices that the model predicts correctly, as it rewards situations where the model was close and penalizes situations where the model was massively wrong. Note that the RLH of 0.31 is close to the average of the four probabilities (technically, it is their geometric mean).
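The manual calculation can be sketched in a few lines of Python. This is a minimal sketch, assuming the utilities have already been summed for each alternative in each question; the function names are hypothetical and are not the Displayr, Q, or Sawtooth functions. It computes the RLH for one set of utilities, such as a single draw; how results are combined across draws is not covered here.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit probabilities for the alternatives in one question."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def root_likelihood(question_utilities, choices):
    """RLH for one respondent given one set of utilities (e.g., one draw).

    question_utilities: for each question, the total utility of each alternative.
    choices: for each question, the index of the alternative actually chosen.
    """
    log_likelihood = sum(math.log(choice_probabilities(u)[c])
                         for u, c in zip(question_utilities, choices))
    # likelihood ** (1/k), taken on the log scale so long surveys don't underflow
    return math.exp(log_likelihood / len(choices))

# The worked example: chosen-option probabilities of 0.4, 0.2, 0.4, and 0.3
probs = [0.4, 0.2, 0.4, 0.3]
likelihood = math.prod(probs)
print(round(likelihood, 4), round(likelihood ** (1 / len(probs)), 2))  # 0.0096 0.31

# Same result via utilities: log(p) vs log(1 - p) gives P(chosen) = p
utilities = [[math.log(p), math.log(1 - p)] for p in probs]
print(round(root_likelihood(utilities, [0, 0, 0, 0]), 2))  # 0.31
```

Working on the log scale is a design choice: with many questions, multiplying raw probabilities can underflow to zero, whereas averaging log probabilities and exponentiating gives the same geometric mean safely.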

Examine the distribution of the RLH statistic

Plot the RLH statistics for each person and determine a cutoff point, re-estimating the model using only people with RLH statistics above the cutoff value. In the chocolate study, we gave people a choice of four options, so a purely random chooser has a 1/4 chance of picking any given option, which tells us that the cutoff point needs to be at least 1/4 = 0.25. However, hierarchical Bayes models tend to overfit conjoint analysis data, so a higher cutoff is prudent. The histogram below suggests that there is a clump of people at around 0.33, which perhaps represents random choosers.
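Before plotting properly, a quick text histogram can show where respondents clump relative to the chance level of 1/4. A minimal sketch: the RLH values below are made up for illustration, and the 0.05 bin width and function name are assumptions.

```python
import math

def rlh_histogram(rlh_values, bin_width=0.05):
    """Count respondents in each RLH bin, keyed by the bin's lower edge."""
    counts = {}
    for value in rlh_values:
        # The tiny epsilon stops float error (e.g., 0.3 / 0.05 = 5.999...) shifting a bin
        lower = round(math.floor(value / bin_width + 1e-9) * bin_width, 2)
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

n_alternatives = 4
chance_rlh = 1 / n_alternatives  # each of 4 options has probability 1/4 under random choice

# Hypothetical RLH values, for illustration only
rlh_values = [0.31, 0.33, 0.34, 0.32, 0.52, 0.61, 0.58, 0.70, 0.45, 0.66]
histogram = rlh_histogram(rlh_values)
print(f"chance level: {chance_rlh:.2f}")
for lower, count in histogram.items():
    print(f"{lower:.2f}-{lower + 0.05:.2f} {'#' * count}")
```

In this made-up data, the 0.30-0.35 bin holds four respondents just above chance, the kind of clump the text describes.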

Simulate random data under the assumption that respondents are randomly choosing

It is possible to simulate data under the assumption that people are choosing randomly (this is equivalent to simulating data where each person is assumed to have a utility of 0 for all attribute levels). The histogram below shows the distribution of the RLH for simulated data for this experiment. It shows that random choosers can produce RLH values from 0.23 through to 0.42.
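The simulation can be sketched as: generate choices from random choosers, fit a model to them, and compute each simulated respondent's RLH. This minimal sketch does not fit hierarchical Bayes. As a swapped-in stand-in, it fits a ridge-penalized logit to each simulated respondent separately by gradient ascent, with the penalty loosely playing the role of HB's shrinkage. The design (random 0/1 codes for six attributes, 12 questions, four alternatives), all parameter values, and the function names are hypothetical, so its RLH range will not match the article's histogram.

```python
import math
import random

def mnl_probs(alternatives, beta):
    """Multinomial logit probabilities for one question."""
    utilities = [sum(b * x for b, x in zip(beta, row)) for row in alternatives]
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_random_rlh(n_respondents=100, n_questions=12, n_alternatives=4,
                        n_features=6, penalty=1.0, step=0.01, n_steps=150, seed=1):
    """RLH of a penalized individual-level logit (a stand-in for HB) fitted to random choosers."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_respondents):
        # Hypothetical design: random 0/1 attribute codes for each alternative
        design = [[[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(n_alternatives)]
                  for _ in range(n_questions)]
        # Random chooser: each alternative equally likely (all utilities 0)
        choices = [rng.randrange(n_alternatives) for _ in range(n_questions)]
        beta = [0.0] * n_features
        for _ in range(n_steps):
            # Gradient of (log-likelihood - penalty * ||beta||^2)
            grad = [-2 * penalty * b for b in beta]
            for alternatives, chosen in zip(design, choices):
                probs = mnl_probs(alternatives, beta)
                for f in range(n_features):
                    expected = sum(p * row[f] for p, row in zip(probs, alternatives))
                    grad[f] += alternatives[chosen][f] - expected
            beta = [b + step * g for b, g in zip(beta, grad)]
        # In-sample RLH of the fitted model for this random chooser
        log_lik = sum(math.log(mnl_probs(alternatives, beta)[chosen])
                      for alternatives, chosen in zip(design, choices))
        results.append(math.exp(log_lik / n_questions))
    return results

simulated = sorted(simulate_random_rlh())
print(f"min {simulated[0]:.2f}, max {simulated[-1]:.2f}, chance {1 / 4:.2f}")
```

Because the model is fitted to the same random choices it is scored on, it picks up chance patterns, so the simulated RLHs spread above the 0.25 chance level rather than sitting exactly on it.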

Remove respondents with low RLH statistics

Respondents whose RLH is too low are then either deleted or filtered out from any further analyses of the conjoint data. There is no cut-and-dried rule about what "too low" is for an RLH. One approach is to use the 95th percentile of the RLH values from the random data as the cutoff, which, in the example above, would mean deleting any respondent with an RLH of 0.35 or less.
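The percentile rule can be sketched with Python's standard library. This is a minimal sketch: the respondent IDs, RLH values, and simulated values are made up, and the function names are hypothetical. It uses `statistics.quantiles` with linear interpolation, and, to match "0.35 or less" above, it deletes respondents at or below the cutoff.

```python
import statistics

def rlh_cutoff(simulated_rlh, percentile=95):
    """The given percentile of RLH values from simulated random choosers."""
    # quantiles(n=100) returns the 1st..99th percentiles; 'inclusive' interpolates linearly
    return statistics.quantiles(simulated_rlh, n=100, method="inclusive")[percentile - 1]

def remove_low_rlh(rlh_by_respondent, cutoff):
    """Keep only respondents whose RLH is above the cutoff."""
    return {rid: v for rid, v in rlh_by_respondent.items() if v > cutoff}

# Hypothetical inputs, for illustration only
simulated = [i / 100 for i in range(20, 41)]  # 0.20 to 0.40
respondents = {"r1": 0.31, "r2": 0.55, "r3": 0.72, "r4": 0.38}
cutoff = rlh_cutoff(simulated)
print(round(cutoff, 2), sorted(remove_low_rlh(respondents, cutoff)))  # 0.39 ['r2', 'r3']
```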