Choice of Experimental Design Algorithm – The Data Story Guide

A variety of algorithms have been developed for creating experimental designs. Common algorithms are:

Random designs
Shortcut
Complete enumeration
Balanced overlap
Efficient designs
Efficient designs with priors

Random designs

The simplest procedure to create a design is to randomly choose a level for each attribute in each question. Consider the following (artificially) simple attributes and levels for the car market.

BRAND	PRICE
General Motors	$20K
BMW	$40K
Ferrari	$100K

The choice question below was created using Displayr’s Random algorithm, which randomly constructs each alternative and eliminates any duplicates. The resulting question is so poor that a respondent would likely be confused and draw erroneous conclusions when answering (e.g., inferring that the expensive BMW is a better model).
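The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Displayr's implementation: the function names, the choice of three alternatives per question, and the rule of redrawing a duplicate alternative within a question are all assumptions made for the example.

```python
import random

# Attributes and levels from the car example above.
ATTRIBUTES = {
    "Brand": ["General Motors", "BMW", "Ferrari"],
    "Price": ["$20K", "$40K", "$100K"],
}

def random_question(n_alternatives, rng):
    """Draw alternatives at random, redrawing any that duplicate an earlier one.

    Assumes n_alternatives is no larger than the number of distinct
    brand/price combinations (9 here); otherwise the loop cannot finish.
    """
    alternatives = []
    while len(alternatives) < n_alternatives:
        candidate = tuple(rng.choice(levels) for levels in ATTRIBUTES.values())
        if candidate not in alternatives:
            alternatives.append(candidate)
    return alternatives

def random_design(n_questions, n_alternatives, seed=0):
    """Build a design as a list of questions, each a list of (brand, price) tuples."""
    rng = random.Random(seed)
    return [random_question(n_alternatives, rng) for _ in range(n_questions)]

design = random_design(n_questions=10, n_alternatives=3)
for number, question in enumerate(design, start=1):
    print(f"Q{number}: {question}")
```

Nothing in this procedure stops an implausible or lopsided question from appearing, which is the weakness discussed in this section.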

Furthermore, when random designs are created, we end up getting more information about some of the attribute levels than others. For example, the default random design, with 10 questions in it, showed General Motors 12 times, and Ferrari only 7. Such a design is known as unbalanced (see How to Evaluate and Compare Experimental Designs).

The imbalance of this design is even greater when we look at the pairwise balance, with the most frequent pair appearing 2.5 times as often as the least frequent pair.
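Both kinds of balance can be checked by simple counting. The sketch below tallies how often each level and each brand/price pair appears in a randomly generated design. The design itself, the seed, and the helper names are hypothetical, so the counts it prints will not match the figures quoted above.

```python
import random
from collections import Counter
from itertools import product

BRANDS = ["General Motors", "BMW", "Ferrari"]
PRICES = ["$20K", "$40K", "$100K"]

# A hypothetical random design: 10 questions with 3 alternatives each
# (duplicates within a question are not removed in this simple version).
rng = random.Random(1)
design = [[(rng.choice(BRANDS), rng.choice(PRICES)) for _ in range(3)]
          for _ in range(10)]

def level_counts(design):
    """Count how often each individual level is shown (level balance)."""
    counts = Counter()
    for question in design:
        for brand, price in question:
            counts[brand] += 1
            counts[price] += 1
    return counts

def pair_counts(design):
    """Count how often each brand/price pair is shown (pairwise balance)."""
    counts = Counter({pair: 0 for pair in product(BRANDS, PRICES)})
    for question in design:
        counts.update(question)
    return counts

levels = level_counts(design)
pairs = pair_counts(design)
print({brand: levels[brand] for brand in BRANDS})
print({price: levels[price] for price in PRICES})
print("most frequent pair count:", max(pairs.values()))
print("least frequent pair count:", min(pairs.values()))
```

A perfectly balanced design of this size would show every level 10 times; the further the counts drift from that, the less evenly information is spread across levels.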

Shortcut algorithm

The shortcut algorithm seeks to be more balanced than the random algorithm. It builds a design by repeatedly adding attribute levels. For each attribute of each alternative of each question, the algorithm adds the level that has been used least frequently in the current question. If multiple levels are tied on that count, it adds the one that is least frequent in the overall design so far. If there is still a tie, the algorithm makes a random choice.
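The three-step rule above (least used in the question, then least used in the design, then random) can be sketched as follows. This is an illustrative reading of the description, not Displayr's code; the function name and the data layout are assumptions.

```python
import random
from collections import Counter

ATTRIBUTES = {
    "Brand": ["General Motors", "BMW", "Ferrari"],
    "Price": ["$20K", "$40K", "$100K"],
}

def shortcut_design(n_questions, n_alternatives, seed=0):
    """Greedy level-by-level construction following the rule described above."""
    rng = random.Random(seed)
    overall = {name: Counter() for name in ATTRIBUTES}  # uses in the design so far
    design = []
    for _ in range(n_questions):
        in_question = {name: Counter() for name in ATTRIBUTES}  # uses in this question
        question = []
        for _ in range(n_alternatives):
            alternative = {}
            for name, levels in ATTRIBUTES.items():
                # Compare levels on (uses in this question, uses overall);
                # tuples compare left to right, which gives the tie-break order.
                keys = {lvl: (in_question[name][lvl], overall[name][lvl])
                        for lvl in levels}
                best = min(keys.values())
                tied = [lvl for lvl in levels if keys[lvl] == best]
                choice = rng.choice(tied)  # final tie-break is random
                alternative[name] = choice
                in_question[name][choice] += 1
                overall[name][choice] += 1
            question.append(alternative)
        design.append(question)
    return design, overall

design, overall = shortcut_design(n_questions=10, n_alternatives=3)
for name, counts in overall.items():
    print(name, dict(counts))
```

With three levels per attribute and three alternatives per question, this sketch shows every level exactly once per question, so level balance is perfect. Which brand gets paired with which price is left to the random tie-breaks.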

This algorithm is called "Shortcut" because, unlike the next two, it only considers how often a level appears, and takes no account of pairwise balance.

Complete enumeration

Like the shortcut algorithm, the complete enumeration algorithm is a “greedy” algorithm that repeatedly adds an optimal level to the design. When selecting the optimal level, it considers:

The frequency with which each attribute level appears (i.e., as done by the shortcut algorithm).
Pairwise balance: how often each pair of levels from different attributes appears in the same alternative. For example, in a study with Origin and Ethical attributes, how often Origin is USA and Ethical is Fair trade. Intuitively, a good design should show all the combinations of pairs. Conversely, a design that always shows the same combination, such as USA with Fair trade, would be considered poor.
The overlap between alternatives, whereby the same level appears in multiple alternatives of the same question. Complete enumeration reduces this overlap, in favor of a design with better frequency and pairwise balance.
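One way to see how the three criteria can work together is to score every candidate alternative and keep the cheapest one. The sketch below does this for the car example. It is a simplification under stated assumptions: it adds whole alternatives rather than single levels, and the additive cost function and the `overlap_weight` parameter are invented for illustration rather than taken from any particular software.

```python
import random
from collections import Counter
from itertools import combinations, product

ATTRIBUTES = {
    "Brand": ["General Motors", "BMW", "Ferrari"],
    "Price": ["$20K", "$40K", "$100K"],
}
NAMES = list(ATTRIBUTES)

def cost(candidate, question, level_use, pair_use, overlap_weight):
    """Lower is better: penalize frequent levels, frequent pairs, and overlap."""
    items = list(zip(NAMES, candidate))
    frequency = sum(level_use[item] for item in items)
    pairwise = sum(pair_use[pair] for pair in combinations(items, 2))
    overlap = sum(a == b for other in question for a, b in zip(candidate, other))
    return frequency + pairwise + overlap_weight * overlap

def greedy_design(n_questions, n_alternatives, overlap_weight=10.0, seed=0):
    rng = random.Random(seed)
    level_use = Counter()  # (attribute, level) -> times shown
    pair_use = Counter()   # pair of (attribute, level) items -> times shown together
    candidates = list(product(*ATTRIBUTES.values()))
    design = []
    for _ in range(n_questions):
        question = []
        for _ in range(n_alternatives):
            costs = {c: cost(c, question, level_use, pair_use, overlap_weight)
                     for c in candidates}
            best = min(costs.values())
            chosen = rng.choice([c for c in candidates if costs[c] == best])
            items = list(zip(NAMES, chosen))
            level_use.update(items)
            pair_use.update(combinations(items, 2))
            question.append(chosen)
        design.append(question)
    return design, level_use, pair_use

design, level_use, pair_use = greedy_design(n_questions=10, n_alternatives=3)
print(design[0])
print(sorted(pair_use.values()))
```

Lowering the hypothetical `overlap_weight` lets more overlap through, which is the direction in which balanced overlap designs move.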

Balanced overlap

Balanced overlap designs are very similar to complete enumeration, except that they do not limit the overlap between alternatives to the same extent. This seems to be a better strategy, as it makes the tasks easier for respondents (when each alternative differs from the others on every attribute, it requires more effort to make a choice).

The statistics above summarize a random design with 10 versions created for the simple car attributes and levels (see How to Evaluate and Compare Experimental Designs for details on how to interpret them).

The statistics below are for a balanced overlap design with 10 versions. It's clearly better on every measure. This is typically the case.

The first two questions of the balanced overlap design are shown below. Take a good look at Q1. A Ferrari for $40K. That's a good deal! Now, look at Q2. It's now $20K. Another great deal!

The balance and pairwise balance can be understood from the tables below. The balanced overlap design seeks to create balance and has clearly succeeded in this.

Efficient designs

The shortcut, complete enumeration, and balanced overlap designs seem sensible. A more mathematical approach to creating designs is to use algorithms that create a design that minimizes the d-error, a single-number summary of how large the standard errors of the estimated utilities are expected to be. Such designs are known as efficient designs.
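To make the idea concrete, here is a rough sketch of a d-error calculation for the car example. It assumes a multinomial logit model, dummy coding with the first level of each attribute as the baseline, prior utilities of zero unless supplied, and the common definition of d-error as the determinant of the parameter covariance matrix raised to the power 1/K, where K is the number of parameters. Software packages may code attributes or scale the result differently, so treat the numbers as illustrative only.

```python
import math

BRANDS = ["General Motors", "BMW", "Ferrari"]
PRICES = ["$20K", "$40K", "$100K"]

def encode(alternative):
    """Dummy-code a (brand, price) alternative; first levels are the baseline."""
    brand, price = alternative
    return ([float(brand == b) for b in BRANDS[1:]]
            + [float(price == p) for p in PRICES[1:]])

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def d_error(design, betas=None):
    """det(information matrix) ** (-1/K) for a logit model; inf if singular."""
    k = len(encode(design[0][0]))
    betas = betas or [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for question in design:
        rows = [encode(alt) for alt in question]
        utilities = [sum(b * x for b, x in zip(betas, row)) for row in rows]
        total = sum(math.exp(u) for u in utilities)
        probs = [math.exp(u) / total for u in utilities]
        mean = [sum(p * row[i] for p, row in zip(probs, rows)) for i in range(k)]
        for p, row in zip(probs, rows):
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * (row[i] - mean[i]) * (row[j] - mean[j])
    det = determinant(info)
    return math.inf if det <= 0 else det ** (-1.0 / k)

# Every brand is paired with every price exactly once across three questions.
balanced = [
    [("General Motors", "$20K"), ("BMW", "$40K"), ("Ferrari", "$100K")],
    [("General Motors", "$40K"), ("BMW", "$100K"), ("Ferrari", "$20K")],
    [("General Motors", "$100K"), ("BMW", "$20K"), ("Ferrari", "$40K")],
]
# The same question three times: brand and price always move together.
confounded = [balanced[0]] * 3

print("balanced:", d_error(balanced))
print("confounded:", d_error(confounded))
```

The confounded design has an infinite d-error in this sketch because brand and price effects cannot be separated, which is the extreme case of poor pairwise balance.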

Efficient design algorithms are available in Q, Displayr, SAS, and JMP.

For our simple example, the key results are shown below. Its results are very similar to those of the balanced overlap design, except that the d-error is marginally lower (which is better). Note that even though this design doesn't explicitly strive for balance, it is still highly balanced, as balance is closely related to reducing standard errors and d-error.

Efficient designs with priors

An implicit assumption of balanced designs is that we expect that people are indifferent between all the levels of an attribute (i.e., that they all have the same utility). This is rarely a reasonable assumption. For example, it implies that it's reasonable to assume that people are indifferent to price, which goes against all of economics.

When using an efficient design, it is possible to put in as an assumption your best guess of the utility, and the standard deviation of the utility, of each of the attribute levels. These best guesses are called priors. For example, with our simple example, a design was created assuming the average utilities as shown below.

With the priors above, the design ends up becoming unbalanced. Why this occurs is clearest when we examine the pairwise frequencies, where there are smaller pairwise frequencies for Ferrari at $20K and General Motors at $100K.

How were the priors created? The mean utility of General Motors was set to 0. Why 0? It's a useful convention to set the first level's utility to 0.

BMW was set to 1, as when you ignore price, it's likely more appealing as a brand than General Motors. Ferrari was set to 2 as it's more appealing again.

Similarly, for price, values of 0, -1, and -3 were chosen for $20K, $40K, and $100K respectively.

Can any number be used? No. It is best to create the numbers on a scale of -3 to 3 unless you have strong evidence otherwise (e.g., other conjoint studies). To be technical, the numbers need to be on a logit scale.
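To see what the logit scale implies, the sketch below converts the mean priors from this example into choice probabilities using the standard logit formula. It ignores the standard deviations of the priors, and the function names are made up for the illustration.

```python
import math

# Mean priors from the example: first level of each attribute is fixed at 0.
BRAND_PRIORS = {"General Motors": 0.0, "BMW": 1.0, "Ferrari": 2.0}
PRICE_PRIORS = {"$20K": 0.0, "$40K": -1.0, "$100K": -3.0}

def utility(brand, price):
    """Total utility of an alternative is the sum of its level utilities."""
    return BRAND_PRIORS[brand] + PRICE_PRIORS[price]

def choice_probabilities(question):
    """Logit choice probabilities for a list of (brand, price) alternatives."""
    utilities = [utility(brand, price) for brand, price in question]
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# A cheap Ferrari against an expensive General Motors: utilities 2 and -3.
lopsided = choice_probabilities([("Ferrari", "$20K"), ("General Motors", "$100K")])
# An expensive Ferrari against a cheap General Motors: utilities -1 and 0.
closer = choice_probabilities([("Ferrari", "$100K"), ("General Motors", "$20K")])

print("lopsided:", [round(p, 3) for p in lopsided])
print("closer:", [round(p, 3) for p in closer])
```

A utility gap of 5 already implies that almost everyone picks the same alternative, which is one way to see why values much beyond the -3 to 3 range are rarely sensible, and why a design built on these priors would have little use for a Ferrari at $20K or a General Motors at $100K.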

When people first start using priors, they often get a bit scared or skeptical. It feels like you are making up the answers. But the thing to remember is that the more traditional experimental designs make an obviously wrong assumption, which is that all the utilities are 0. Better to go with judgments like the ones here than with an assumption as dumb as, say, price and brand being irrelevant. If you are really uncomfortable, you can run a small pilot study and update the priors. The key thing to remember is that guessing a bit wrong doesn't make the research invalid. Rather, the better we guess, the more precise our estimates will be.