All Combinations of Piped Inputs Appear as Separate Variables – The Data Story Guide

Where piping has been used in a questionnaire, the variables in the data file should be structured so that users of the data file do not need to understand the piping. That is, each possible combination of piped values should appear as a separate variable.

Example

Consider a questionnaire in which each respondent is asked to rate the appeal of a random selection of three of four brands, with the order randomized or rotated. The table below shows which brand each respondent saw at each question:

ID	Q1		Q2		Q3
1	Microsoft	Apple		IBM
2	Apple		Microsoft	IBM
3	IBM		Google		Apple
4	Google		Microsoft	IBM

The data should be exported as if people had been asked four different questions and all respondents had seen them in the same order (where blank denotes a missing value):

ID	Q. Microsoft	Q. Apple	Q. IBM		Q. Google
1	Data from Q1	Data from Q2	Data from Q3	
2	Data from Q2	Data from Q1	Data from Q3	
3			Data from Q3	Data from Q1	Data from Q2
4	Data from Q2			Data from Q3	Data from Q1
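
The restructuring shown above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the record layout (a "shown" list plus Q1 to Q3 values) is an assumption about how the raw data might be stored, and the ratings are invented purely for the example. None stands in for a missing value.

```python
# Brands in the fixed column order used for the exported file.
BRANDS = ["Microsoft", "Apple", "IBM", "Google"]
SLOTS = ["Q1", "Q2", "Q3"]

# Which brand was piped into each slot (taken from the first table) plus the
# rating given there. The ratings are invented purely for illustration, and
# this record layout is a hypothetical one.
raw = [
    {"ID": 1, "shown": ["Microsoft", "Apple", "IBM"], "Q1": 7, "Q2": 9, "Q3": 4},
    {"ID": 2, "shown": ["Apple", "Microsoft", "IBM"], "Q1": 8, "Q2": 6, "Q3": 5},
    {"ID": 3, "shown": ["IBM", "Google", "Apple"], "Q1": 3, "Q2": 9, "Q3": 7},
    {"ID": 4, "shown": ["Google", "Microsoft", "IBM"], "Q1": 8, "Q2": 5, "Q3": 6},
]


def unpipe(row):
    """Return one record with a rating variable per brand (None = missing)."""
    out = {"ID": row["ID"]}
    for brand in BRANDS:
        out[f"Q. {brand}"] = None  # start with every brand missing
    for slot, brand in zip(SLOTS, row["shown"]):
        out[f"Q. {brand}"] = row[slot]  # move the answer to its brand's column
    return out


exported = [unpipe(row) for row in raw]
for record in exported:
    print(record)
```

Every respondent ends up with the same four variables in the same order, so an analyst never needs to know which slot a brand was piped into.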

If order effects are of interest (they usually aren't), the order in which each question was shown should also be exported as additional variables. For example (where SYSMIS denotes a system-missing value):

ID	Order Microsoft	Order Apple	Order IBM	Order Google
1	1		2		3		SYSMIS
2	2		1		3		SYSMIS
3	SYSMIS		3		1		2
4	2		SYSMIS		3		1
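
The order variables above can be derived from the presentation sequence alone. The sketch below assumes the sequence for each respondent is available as a list (a hypothetical layout), and uses None where the table shows SYSMIS.

```python
# Brands in the fixed column order used for the exported file.
BRANDS = ["Microsoft", "Apple", "IBM", "Google"]

# Presentation sequence per respondent ID, taken from the first table.
shown = {
    1: ["Microsoft", "Apple", "IBM"],
    2: ["Apple", "Microsoft", "IBM"],
    3: ["IBM", "Google", "Apple"],
    4: ["Google", "Microsoft", "IBM"],
}


def order_variables(respondent_id, brands_in_order):
    """Return the position (1, 2, 3) at which each brand was shown."""
    out = {"ID": respondent_id}
    for brand in BRANDS:
        out[f"Order {brand}"] = None  # None plays the role of SYSMIS
    for position, brand in enumerate(brands_in_order, start=1):
        out[f"Order {brand}"] = position
    return out


orders = [order_variables(rid, seq) for rid, seq in shown.items()]
for record in orders:
    print(record)
```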

An exception to this rule

In the example above, it is necessary to remove the piping, as the data cannot easily be analyzed otherwise. However, if each person is shown only a single option (e.g., was asked only Q1 above), the data can be left as collected: include Q1 in the data file along with a second variable recording which option each respondent was shown.

Fixing the issue when programming the questionnaire

The easiest way to address this issue is to avoid programming questions 1, 2, and 3 as described above. Instead, program four questions, one for each brand, and use randomization to determine both the order and which subset of the questions each respondent sees.
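
As a rough sketch of this approach, the Python below treats each brand as its own question and uses a random draw to choose both the subset and the order. It does not reflect any particular survey platform: the administer function and the ask callback are hypothetical names introduced for illustration only.

```python
import random

BRANDS = ["Microsoft", "Apple", "IBM", "Google"]
N_SHOWN = 3  # each respondent rates three of the four brands


def administer(respondent_id, rng, ask):
    """Run one interview; `ask` is a hypothetical callback returning a rating."""
    # A random sample without replacement is a random subset in random order.
    shown = rng.sample(BRANDS, N_SHOWN)
    record = {"ID": respondent_id}
    for brand in BRANDS:
        record[f"Q. {brand}"] = None      # not shown unless filled in below
        record[f"Order {brand}"] = None
    for position, brand in enumerate(shown, start=1):
        record[f"Q. {brand}"] = ask(brand)   # answer stored under its brand
        record[f"Order {brand}"] = position  # presentation position
    return record


rng = random.Random(0)  # seeded only so the example is repeatable
record = administer(1, rng, ask=lambda brand: 5)  # placeholder answer
print(record)
```

Because answers are stored under the brand's own variable at collection time, the data file needs no restructuring afterwards.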