Multiple response questions permit people to choose one or more options from a list. In most situations, multiple response questions should be set up in a binary format, with one variable for each possible answer. An alternative format is the max-multi format, which is suited to very large numbers of possible answers.
The binary format sets up the data as if asking the following five questions:
Q1a Do you have a savings account? No 0 Yes 1
Q1b Do you have a checking account? No 0 Yes 1
Q1c Do you have a credit card? No 0 Yes 1
Q1d Do you have a home loan? No 0 Yes 1
Q1e Do you have none of these? No 0 Yes 1
When setting up the values and labels for the binary variables, it is important that the same value labels are used for all options. In particular, the value label should not contain the name of the option being evaluated (e.g., Savings account). For example, if the question asks "Which of the following brands are masculine?", the values and value labels should be the same for each variable, similar to:
SYSMIS Option not shown
0 Not selected
1 Selected
The example below shows binary data for four options (variables), where none have missing data.
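As a rough sketch of what such a layout might look like, the Python snippet below stores one 0/1 variable per option and applies a single shared set of value labels. The data values are hypothetical: only ID 1 is taken from the respondent discussed later in this article (options two and three selected), and the names `VALUE_LABELS`, `binary_data` and `labelled_row` are illustrative assumptions, not part of any particular software.

```python
# Hypothetical binary-format data for four options (Q1a to Q1d).
# ID 1 mirrors the respondent discussed in the text; the other rows are invented.
VALUE_LABELS = {None: "Option not shown", 0: "Not selected", 1: "Selected"}
VARIABLES = ["Q1a", "Q1b", "Q1c", "Q1d"]

binary_data = {
    1: [0, 1, 1, 0],
    2: [1, 0, 0, 0],
    3: [1, 1, 0, 1],
}

def labelled_row(values):
    """Map each stored value to the value label shared by every variable."""
    return [VALUE_LABELS[v] for v in values]

for case_id, values in binary_data.items():
    print(case_id, dict(zip(VARIABLES, labelled_row(values))))
```

Because every variable uses the same value-to-label mapping, one dictionary is enough to label the whole set.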
Where a multiple response question contains an "other specify" option, the resulting text variable should appear after all of the binary variables; if it appears in the middle, it will prevent the creation of a multiple response set.
Common errors when setting up questions in the binary format
- Using the max-multi format instead of the binary format (this is discussed in the next section).
- Failing to distinguish between missing values and values that were not selected by respondents.
- Failing to address piping/randomization when creating the data file.
- Confusing the Variable Label with the Value Label. For example, the label for value 1 in the first variable is Has a savings account, and the label in the second variable is Has a checking account. It's better to use consistent labels (e.g., having a label of Selected for the value of 1). This matters because consistent labels allow software to detect variables that should be grouped together when importing the data.
- Having inconsistent values. For example, having the Yes values represented by a value of 1 for the first option, a value of 2 for the second option, etc. The type of labeling scheme shown below may appear sensible at first glance, but it creates difficulty for the user, particularly when automatic tools are used to set up data files. Because there is no consistent set of values or labels, automatic recognition of a multiple response set becomes problematic.
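The last two errors can be caught with a simple consistency check. The sketch below is a minimal illustration, assuming the value labels of each variable are available as a plain dictionary; the function name `check_binary_set` and both labeling schemes are hypothetical and do not reflect how any specific package detects multiple response sets.

```python
def check_binary_set(value_labels_by_variable):
    """Return the variables whose value labels differ from the first variable's.

    value_labels_by_variable maps a variable name to its {value: label} dict.
    """
    names = list(value_labels_by_variable)
    reference = value_labels_by_variable[names[0]]
    return [n for n in names[1:] if value_labels_by_variable[n] != reference]

# Hypothetical labeling schemes, for illustration only.
consistent = {
    "Q1a": {0: "Not selected", 1: "Selected"},
    "Q1b": {0: "Not selected", 1: "Selected"},
}
inconsistent = {
    "Q1a": {0: "No", 1: "Has a savings account"},
    "Q1b": {0: "No", 2: "Has a checking account"},  # different value and label
}

print(check_binary_set(consistent))    # []
print(check_binary_set(inconsistent))  # ['Q1b']
```

An empty result suggests the variables share one set of values and labels, which is the property automatic tools tend to rely on.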
The max-multi format represents the data with a separate variable for each response given, rather than for each possible option. So, if no respondent gave more than two responses, then only two variables are required. The example below shows the max-multi setup of the binary data shown above.
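To make the relationship between the two formats concrete, here is a minimal sketch of converting binary data to max-multi data. It assumes options are coded by their 1-based position and that unused response slots are padded with `None`; the function name `binary_to_max_multi` and the data (other than ID 1, which follows the text) are hypothetical.

```python
def binary_to_max_multi(binary_rows):
    """Convert {case_id: [0/1, ...]} into {case_id: [option codes...]}.

    Option codes are 1-based positions in the binary row. Every case gets the
    same number of variables (the largest number of selections made by any
    case), padded with None.
    """
    chosen = {
        case_id: [i + 1 for i, v in enumerate(values) if v == 1]
        for case_id, values in binary_rows.items()
    }
    width = max((len(c) for c in chosen.values()), default=0)
    return {cid: c + [None] * (width - len(c)) for cid, c in chosen.items()}

# Hypothetical binary data; only ID 1 is taken from the text.
binary_rows = {1: [0, 1, 1, 0], 2: [1, 0, 0, 0], 3: [1, 1, 0, 1]}
max_multi = binary_to_max_multi(binary_rows)

for case_id, responses in max_multi.items():
    print(case_id, responses)
```

Here the widest case selects three options, so three max-multi variables are needed even though there are four possible answers.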
When to use the max-multi format
The max-multi format is useful when dealing with a large number of possible response options. For example, there are thousands of car models, so such data is often best stored in this format.
Problems with the max-multi format
For most problems, the max-multi format is inferior to the binary format. It has the following problems:
- It can only be easily analyzed by software that is designed for analyzing max-multi data. By contrast, any software can analyze data in the binary format, as the percentage of people choosing an option is the average of the 0s and 1s in the corresponding variable.
- It is difficult to correctly analyze max-multi data when some options weren't available to all respondents in the study. For example, looking at ID 1 in the example data above, we can see from the binary data that the respondent chose the second and the third option, and was also shown the first and the fourth option but did not choose them. We know that options one and four were shown but not selected as a value of 0 appears; if the data was missing that would mean the option was not shown to this respondent. By contrast, with the max-multi format, we can see that options 2 and 3 were chosen, but we have no way of knowing whether options one and four were shown and not selected, or not shown at all.
- The same response may appear for the same case in two or more different variables. Where the data is messy such a result is possible, but it is as likely to indicate some form of error.
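Two of the points above lend themselves to a short sketch: the percentage choosing an option in binary data is the mean of the 0s and 1s (skipping respondents for whom the value is missing), and repeated responses in max-multi data can be flagged per case. The function names and the sample values are hypothetical, and treating `None` as "missing" is an assumption of this sketch.

```python
def percent_selected(values):
    """Percentage selecting an option: mean of the 0/1 values, skipping None."""
    shown = [v for v in values if v is not None]
    return 100 * sum(shown) / len(shown) if shown else None

def duplicated_responses(max_multi_row):
    """Return the codes that appear more than once in one case's variables."""
    seen, dupes = set(), set()
    for code in max_multi_row:
        if code is None:  # unused response slot
            continue
        if code in seen:
            dupes.add(code)
        else:
            seen.add(code)
    return sorted(dupes)

# Hypothetical binary variable: the option was not shown to the last respondent.
q1b = [1, 0, 1, None]
print(percent_selected(q1b))

# Hypothetical max-multi row in which option 2 was recorded twice.
print(duplicated_responses([2, 3, 2, None]))
```

Note that the first calculation is only possible because the binary format keeps "not selected" (0) distinct from "not shown" (missing); the max-multi row carries no such distinction.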