Variable labels should be short and should clearly communicate the underlying structure of the data.
Short
Variable labels appear in most reporting, so it's ideal to have short, clear descriptions. For example, "Q4. Age" is better than "Q4. Which of the following age groups do you fall into?"
Unique
A variable label such as "How important is this on a scale of 1 to 10", repeated for each of a set of variables, is of no use, as it is impossible to determine what is being rated without referring to the questionnaire. A better variable label is "Importance: Price".
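A quick way to spot this problem is to look for labels shared by more than one variable. The following is a minimal sketch; the variable names and the dictionary input format are assumptions made for illustration.

```python
from collections import Counter


def find_duplicate_labels(labels):
    """Return each label used by more than one variable, with the variables using it.

    `labels` maps variable name -> variable label (an assumed input format).
    """
    counts = Counter(labels.values())
    return {
        label: [name for name, value in labels.items() if value == label]
        for label, count in counts.items()
        if count > 1
    }


# Hypothetical variable names; the labels come from the example in the text.
labels = {
    "Q5_1": "How important is this on a scale of 1 to 10",
    "Q5_2": "How important is this on a scale of 1 to 10",
    "Q5_3": "Importance: Price",
}
duplicates = find_duplicate_labels(labels)
```

Any label that appears in the result needs to be rewritten so that it identifies what is being rated.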
Where practical, the variable labels should correspond to the actual wording used in the questions, provided it is not too lengthy (see the previous point). Many programs that write data files automatically truncate variable labels to 120 characters, which can cause automatically generated labels from looped questions to be uninformative (e.g., the first 120 characters may not include all of the information about the loop).
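As a rough check for truncation, labels whose length reaches the limit can be flagged for manual review. This sketch assumes the 120-character limit mentioned above and an illustrative dictionary of labels; a label of exactly that length is not necessarily truncated, so the result is only a list of candidates.

```python
def possibly_truncated(labels, limit=120):
    """Return the names of variables whose label is at least `limit` characters long.

    `labels` maps variable name -> variable label (an assumed input format).
    """
    return [name for name, label in labels.items() if len(label) >= limit]


# Hypothetical variables; the second label is a stand-in for a long looped label.
labels = {
    "Q7_1": "Importance: Price",
    "Q7_2": "x" * 120,
}
flagged = possibly_truncated(labels)
```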
Informative
Quick analysis requires that you can look at a table and know what it means, without having to refer to some other documentation. Consequently, it is extremely useful to have informative labels for all variables. A variable label of VAR045 is much less useful than Reasons for buying Coca-Cola.
Any strange text, such as HTML tags, should be removed.
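A minimal sketch of this kind of cleaning is shown below. It assumes the stray text consists of simple HTML tags and character entities; the regular expression is a deliberately simple approximation, not a full HTML parser.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <b>, </p>, <br/>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_label(label):
    """Remove HTML tags, decode entities, and collapse runs of whitespace."""
    without_tags = TAG_PATTERN.sub(" ", label)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


cleaned = clean_label("<p>Reasons for buying <b>Coca-Cola</b></p>")
```

Because each tag is replaced with a space before whitespace is collapsed, words separated only by a tag (such as a line break) do not run together.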
Variable sets with multiple variables (e.g., multiple response questions)
If there are four variables that indicate which of four products a person owns, it is useful if the labels share a common structure, with the common text at the beginning of each label, as this makes it easy for both people and computers to recognize variable sets. For example:
Products owned: Savings account
Products owned: Checking account
Products owned: Loan account
Products owned: Credit account
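To illustrate why a shared prefix helps software, the sketch below groups variables by the text before the first ": " in their labels. The variable names and the choice of ": " as the separator are assumptions for illustration.

```python
from collections import defaultdict


def group_by_prefix(labels, separator=": "):
    """Group variables by the part of their label that precedes `separator`.

    `labels` maps variable name -> variable label (an assumed input format).
    Labels without the separator are left out of the result.
    """
    groups = defaultdict(list)
    for name, label in labels.items():
        prefix, found, option = label.partition(separator)
        if found:
            groups[prefix].append((name, option))
    return dict(groups)


# Hypothetical variable names; the labels are those listed above.
labels = {
    "own_1": "Products owned: Savings account",
    "own_2": "Products owned: Checking account",
    "own_3": "Products owned: Loan account",
    "own_4": "Products owned: Credit account",
}
variable_sets = group_by_prefix(labels)
```

All four variables end up in a single set named "Products owned", with the account types as the options.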
Grids
Grid questions should contain labels that describe both the specific option being evaluated and also some common aspects of the wording. For example, the following labels are poor:
Live a long life
Be rich
Have lots of friends
whereas these are much better:
How strongly do you agree that it is important to... Live a long life
How strongly do you agree that it is important to... Be rich
How strongly do you agree that it is important to... Have lots of friends
Care needs to be taken with the creation of labels for looped questions and some grid questions. Consider a study containing the following three questions:
Q1a When you think of soft drinks that are sexy, which ones come to mind?
MULTIPLE RESPONSE
Coke
Pepsi
Fanta
Other

Q1b When you think of soft drinks that are masculine, which ones come to mind?
MULTIPLE RESPONSE
Coke
Pepsi
Fanta
Other

Q1c When you think of soft drinks that are powerful, which ones come to mind?
MULTIPLE RESPONSE
Coke
Pepsi
Fanta
Other
If the variable labels set up for such questions follow identical structures, this will make the use of the file considerably more straightforward. Some programs, such as Displayr and Q, will automatically detect the structure in the data and present it as a grid. For example, the following labels make the interpretation of the grid straightforward.
Variable Name  Variable Label
Q1a1           Q42. Brand attitude Sexy brands: Coke
Q1a2           Q42. Brand attitude Sexy brands: Pepsi
Q1a3           Q42. Brand attitude Sexy brands: Fanta
Q1a4           Q42. Brand attitude Sexy brands: Other
Q1b1           Q42. Brand attitude Masculine brands: Coke
Q1b2           Q42. Brand attitude Masculine brands: Pepsi
Q1b3           Q42. Brand attitude Masculine brands: Fanta
Q1b4           Q42. Brand attitude Masculine brands: Other
Q1c1           Q42. Brand attitude Powerful brands: Coke
Q1c2           Q42. Brand attitude Powerful brands: Pepsi
Q1c3           Q42. Brand attitude Powerful brands: Fanta
Q1c4           Q42. Brand attitude Powerful brands: Other
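The sketch below shows one way a program might recover a grid from consistently structured labels. It is an illustration only and does not describe how Displayr or Q actually detect grids; the function name and the use of ": " as the row/column separator are assumptions.

```python
def detect_grid(labels, separator=": "):
    """Return (rows, columns) if the labels form a complete grid, else None.

    Each label is split at its last `separator`: the text before it names the
    row (here, the attribute) and the text after it names the column (the brand).
    A grid is reported only if every row lists the same unique columns in the
    same order.
    """
    rows = {}
    for label in labels:
        row, found, column = label.rpartition(separator)
        if not found:
            return None
        rows.setdefault(row, []).append(column)
    if not rows:
        return None
    column_lists = list(rows.values())
    first = column_lists[0]
    if len(set(first)) != len(first):
        return None  # duplicated column labels cannot be told apart
    if any(columns != first for columns in column_lists):
        return None  # rows disagree on the columns or on their order
    return list(rows), first


# Rebuild the twelve labels from the table above.
attributes = ["Sexy", "Masculine", "Powerful"]
brands = ["Coke", "Pepsi", "Fanta", "Other"]
labels = [f"Q42. Brand attitude {a} brands: {b}" for a in attributes for b in brands]
grid = detect_grid(labels)
```

With the labels above, the result is three rows (one per attribute) and four columns (one per brand). A single typo in any label is enough for the function to return None.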
Common problems with the setup of grid questions
As an example, the following contains inconsistencies that prevent any auto-detection of the underlying structure:
Q1a1  Q42. Brand attitude Sexy brands: Coke
Q1a2  Q42. Brand attitude Sexy brands: Pepsi
Q1a3  Q42. Brand attitude Sexy brands: Fanta
Q1a4  Q42. Brand attitude Sexy brand: Other
Q1b1  Q42. Brand attitude - Masculine brands: Coke
Q1b2  Q42. Brand attitude - Masculine brands:  Pepsi
Q1b3  Q42. Brand attitude - Masculine brands: Fanta
Q1b4  Q42. Brand attitude - Masculine brands: Other
Q1c1  Brand attitude - Powerful brands: Coke
Q1c2  Brand attitude - Powerful brands: Pepsi
Q1c3  Brand attitude - Powerful brands: Fanta
Q1c4  Brand attitude - Powerful brands: Others
Common problems with the setup of grid questions include:
- Any of the problems with multiple response questions discussed earlier in the article.
- The Label field has been set up with contradictory or inconsistent information. Two common causes of this are:
  - Typographical errors. While these may seem like minor issues, they prevent data analysis programs from automatically identifying the looped structures in the data. In the example above:
    - An additional space precedes Pepsi for Q1b2.
    - There is no s with brands in Q1a4.
    - An s has been added to Others in Q1c4.
    - Q42. is absent from labels for Q1c.
  - Truncation of the Label field by the software used to create the data file. For example, the label may read "Which of the following brands do you typically consume on a hot day?" with the specific brands not listed. There is then no way to deduce the correct labeling of the rows and/or columns of the grid, other than by assuming they are consistently ordered, and if that assumption is wrong the analyses will be incorrect.
- Repeated labels. For example, if there are two Other/Specify options in the questionnaire then they should be given distinct labels like Other 1 and Other 2. Duplicated labels can prevent the automatic detection of grids, as there is no way to tell the difference between the two options. Each label in the set must be unique.
- There are inconsistencies in the number of alternatives (brands) or attributes in the grid (e.g., some brands may not be shown with some attributes). The solution to this problem is to create new variables with no data for the missing combinations, so that every attribute has the same set of alternatives.
- The order of the variables is inconsistent. In the example above, the four brands are shown in the same order for each attitude statement, and this is required for successful automatic identification of the grid layout.
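The problems listed above can be checked for mechanically before a file is delivered. The following is a minimal sketch, assuming labels are held in a dictionary and that ": " separates the row text from the column text; the function name and messages are illustrative. The labels are those from the inconsistent example, with the extra space before Pepsi in Q1b2 written out as the text describes it.

```python
def diagnose_grid_labels(labels, separator=": "):
    """Return a list of messages describing problems that would hide a grid.

    `labels` maps variable name -> variable label (an assumed input format).
    """
    problems = []
    rows = {}
    for name, label in labels.items():
        # Leading, trailing, or doubled spaces usually indicate a typo.
        if label != " ".join(label.split()):
            problems.append(f"{name}: stray whitespace in {label!r}")
        row, found, column = label.rpartition(separator)
        if not found:
            problems.append(f"{name}: no {separator!r} separator in {label!r}")
            continue
        rows.setdefault(row, []).append(column.strip())
    # In a clean grid every row has exactly the same columns in the same order.
    distinct = {tuple(columns) for columns in rows.values()}
    if len(distinct) > 1:
        for row, columns in rows.items():
            problems.append(f"row {row!r} has columns {columns}")
    return problems


labels = {
    "Q1a1": "Q42. Brand attitude Sexy brands: Coke",
    "Q1a2": "Q42. Brand attitude Sexy brands: Pepsi",
    "Q1a3": "Q42. Brand attitude Sexy brands: Fanta",
    "Q1a4": "Q42. Brand attitude Sexy brand: Other",
    "Q1b1": "Q42. Brand attitude - Masculine brands: Coke",
    "Q1b2": "Q42. Brand attitude - Masculine brands:  Pepsi",  # extra space
    "Q1b3": "Q42. Brand attitude - Masculine brands: Fanta",
    "Q1b4": "Q42. Brand attitude - Masculine brands: Other",
    "Q1c1": "Brand attitude - Powerful brands: Coke",
    "Q1c2": "Brand attitude - Powerful brands: Pepsi",
    "Q1c3": "Brand attitude - Powerful brands: Fanta",
    "Q1c4": "Brand attitude - Powerful brands: Others",
}
problems = diagnose_grid_labels(labels)
for problem in problems:
    print(problem)
```

The check reports the stray space in Q1b2 and shows that the rows do not share one list of brands: the misspelled "Sexy brand" label forms a row of its own, and the Powerful row ends in "Others" rather than "Other". It does not compare the text before the attribute, so the missing "Q42." would need a separate check.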
Where multiple questions are asked in a loop, it is usually best if all the data appears question-by-question (i.e., all the looped variables for one question, then all the variables for the next, etc.). However, if the intent is to create stacked data, it is instead usually better to structure the data by loop iteration (i.e., first show all the data from the first iteration of the loop, then from the second, etc.).
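The two orderings can be illustrated with a small sketch. The question and iteration names here are hypothetical.

```python
from itertools import product

questions = ["Q1", "Q2"]        # hypothetical looped questions
iterations = ["a", "b", "c"]    # hypothetical loop iterations

# Question-by-question: all iterations of Q1, then all iterations of Q2.
by_question = [f"{q}{i}" for q, i in product(questions, iterations)]

# Iteration-by-iteration: every question for the first iteration, then the second, etc.
by_iteration = [f"{q}{i}" for i, q in product(iterations, questions)]

print(by_question)
print(by_iteration)
```

The first list keeps each question's variables together, which suits reading the data as grids; the second keeps each iteration together, which suits stacking.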