A data file should contain a variable that uniquely identifies each case (typically, each respondent). That is, each case should have a value that is different from those of the other cases, even if the same person has provided multiple cases of data. If respondents do provide multiple cases then a respondent identifier should be included as an additional variable.
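As a rough illustration, the uniqueness requirement can be checked before analysis begins. The sketch below uses only the Python standard library; the field names `case_id`, `respondent_id`, and `wave`, and the helper `duplicated_ids`, are hypothetical and do not come from any particular product.

```python
from collections import Counter

# Hypothetical data: each dict is one case; the field names are illustrative only.
cases = [
    {"case_id": 1, "respondent_id": 101, "wave": 1},
    {"case_id": 2, "respondent_id": 101, "wave": 2},
    {"case_id": 3, "respondent_id": 102, "wave": 1},
]

def duplicated_ids(rows, id_field):
    """Return the sorted values of id_field that appear in more than one row."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

# The case ID must never repeat, so this list should be empty.
print(duplicated_ids(cases, "case_id"))        # prints []

# The respondent ID may repeat when one person provides several cases.
print(duplicated_ids(cases, "respondent_id"))  # prints [101]
```

Here respondent 101 provides two cases, so the respondent identifier repeats while the case identifier stays unique.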
The main considerations when creating the ID variable, which often conflict, are:
- It should be a relatively short number. Values commonly need to be used in code (e.g., when creating filters), and very long IDs containing a mixture of letters and numbers can make this difficult.
- It should not be personal information. In general, it's inadvisable to use, for example, an email address as the ID, because the resulting data file would then contain personal information, which brings various international laws about privacy and data protection into play.
- It should be usable as a key to link to other data. For example, a customer number can be a good ID, as it makes it easy to check data and append additional data.
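To make the last point concrete, here is a minimal sketch of using the ID as a key to append additional data. It assumes a hypothetical survey file keyed by `customer_number` and a separate customer lookup table; the helper `append_by_id` and all field names and values are invented for illustration.

```python
# Hypothetical survey cases identified by customer number.
survey = [
    {"customer_number": 1001, "satisfaction": 4},
    {"customer_number": 1002, "satisfaction": 2},
    {"customer_number": 1003, "satisfaction": 5},
]

# Hypothetical additional data, keyed by the same customer number.
customers = {
    1001: {"region": "North"},
    1002: {"region": "South"},
}

def append_by_id(rows, lookup, id_field):
    """Merge fields from lookup into copies of rows, matching on id_field.

    Returns the merged rows and the list of IDs that had no match,
    so that unmatched cases can be checked by hand.
    """
    merged, unmatched = [], []
    for row in rows:
        extra = lookup.get(row[id_field])
        if extra is None:
            unmatched.append(row[id_field])
            extra = {}
        merged.append({**row, **extra})
    return merged, unmatched

merged, unmatched = append_by_id(survey, customers, "customer_number")
print(merged[0])   # the first case now also carries its region
print(unmatched)   # IDs present in the survey but missing from the lookup
```

Reporting the unmatched IDs separately, rather than silently dropping those cases, is one way to support the "check data" use mentioned above.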