A special value should be used for storing missing values. Where there are multiple reasons for having missing data, a separate value or code should be used for each. The value needs to be different from the value used to indicate when an option was shown, but not chosen. Zero (0) should not be used to indicate missing values.
Special value for missing data
Where data is missing, a special value (i.e., code) should be stored in its place. The table below lists the standard missing values by data file type:
Missing data can be stored either as an SPSS SYSTEM-MISSING VALUE or as a unique value that has been specified as an SPSS USER-DEFINED MISSING VALUE.
A blank value or NA
Sometimes it is appropriate to treat missing values for some of the questions as being equivalent to a “No” response (e.g., giving them a value of 0). For example, this is appropriate if people are asked which brands they have consumed but are only shown brands that they are aware of. In this instance, the data should ideally be included in the data file twice: once with the missing values stored as SPSS SYSTEM-MISSING VALUE, and once with the “No” responses in their place.
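The idea of including the data twice can be sketched as follows. This is a minimal illustration, not a prescribed method: the variable names, the helper `missing_as_no`, and the use of Python's `None` to stand in for a missing value are all assumptions made for the example.

```python
# Hypothetical responses to "Have you consumed brand X?" for five respondents.
# 1 = consumed, 0 = shown but not consumed, None = not shown (unaware of the brand).
consumed_raw = [1, None, 0, None, 1]


def missing_as_no(values, no_code=0):
    """Return a copy in which missing entries (None) are recoded to the "No" code."""
    return [no_code if v is None else v for v in values]


# Keep both versions: one with the missing values intact, one recoded to "No".
consumed_with_missing = list(consumed_raw)
consumed_missing_as_no = missing_as_no(consumed_raw)

print(consumed_with_missing)
print(consumed_missing_as_no)
```

Keeping both versions means an analyst can still tell who was never shown the brand, while the recoded copy is ready for analyses based on all respondents.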
Multiple values should be used where there are multiple known interesting reasons for missing data
Some variables can have multiple different reasons for missing data. For example, in a variable representing a question or option in a survey, data may be missing because:
- The person answering the question skipped the question.
- The person was not shown the question as it was inapplicable.
- The person was not shown the question as their correct answer can be derived based on earlier questions.
- The person answering the question terminated the interview before getting to this question.
- Randomization was used to determine which people saw which questions or options.
- The data has been cleaned and the cleaning involved removing their data.
- Respondents said Don't know.
Where there are multiple reasons, and it is possible to work out which data is missing for which reason, then different values should be used to record the different reasons.
Values that are not possible should be used for the missing values. For example, if the study asks for quantity consumed, values of -96, -97, -98, and -99 can be used to indicate missing values. In SPSS data files, these values should also be set as SPSS USER-DEFINED MISSING VALUE.
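As a rough sketch of how separate codes keep the reasons distinguishable, the example below assigns one impossible (negative) value to each reason for a quantity variable. The mapping of specific codes to specific reasons, the sample data, and the helper `split_valid_and_missing` are hypothetical and chosen only for illustration.

```python
# Hypothetical codes for a "quantity consumed" variable. Negative quantities
# are impossible, so these codes cannot collide with real answers.
MISSING_REASONS = {
    -99: "Skipped the question",
    -98: "Not shown: question inapplicable",
    -97: "Terminated the interview before this question",
    -96: "Said Don't know",
}

# Illustrative data: real quantities mixed with missing-value codes.
quantity = [3, -99, 0, -97, 12, -96, -98, 5]


def split_valid_and_missing(values, missing_codes):
    """Separate valid answers from missing ones and count each missing reason."""
    valid = [v for v in values if v not in missing_codes]
    reason_counts = {}
    for v in values:
        if v in missing_codes:
            label = missing_codes[v]
            reason_counts[label] = reason_counts.get(label, 0) + 1
    return valid, reason_counts


valid, reason_counts = split_valid_and_missing(quantity, MISSING_REASONS)

# Statistics are computed on valid answers only; a real 0 is kept as an answer.
mean_quantity = sum(valid) / len(valid)
print(valid, mean_quantity)
print(reason_counts)
```

Because a genuine answer of 0 is kept apart from the missing codes, it still contributes to the mean, and each reason for missing data can be reported separately.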
The value should be different from the value for shown but not chosen
Where an option was shown but not chosen, it is not missing data, and it requires a different value from the one used to indicate missing data.
Don't use 0 to store missing data
It is never appropriate to record all missing values in a data file as having a value of 0. This is very important, as for many binary variables the No response is often coded as a 0, making it impossible to determine which respondents said No and which were not asked the question.
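The problem can be seen in a small sketch. The data, the -99 code for "not asked", and the helper `proportion_no` are assumptions made for illustration only.

```python
# Hypothetical binary variable: 1 = Yes, 0 = No, -99 = not asked.
coded_properly = [1, 0, -99, 0, -99, 1, -99, 0]

# The same data with every missing value overwritten by 0.
coded_with_zero = [0 if v == -99 else v for v in coded_properly]


def proportion_no(values, missing_code=None):
    """Share of "No" (0) among the values not equal to the missing code."""
    asked = [v for v in values if v != missing_code]
    return asked.count(0) / len(asked)


# With a distinct missing code, respondents who were not asked are excluded.
print(proportion_no(coded_properly, missing_code=-99))  # 3 of the 5 asked said No

# With 0 used for missing, respondents who were not asked are counted as No.
print(proportion_no(coded_with_zero))  # 6 of 8 values are 0
```

Once the missing values have been overwritten with 0, nothing in the second list allows the three respondents who were not asked to be recovered.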