Sometimes data that is collected cannot be accurate. People cannot be 20 meters tall. They cannot eat 500 hamburgers in a day. Nor is it plausible that a respondent's data are valid if they clicked the right-most option for every question in a survey.
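Checks like these can be written as simple rules. The following is a minimal sketch, not a prescribed method. The bounds (3 m, 50 hamburgers, a 1 to 5 rating scale) and the respondent layout are hypothetical assumptions for illustration. In practice, set the limits from domain knowledge rather than from the data itself.

```python
# A minimal sketch of plausibility checks on survey respondents.
# All bounds below are hypothetical assumptions, chosen as hard limits.
MAX_HEIGHT_M = 3.0          # assumption: no one is taller than 3 m
MAX_BURGERS_PER_DAY = 50    # assumption: a generous upper limit
RIGHT_MOST_OPTION = 5       # assumption: rating questions use a 1-5 scale


def find_problems(respondent):
    """Return a list of reasons why a respondent's data look impossible."""
    problems = []
    if respondent["height_m"] > MAX_HEIGHT_M:
        problems.append("height")
    if respondent["burgers_per_day"] > MAX_BURGERS_PER_DAY:
        problems.append("burgers")
    ratings = respondent["ratings"]
    # Flag respondents who picked the right-most option on every question.
    if ratings and all(r == RIGHT_MOST_OPTION for r in ratings):
        problems.append("straightlining")
    return problems


respondents = [
    {"id": 1, "height_m": 1.75, "burgers_per_day": 1, "ratings": [3, 4, 2, 5]},
    {"id": 2, "height_m": 20.0, "burgers_per_day": 500, "ratings": [5, 5, 5, 5]},
]

for r in respondents:
    print(r["id"], find_problems(r))
# 1 []
# 2 ['height', 'burgers', 'straightlining']
```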
When we have impossible data, our options are to replace it with the correct data, if we have a way of working that out, or replace it with missing values, which opens up the various issues described in Checking and Understanding Missing Data.
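As a rough illustration of the second option, the sketch below replaces impossible heights with a missing value (Python's `None`). The 3 m upper bound and the function name `clean_heights` are hypothetical. Note that the rule targets only values that cannot be true: an unusual but possible height such as 2.31 m is left unchanged.

```python
# A minimal sketch: replace impossible values with a missing value (None).
# The 3 m upper bound is a hypothetical hard limit, not a statistical cut-off.
MAX_HEIGHT_M = 3.0


def clean_heights(heights):
    """Return a copy of heights with impossible values set to None."""
    cleaned = []
    for h in heights:
        if h is None or h <= 0 or h > MAX_HEIGHT_M:
            cleaned.append(None)   # impossible (or already missing): treat as missing
        else:
            cleaned.append(h)      # possible, even if unusual: keep as-is
    return cleaned


heights = [1.62, 2.31, 20.0, 1.80, -1.7]
print(clean_heights(heights))
# [1.62, 2.31, None, 1.8, None]
```

Using a fixed hard limit, rather than a cut-off derived from the data such as a number of standard deviations, keeps the check focused on impossibility rather than rarity.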
A note of caution: the goal is to deal with impossible data. Impossible is not the same as highly unlikely. Most surveys involve samples of hundreds or thousands of people, and the real world contains many highly unusual people, so removing their data can decrease the robustness of a study.
In a humorous example, a beer consumption survey in Australia found that beer consumption in the city of Darwin was lower than sales data indicated. An audit revealed the problem. A non-drinker who had never been to Darwin was responsible for checking the data. When checking it, he had assumed that everybody who said they had consumed a case of 24 bottles of beer was mistaken and must have meant a single bottle, so he had "corrected" the error, which made the survey results wrong.