Using Common Sense to Test Conclusions – The Data Story Guide

Common sense is often the most powerful way of testing conclusions from the research. In practice, common sense tends to come down to three related questions:

Does a result feel likely to be true?
Is the result consistent with other data and theories?
Are there alternative plausible explanations (APEs) that cannot be ruled out?

Two case studies are presented.

Does it feel likely to be true?

One of the most remarkable aspects of the human brain is our ability to draw conclusions about how the world works: what causes what, and why. It starts when we are babies, learning the laws of physics just to survive, and keeps going throughout our lives. This ability allows us to instinctively spot results that are unlikely to be true, and most skilled researchers rely heavily on this instinct when looking at results. If the result smells, there's a good chance something's a bit fishy. This is known as the smell test. A corollary, popularized by statistician Andrew Ehrenberg, is Twyman's Law, which states that interesting results are usually wrong.

If your intuition tells you that the result is unlikely, the next step is to work out why, which is the focus of the rest of the article.

Consistency with other data and theories

If a study is not consistent with the results of other studies, it is less likely to be correct. To use the jargon, a result that can be replicated is more likely to be correct.

Theories about human behavior are themselves grounded in data (albeit not always good data), so if a result is consistent with theories of human behavior, it is more likely to be correct, all else being equal.

Economic theory is often the most efficient one to use when checking results, even when the studies are not about economics. A core contention of economic theory is that people act in accordance with their self-interest. If your study has found a conclusion that suggests that people are not acting in accordance with their self-interest, there's a good chance that something is wrong. For example, a study that finds that people prefer to pay more is a study that likely has an error in it, unless there's a good reason to believe that higher prices signal quality (e.g., as in the markets for wine and bicycles).

Finding and testing APEs

When a result seems to be interesting, one way of working out whether it really is interesting is to see if there are less interesting alternative plausible explanations for the result. A textbook example of this is the number of firefighters that attend fires:

Observation: the more firefighters that attend a fire, the worse the damage.
Possible explanation: the number of firefighters causes the damage.
Implication: we can reduce damage by sending fewer firefighters to fires.
Alternative plausible explanation: the correlation between the number of firefighters and damage is spurious; the truth is that more firefighters are dispatched to fires that are believed to be more destructive.

Typically, when we search for APEs, we have observed a correlation in the data that has an obvious implication, and we are searching for other explanations for the correlation. The alternative explanation is generally that the correlation is produced by some more complicated relationship, such as both of the correlated variables being related to some other factor (e.g., the intensity of the fire in the example above).
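The firefighter example can be made concrete with a small simulation. The sketch below is illustrative only: the data is made up, and the function names and every coefficient are assumptions chosen for the illustration, not estimates from real fires. In this simulated world each extra firefighter reduces damage, yet the overall correlation between firefighters and damage is strongly positive because both are driven by the intensity of the fire. Comparing only fires of similar intensity reveals the negative relationship.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_fires(n_fires=5000, seed=0):
    """Return (intensity, firefighters, damage) tuples for made-up fires."""
    rng = random.Random(seed)
    fires = []
    for _ in range(n_fires):
        intensity = rng.uniform(1, 10)
        # ASSUMPTION: more firefighters are dispatched to more intense fires.
        firefighters = max(1, round(2 * intensity + rng.gauss(0, 2)))
        # ASSUMPTION: each extra firefighter REDUCES the damage.
        damage = max(0.0, 10 * intensity - firefighters + rng.gauss(0, 5))
        fires.append((intensity, firefighters, damage))
    return fires


def firefighter_damage_correlation(fires, low=float("-inf"), high=float("inf")):
    """Correlation of firefighters with damage for fires in an intensity band."""
    band = [(f, d) for i, f, d in fires if low <= i < high]
    return pearson([f for f, _ in band], [d for _, d in band])


if __name__ == "__main__":
    fires = simulate_fires()
    # All fires pooled together: firefighters look harmful.
    print("all fires:", round(firefighter_damage_correlation(fires), 2))
    # Fires of similar intensity only: firefighters look helpful.
    print("similar fires:", round(firefighter_damage_correlation(fires, 5.0, 5.5), 2))
```

Holding the suspected third factor roughly constant, as the second call does, is one simple way of testing an APE when that factor has been measured.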

When analyzing surveys, in particular, the most common APE is that some artifact of the data collection process has caused the correlation. This idea is explored in the two case studies below.

APE hunting and testing are part of a more general process of establishing the causality of research findings. Two good introductions to this (very complex) field are The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie (2019) and Counterfactuals and Causal Inference: Methods and Principles for Social Research, 2nd edition, by Stephen Morgan and Christopher Winship (2014).

Case study: tech enthusiasm and likelihood to buy the iLock

A study was conducted investigating interest in the iLock, asking people how likely they would be to buy the iLock.

The table below shows how stated likelihood to buy the iLock varies by attitude to technology. The table suggests that there is a correlation between attitude toward technology and the likelihood of buying. It appears that those who like technology more are more likely to say they will buy the iLock.

An alternative plausible explanation for this finding is that the real difference between people is a response bias known as yeah-saying bias, whereby people differ in which options they like to choose in questionnaires. The theory is that there are some people that just like to choose the first option in all questions, there are some people that prefer to choose the middle options, and some people that prefer to choose the last options. If true, this would also explain why we would expect to see a correlation like the one shown in the table above.

So, we are left with two competing explanations for the observed correlation in the table above:

Explanation 1: Purchase intent is partly caused by interest in technology.
Explanation 2: Yeah-saying bias.

We can test the second explanation by identifying some other data and seeing if it is also affected by a yeah-saying bias. The table below shows just such data. The columns represent the proportion of episodes of the TV show Fargo by purchase intent. If yeah-saying bias were a strong determinant of survey results, we would expect to see a correlation in this table also, but we do not. By ruling out the yeah-saying bias, we have more reason to believe that purchase intent is partly caused by interest in technology.
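The logic of this check can be sketched with simulated respondents. Everything below is hypothetical: the data is generated rather than taken from the iLock study, and the names (simulate_survey, style_strength, the "unrelated" question) and coefficients are assumptions made for the illustration. Each simulated respondent has a genuine interest in technology and, optionally, a personal response style that pushes all of their answers up or down the scale. If a response style is at work, even a question that has nothing to do with technology correlates with purchase intent; if it is not, only the technology question does.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def likert(rng, latent, style):
    """Turn a latent score plus a response-style shift into a 1-to-5 answer."""
    return min(5, max(1, round(3 + latent + style + rng.gauss(0, 1))))


def simulate_survey(style_strength, n=4000, seed=1):
    """Simulate answers; style_strength=0 means no response-style bias."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tech = rng.gauss(0, 1)                    # genuine interest in technology
        style = style_strength * rng.gauss(0, 1)  # same shift on every question
        rows.append({
            "tech_attitude": likert(rng, tech, style),
            # ASSUMPTION: purchase intent is partly driven by interest in technology.
            "purchase_intent": likert(rng, 0.8 * tech, style),
            # A question with no real link to technology or purchasing.
            "unrelated": likert(rng, 0.0, style),
        })
    return rows


def correlation(rows, a, b):
    """Correlation between the answers to questions a and b."""
    return pearson([r[a] for r in rows], [r[b] for r in rows])


if __name__ == "__main__":
    for strength in (0.0, 1.0):
        rows = simulate_survey(strength)
        print("style strength", strength,
              "| tech vs intent:", round(correlation(rows, "tech_attitude", "purchase_intent"), 2),
              "| unrelated vs intent:", round(correlation(rows, "unrelated", "purchase_intent"), 2))
```

Seeing a correlation for the technology question but not for the unrelated question matches the world without a response style, which is the pattern reported in the case study.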

The basic approach to analysis undertaken in this case study is to focus on relative correlations, rather than the absolute correlation, and this is an application of the delta principle.

Case study: smoking

A landmark study by Doll and Hill (1952) employed a case-control design to understand the link between the frequency of smoking and lung cancer. The cases in the study were lung cancer patients in UK hospitals. The controls were a random selection of patients at the same hospitals who had not been diagnosed with lung cancer. The table below shows data for men, listing their recalled frequency of smoking immediately prior to their diagnosis. The data clearly reveals a correlation: patients with lung cancer smoked more often. The table suggests that the more people smoke, the greater their chance of getting cancer.

The study was replicated at least 19 times, each time with the same result. But, as R.A. Fisher (arguably the world's greatest statistician, as well as a pipe smoker and a consultant to the tobacco industry) pointed out, if you repeat a biased study 19 times, the bias is repeated each time.

Multiple alternative explanations were proposed for the table above. The two most interesting were:

Recall bias: Perhaps people who had been diagnosed with lung cancer were aware that smoking was a likely cause, making smoking more salient in their minds and leading them to recall higher levels of smoking than the patients without lung carcinoma.
Common genetic cause: Perhaps there are some genetic factors that lead to defects in the lungs, and these defects both lead people to smoke (as a form of treatment) and also lead to cancer. If true, it is possible that smoking may actually lead to a decrease in cancer!

The recall-bias APE was ruled out by conducting prospective studies, whereby groups of people were asked about how often they smoked, and then they were tracked over time to see the extent to which the smokers had worse outcomes, such as lung carcinoma. These studies showed a strong link between smoking and lung carcinoma.

The APE of a common genetic cause was dispatched using some mathematics and logic. The available data suggested that people who smoked were around nine times more likely to get lung cancer than non-smokers. For a gene that causes both smoking and lung cancer to fully explain this, the gene must be at least nine times more common in smokers than in non-smokers. This is essentially impossible, a point that can be appreciated by following through a few of its implications:

Smoking as a trait would be obviously inherited.
Non-smoking religions, such as Mormonism, are by implication caused by genetics.
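The "nine times more common" requirement can be checked with a little arithmetic. The sketch below is a simplified illustration under stated assumptions, not the calculation from the original debate: the gene is either present or absent, smoking has no effect on cancer at all, and the gene multiplies a person's cancer risk by a fixed factor. The function name and parameters are invented for the example. Under these assumptions, the apparent relative risk for smokers can never exceed the ratio of the gene's prevalence in smokers to its prevalence in non-smokers, however deadly the gene is (a bound usually credited to Jerome Cornfield and colleagues).

```python
def apparent_relative_risk(p_gene_smokers, p_gene_nonsmokers, gene_risk_ratio):
    """Cancer risk of smokers relative to non-smokers when ONLY the gene matters.

    p_gene_smokers:    share of smokers who carry the hypothetical gene
    p_gene_nonsmokers: share of non-smokers who carry it
    gene_risk_ratio:   how many times the gene multiplies the risk of cancer
    """
    # Average risk in each group, in units of the risk of a non-carrier.
    risk_smokers = 1 + (gene_risk_ratio - 1) * p_gene_smokers
    risk_nonsmokers = 1 + (gene_risk_ratio - 1) * p_gene_nonsmokers
    return risk_smokers / risk_nonsmokers


if __name__ == "__main__":
    # ASSUMPTION: illustrative prevalences, gene 5 times more common in smokers.
    p_smokers, p_nonsmokers = 0.5, 0.1
    for gene_risk_ratio in (2, 10, 100, 1_000_000):
        rr = apparent_relative_risk(p_smokers, p_nonsmokers, gene_risk_ratio)
        print(gene_risk_ratio, round(rr, 3))
    # However large gene_risk_ratio becomes, the result stays below
    # p_smokers / p_nonsmokers, so it cannot reach the observed figure
    # of about nine.
```

To reproduce a ninefold difference without smoking playing any causal role, the prevalence ratio itself would have to be at least nine, which leads to the implausible implications listed above.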

Chapter 5 of The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie (2019) discusses this case study in more detail.