Health Journalism Glossary

Hill Criteria for Evaluating Observational Studies

If there’s one phrase that most reporters who cover medical studies can repeat in their sleep, it’s the caution that observational studies only show associations and do not prove cause-and-effect. However, sometimes observational studies are the only kind of research that can ethically or practically be done to understand a health problem. Observational studies are also usually more reflective of the messy, multi-faceted conditions people face in the real world. Thus, they can be more applicable to clinical practice than carefully controlled ivory-tower experiments like randomized controlled trials.

Although cause-and-effect conclusions cannot be drawn from a single observational study, there are circumstances in which we can infer cause-and-effect from a broad body of observational evidence if the studies meet specific criteria. An example is the research on smoking: Since it wouldn’t be ethical to conduct randomized controlled trials to see if smoking causes lung cancer, that conclusion came from analysis of a robust evidence base of observational studies that met key criteria.

One way to evaluate observational research is to use the Hill Criteria, a set of nine tests developed by Sir Austin Bradford Hill, a British epidemiologist and statistician. He published these criteria in a 1965 essay called “The Environment and Disease: Association or Causation?” and they are still used today.

Deeper dive
The nine criteria are:

  • Strength – Stronger associations are more likely to be causal than weaker ones. Hill gave the example of chimney sweeps, who were 200 times more likely than men in other occupations to die of a particular type of cancer that affects the skin of the scrotum. “Chimney Sweep Cancer,” identified in 1775, was the first kind of cancer to be tied to a person’s occupation. The association between smoking and lung cancer was also very strong: Observational studies found smokers were nine to 10 times more likely to die of lung cancer than non-smokers.
  • Consistency – Has the association been repeated in different studies by different researchers in different places and at different times?
  • Specificity – How narrow is the observed relationship? Is it limited to one group of people who are dying from one kind of disease? Or is the same group of people dying from many different causes, making it difficult to pinpoint the tie?
  • Temporality – In order to prove, or at least to suggest with a high degree of suspicion, that smoking causes lung cancer, people had to start smoking before they developed lung cancer. Hill believed this criterion to be the most important for determining causality. The horse has to come before the cart. But temporality isn’t always easy to determine (see reverse causality).
  • Dose-response – Studies that demonstrate a biological gradient (the higher the dose or exposure, the more likely a person is to develop the outcome under study) are more likely to reflect a causal relationship than those that don’t find a dose-response relationship. The more cigarettes a person smokes, the more likely they are to develop lung cancer, for example.
  • Biological plausibility – Is there a biological mechanism that helps to explain the observed relationship?
  • Coherence – Is the observation in line with previous studies on the same question? Findings that are repeated are stronger than those that are new or contradictory. (Covering a topic that’s new to you? A search of PubMed or the Cochrane Library can help you check for coherence.)
  • Experiment – Does removing the exposure change the observed outcome? Answering this question usually requires additional studies, but in some cases, observational studies may support the conclusions of clinical trials.
  • Analogy – Do similar exposures result in similar outcomes? Hill argued that observing the birth defects in babies born to mothers who took the drug thalidomide should surely make doctors think twice about prescribing chemically similar drugs to pregnant women.
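
For reporters who want to see the arithmetic behind the strength and dose-response criteria, here is a minimal Python sketch. It assumes the simplest possible setup: raw counts of deaths and people in each group, with no adjustment for confounding. The function names (relative_risk, shows_gradient) and every number below are hypothetical, invented for illustration, and are not taken from Hill's essay or from any study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group totals must be positive")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


def shows_gradient(risks_by_dose):
    """True if risk never falls as the exposure category rises."""
    return all(low <= high for low, high in zip(risks_by_dose, risks_by_dose[1:]))


# Strength: hypothetical counts (not from any study).
# 90 deaths among 10,000 smokers vs. 10 deaths among 10,000 non-smokers.
rr = relative_risk(90, 10_000, 10, 10_000)
print(f"Relative risk: {rr:.1f}")

# Dose-response: hypothetical relative risks for rising exposure categories
# (for example: none, light, moderate, heavy).
risks = [1.0, 3.0, 6.0, 9.0]
print("Gradient present:", shows_gradient(risks))
```

A relative risk near 1 means the two groups fare about the same, and the further the figure sits above 1, the stronger the association. The gradient check only asks whether risk rises in step with exposure; neither number addresses the other seven criteria.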

For a practical demonstration of the Hill Criteria in action, see this recent commentary in the Journal of the American Medical Association. In it, Drs. Sanjay Kaul and George Diamond use the nine criteria to evaluate a study that looked at the association between aspirin and macular degeneration.
