Statistical significance measures how likely it is that a research finding could have arisen by chance rather than reflecting a real effect, but whether that finding is actually meaningful for doctors and patients is a separate issue. Clinical significance, also called practical significance or clinical importance, attempts to answer whether a new finding will make a big enough difference to change the way a doctor treats a patient’s condition. While statistical significance is usually measured with P values in clear, objective numbers (even if they have arbitrary cut-offs), clinical significance is more subjective. It relies on clinical judgment and various other factors, such as the condition being treated, the side effects of an intervention, details about the patient population, the cost of a drug or intervention, a doctor’s and/or patient’s comfort with trying something new, and various other risks and benefits.
Any time a journalist is covering a study, they should ask the researcher and other interviewees about the clinical significance of the findings: what are the implications for patients and doctors now or in the future?
Deeper dive
Clinical significance most often depends on whether a drug exceeds the threshold of a minimal clinically important difference, or the smallest effect needed to cause a doctor to change how they manage a patient’s condition. For example, let’s say the minimal clinically important difference for a new pain medication is a reduction in pain of at least two points on a scale of 1 to 10. Now, consider a new pain drug assessed in a series of studies that use that same 1-to-10 scale to measure its effectiveness. The drug proves effective in study after study with a P value of less than 0.01, meaning results that large would be very unlikely to occur by chance alone if the drug had no real effect.
However, let’s say the improvement in pain is a change of 0.4 points on the scale. In other words, if a person’s pain is an 8, taking this medication will have an effect, but it will only drop their pain, on average, to a 7.6. Such a small change is unlikely to make the drug worth taking, especially if it is expensive or has unpleasant side effects. Measured against the minimal clinically important difference of 2 points, it doesn’t meet the threshold. How that threshold is determined varies. For a condition that is excruciatingly painful, even a slight reduction in pain from a new drug may be worthwhile. But for a condition that is only mildly painful, a patient might barely notice a slight reduction in pain.
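For readers who want to see that comparison spelled out, here is a minimal Python sketch. The 0.4-point effect and the 2-point threshold come from the hypothetical example above, and the function name is purely illustrative.

```python
# Hypothetical numbers from the pain-drug example: an average 0.4-point
# reduction measured against a minimal clinically important difference
# (MCID) of 2 points on a 1-10 pain scale.
MCID = 2.0              # smallest change a doctor would consider meaningful
observed_effect = 0.4   # average pain reduction reported in the studies

def meets_mcid(effect: float, mcid: float) -> bool:
    """Return True if the observed effect reaches the MCID threshold."""
    return effect >= mcid

print(meets_mcid(observed_effect, MCID))  # False: statistically significant, but 0.4 < 2
```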
Although clinical significance does not have a single, objective measurement tool, there are several objective numbers that can contribute to clinical judgment. Two are Number Needed to Treat (NNT) and Number Needed to Harm (NNH). If a drug has a big effect and is statistically significant but 100 people must be treated for just one of them to experience that benefit, then a doctor may be more likely to stick with a drug that is slightly less effective but works for more people. Switching to a new drug might risk having too many people who get little benefit. Or, if a drug is very effective and works for a lot of people (say one in every two people benefits, an NNT of 2) but harms one of every four people who uses it (NNH of 4), then a doctor may determine the benefit isn’t worth the risk for a group of patients.
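In practice, NNT and NNH are calculated as the inverse of the absolute difference in event rates between the treatment group and the comparison group. The Python sketch below uses made-up event rates chosen so that they reproduce the NNT of 2 and NNH of 4 described above.

```python
# NNT and NNH are derived from event rates in the treatment and control
# groups. The rates below are invented for illustration only.

def number_needed_to_treat(control_event_rate: float, treatment_event_rate: float) -> float:
    """NNT = 1 / absolute risk reduction (benefit)."""
    arr = control_event_rate - treatment_event_rate
    return 1 / arr

def number_needed_to_harm(treatment_harm_rate: float, control_harm_rate: float) -> float:
    """NNH = 1 / absolute risk increase (side effect)."""
    ari = treatment_harm_rate - control_harm_rate
    return 1 / ari

# The outcome improves in 60% of treated patients vs. 10% of untreated
# patients -> NNT of 2 (one in every two patients benefits).
print(round(number_needed_to_treat(0.60, 0.10)))   # 2

# A side effect hits 30% of treated patients vs. 5% of controls
# -> NNH of 4 (one in every four patients is harmed).
print(round(number_needed_to_harm(0.30, 0.05)))    # 4
```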
Confidence intervals are another objective measurement often used in assessing clinical significance. They provide a range of values that, with a stated level of confidence (usually 95%), is expected to contain the true size of the effect. If the range is narrow, the doctor can be more confident about the expected effects of the treatment. If the range is wide, the extent to which the drug works for a patient becomes less predictable. For example, let’s say a new drug for psoriasis reduces the number of flare-ups by an average of 10 per year. If the 95% confidence interval is between 8.2 and 11.4, a doctor can be fairly confident that patients taking this medication will have, on average, 8 to 11 fewer flare-ups in a year.
However, if the confidence interval is 1.4 to 20.1, then some patients may see a huge benefit (20 fewer flare-ups) while others see very little (just one or two fewer flare-ups). Or, returning to the pain drug example above, a confidence interval of 0.1 to 4 might make the drug worth trying if some patients could see their pain drop from an 8 to a 4, a 50% reduction in pain.
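For the curious, the sketch below shows one common way such an interval is computed: a sample mean plus or minus roughly two standard errors. The flare-up reductions are invented for illustration and are not from any real psoriasis study.

```python
import math
from statistics import mean, stdev

# Made-up yearly reductions in psoriasis flare-ups for a small sample of
# patients; a tighter spread of values produces a narrower interval.
reductions = [9, 11, 10, 8, 12, 10, 9, 11, 10, 10]

n = len(reductions)
m = mean(reductions)
se = stdev(reductions) / math.sqrt(n)   # standard error of the mean

# Approximate 95% confidence interval using the normal approximation (z = 1.96).
lower, upper = m - 1.96 * se, m + 1.96 * se
print(f"mean reduction: {m:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```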
Finally, comparing relative risk and absolute risk may offer clues to whether a finding should be regarded as clinically significant. If, for example, taking antidepressants during pregnancy increases the likelihood of a specific birth defect by five times, that sounds pretty frightening. But if the birth defect in question occurs in only one in 1 million babies, then a fivefold increase in risk means that five in 1 million babies will experience it, which is perhaps not enough to justify telling a pregnant person not to take the medication.
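The arithmetic behind that conversion is simple; here it is as a short Python sketch using the hypothetical one-in-a-million baseline from the example above.

```python
# Converting a relative risk into an absolute risk, with made-up numbers.
baseline_risk = 1 / 1_000_000   # birth defect occurs in 1 in 1 million babies
relative_risk = 5               # "five times the risk"

absolute_risk = baseline_risk * relative_risk
extra_cases_per_million = (absolute_risk - baseline_risk) * 1_000_000

print(f"absolute risk: {absolute_risk * 1_000_000:.0f} in 1 million")        # 5 in 1 million
print(f"additional affected babies per million: {extra_cases_per_million:.0f}")  # 4
```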
As that example illustrates, clinical significance can also relate to harms. If a risk of a procedure is blood loss, but that blood loss averages 70 mL with a narrow confidence interval, it is not enough to pose a serious risk.

Sometimes a research finding may not offer much clinical significance simply because it’s not ready for prime time: the research was conducted in animals, the findings can’t yet be generalized to a broader population, or they aren’t yet supported by evidence from multiple studies. If a study finds that eating apples during pregnancy is associated with a higher risk that the child will later develop ADHD, for example, much more research is needed before doctors start telling pregnant women not to eat apples. The finding may be interesting and statistically significant, but it is not yet clinically significant.
For additional reading on how researchers and doctors consider objective measures of clinical significance, read this review paper.