Be wary of studies using big data: Follow these suggestions

Looking for p-hacking or other statistical red flags is challenging, particularly for journalists who don’t have training in statistics or medical research design or access to the complete data sets a researcher may be using. But that doesn’t mean you can’t learn a few tips on how to scrutinize studies that analyze huge datasets. In fact, three statistical editors of JAMA Surgery — Amy H. Kaji, M.D., Ph.D.; Alfred W. Rademaker, Ph.D.; and Terry Hyslop, Ph.D. — recently penned an editorial aimed at researchers that journalists can benefit from as well.

Their freely available “Tips for Analyzing Large Data Sets From the JAMA Surgery Statistical Editors” includes several gems for journalists looking for possible flaws or limitations in studies.

First, researchers should have determined their study’s objective(s) and the outcomes they would measure before gathering and analyzing data. Doing the reverse is an easy way to end up in p-hacking territory, as Cornell nutrition researcher Brian Wansink is accused of doing. Researchers should have registered their clinical trial at clinicaltrials.gov before beginning it, so journalists can look up whether the objectives, endpoints or outcomes changed during the trial. Epidemiological (observational) studies, however, may not have been pre-registered, so you may need to ask the researchers whether they ended up reporting the same objectives and outcomes they planned to report.

The JAMA Surgery editors then ask researchers to explain to readers how and why the researchers chose their population. With large datasets, researchers usually apply inclusion criteria to select populations, so an explanation of the criteria and rationale should be in the study. They should also provide a flowchart revealing who was excluded and why; if a flowchart doesn’t appear in the paper, ask why.

The editors recommend that effects reported in the study be “clinically meaningful” and “patient-centered,” so look not only for statistical significance but for clinical significance as well. Also be aware of how large datasets affect statistical significance. As a rule of thumb, the narrower the confidence interval, the more reliable the finding. But that loose rule may not apply when a dataset is so large that its size alone can make almost any correlation statistically significant.

“Unfortunately, mining large data sets without preplanning can lead to unintentional, often mistaken conclusions,” the editors wrote. “Statistical significance is related to sample size, and with a large enough sample, statistical significance between groups may occur with very small differences that are not clinically meaningful.”
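To make the editors’ point concrete, here is a minimal sketch (hypothetical numbers, not from any study) using a standard two-proportion z-test: the same half-percentage-point difference is statistical noise at n = 1,000 but comes out “highly significant” at n = 1,000,000.

```python
import math

def two_proportion_z(p1, p2, n):
    """Two-sided z-test for two independent proportions, n subjects per group."""
    p_pool = (p1 + p2) / 2
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    z = abs(p1 - p2) / se
    p_value = math.erfc(z / math.sqrt(2))  # two-sided p-value from the normal
    return z, p_value

# Identical effect size (50.5% vs 50.0%), wildly different sample sizes:
for n in (1_000, 1_000_000):
    z, p = two_proportion_z(0.505, 0.500, n)
    print(f"n = {n:>9,}: z = {z:.2f}, p = {p:.4g}")
```

At n = 1,000 the p-value is around 0.8; at n = 1,000,000 it is vanishingly small, even though the underlying difference, and whatever it means clinically, never changed.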

But the opposite can happen too: “Study samples may be inadequate to answer questions about rare outcomes,” so researchers should have determined ahead of time what sample size they needed to achieve adequate statistical power. Good researchers will describe this decision process in their methods section. Even if you don’t entirely understand what the power analysis means, the fact that one was done is important, and you can always ask an outside source about the reasoning.
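A back-of-the-envelope version of such a power calculation (hypothetical effect sizes; a textbook normal approximation with z-values hard-coded for the conventional alpha = 0.05 and 80% power) shows why rare outcomes demand far larger samples:

```python
import math

def n_per_group(p1, p2):
    """Approximate per-group sample size to detect a difference between two
    event rates, using the normal approximation with z-values fixed at
    1.96 (alpha = 0.05, two-sided) and 0.84 (80% power)."""
    z_alpha, z_beta = 1.96, 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Halving a common outcome (20% -> 10%) vs halving a rare one (1% -> 0.5%):
print(n_per_group(0.20, 0.10))    # a few hundred patients per group
print(n_per_group(0.010, 0.005))  # several thousand patients per group
```

Both comparisons cut the risk in half, but the rare outcome needs an order of magnitude more patients per group to detect reliably.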

Researchers should report any interim analyses (calculations performed partway through the study) and explain why they were done, along with any violations of the study protocol and the reasons for them, so look for these details or consider asking about them. If the study includes multiple comparisons or tests, especially ones not initially planned, the researchers should adjust their analysis to account for that.

“If there are more than 20 tests performed, then by chance, one will be statistically significant,” the editors note. “One strategy is to employ methods of correction (eg, Bonferroni correction, Hochberg sequential procedure) when the number of tests or comparisons exceeds 20.” If you see 20+ outcomes, look for mention of a correction method, even if you don’t know exactly what it means.
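The arithmetic behind that warning is simple enough to check yourself (this sketch assumes independent tests at the conventional alpha = 0.05):

```python
alpha = 0.05
n_tests = 20

# Probability that at least one of 20 independent tests comes up
# "significant" purely by chance:
p_any_false_positive = 1 - (1 - alpha) ** n_tests
print(f"{p_any_false_positive:.2f}")  # roughly 0.64

# The Bonferroni correction simply divides alpha by the number of tests,
# so each individual test must clear a much stricter bar:
print(f"{alpha / n_tests:.4f}")  # 0.0025
```

In other words, with 20 uncorrected tests there is roughly a two-in-three chance of at least one spurious “significant” finding.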

The authors also recommend that enough data be included that a reader can make their own calculations. This is a great reminder for journalists to look for absolute risk if only relative risk is reported. Often, journalists can derive absolute risk from tables, but if the data are not there to do that, ask about it.
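As a worked illustration (made-up counts of the kind you might pull from a study’s tables), the same result can sound dramatic as a relative risk and modest as an absolute one:

```python
# Hypothetical 2x2 counts: events / total in each study arm
treated_events, treated_total = 12, 2000
control_events, control_total = 24, 2000

risk_treated = treated_events / treated_total  # 0.6%
risk_control = control_events / control_total  # 1.2%

relative_risk = risk_treated / risk_control            # 0.5 -> "50% lower risk"
absolute_risk_reduction = risk_control - risk_treated  # 0.6 percentage points

print(f"relative risk: {relative_risk:.2f}")
print(f"absolute risk reduction: {absolute_risk_reduction:.1%}")
```

A headline of “50% lower risk” and one of “6 fewer events per 1,000 patients” describe exactly the same data; only the second tells readers how much the difference actually matters.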

“Limitations should be reported to promote scientific integrity and validity of conclusions, which should be fully supported by the data analysis,” the JAMA Surgery editors wrote. “Interpretations of observational studies should only lead to descriptions of associations between variables, not to conclusions of causality.” Although most people should be aware that correlation does not equal causation, the fact that medical journal editors are reminding researchers of that should indicate how important it is to look for exaggerated claims.

The bottom line is best summarized by the editors themselves: “Large data sets have many unique strengths, including broad representation, efficient sampling design, and often consistency in data structure,” they wrote. “However, large data sets are not free from bias and measurement error, and it is important to respect and acknowledge the limitations of the data. The challenge with big data is that it requires a carefully thought-out research question and a transparent analytic strategy.”
