Understanding statistical bias when covering the use of genomics in medical studies

Rapid advances in genomics (understanding, identifying and mapping genomes, or the DNA characteristics of a person or a tumor) have had a tremendous influence on cancer research, treatment and drug development. Genomics is a big reason why many researchers see precision medicine as the future of all medicine. But as always, the devil is in the details. When researchers don’t handle it appropriately, the use of genomics in medical studies can also introduce bias. A November 2021 study in JAMA Oncology demonstrates how.

Why should health journalists care about the particulars of studies that combine clinical and genomic data? Well, not all researchers using this data recognize its limitations. As journalists reporting on medical studies, it’s our job to be skeptical about study methods and try to understand how those methods might inadvertently lead to misleading or incomplete results. While you may not need to understand the details of this recent JAMA Oncology study, it is crucial to keep these key points in mind:

Patients can undergo genomic testing at very different points relative to their diagnosis or the start of their treatment.
That variation can affect the accuracy of survival statistics.
When reporting on studies about cancer survival or the use of genomic and clinical data together, reach out to a biostatistician for a gut check on the study’s statistical analysis.

The JAMA Oncology study focuses on what happens when researchers ignore the bias that can result from “left truncation.” Left truncation is a type of selection bias in which people who would otherwise meet a study’s criteria aren’t included because they’ve already reached the outcome the study is investigating. For example, studies on miscarriage recruit women who know they’re pregnant, but a substantial number of miscarriages occur before a woman knows she’s pregnant. If a large proportion of women enter a study on miscarriage after the gestational age when miscarriages occur most frequently, the study risks underestimating miscarriage risk unless the authors account for the women who were never included because they had already miscarried.
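For readers who want to see the mechanics, here is a toy simulation in Python. Every number in it is a hypothetical assumption chosen for illustration (a 20% miscarriage rate, losses spread evenly between weeks 3 and 12, enrollment spread evenly between weeks 4 and 10), not data from any real study. It compares the true miscarriage rate in the simulated population with the naive rate among women who were still pregnant when they enrolled.

```python
import random


def simulate_miscarriage_study(n=10_000, seed=1):
    """Toy illustration of left truncation; all parameters are hypothetical."""
    rng = random.Random(seed)
    total_losses = 0     # miscarriages in the whole simulated population
    enrolled = 0         # women still pregnant when they enroll
    enrolled_losses = 0  # miscarriages the study actually observes
    for _ in range(n):
        miscarries = rng.random() < 0.20   # assumed 20% miscarriage rate
        loss_week = rng.uniform(3, 12)     # assumed week of loss (used only if she miscarries)
        enroll_week = rng.uniform(4, 10)   # assumed week she would enroll
        if miscarries:
            total_losses += 1
            if loss_week < enroll_week:
                # Left truncation: she miscarried before she could enroll,
                # so the study never sees her at all.
                continue
        enrolled += 1
        if miscarries:
            enrolled_losses += 1
    true_rate = total_losses / n
    naive_rate = enrolled_losses / enrolled
    return true_rate, naive_rate


true_rate, naive_rate = simulate_miscarriage_study()
print(f"True miscarriage rate:      {true_rate:.1%}")
print(f"Naive rate among enrollees: {naive_rate:.1%}")
```

Under these assumptions the naive rate comes out well below the true rate, because women who miscarried early are missing from the study entirely. Survival-analysis methods that allow for delayed entry, which start counting each woman only from the week she enrolled, are one standard way to reduce this bias.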

The authors of this study note that the “lapse between diagnosis and molecular testing can present analytic challenges and threaten the validity and interpretation of survival analyses” in cancer studies that use genomic data. They provide an example using the Genomics Evidence Neoplasia Information Exchange Biopharma Collaborative (GENIE BPC) of the American Association for Cancer Research (AACR). Patients were enrolled in the GENIE BPC if they underwent genomic profiling from 2015 to 2017. But the time that had passed from their diagnosis until their genomic profiling varied. Some may have undergone profiling soon after diagnosis, while others may have received first-line or even second-line treatment before undergoing genomic profiling months later.

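That time lapse matters when researchers decide when to start the clock on calculating survival. Starting it at diagnosis versus at genomic profiling (and any subsequent treatment decisions based on those results) can lead to very different survival estimates. If the clock starts at diagnosis, every patient in the cohort has, by definition, survived at least until profiling; patients who died before they could be tested never entered the study. Unless the analysis accounts for that delayed entry, the cohort looks healthier than the full population of patients diagnosed with the disease. The authors demonstrate how not adjusting for left truncation produced median survival estimates that overstated survival by more than a year.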
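To make the “start the clock” problem concrete, here is a minimal Kaplan-Meier sketch in Python on a simulated cohort. The setup is hypothetical and chosen only for illustration (exponential survival with a 24-month mean, genomic testing at a random point 0 to 36 months after diagnosis, no censoring); it does not reproduce the JAMA Oncology analysis or the GENIE BPC data. The naive estimate measures survival from diagnosis as if every patient had been under observation from day one. The adjusted estimate uses delayed entry, one standard way of handling left truncation: each patient joins the risk set only at the time of testing.

```python
import math
import random


def km_median(entry, exit_time, died):
    """Kaplan-Meier median survival time.

    A patient counts as "at risk" at time t only if entry < t <= exit_time.
    Passing entry = 0 for everyone gives the naive estimate; passing each
    patient's testing time gives the estimate adjusted for left truncation.
    """
    event_times = sorted({t for t, d in zip(exit_time, died) if d})
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for a, b in zip(entry, exit_time) if a < t <= b)
        deaths = sum(1 for b, d in zip(exit_time, died) if d and b == t)
        survival *= 1 - deaths / at_risk
        if survival <= 0.5:
            return t
    return None  # median not reached


# Hypothetical cohort: all parameters below are assumptions for illustration.
rng = random.Random(42)
MEAN_SURVIVAL = 24.0  # months from diagnosis to death (exponential)
MAX_DELAY = 36.0      # testing happens 0 to 36 months after diagnosis
tested_at, died_at = [], []
for _ in range(2000):
    death = rng.expovariate(1 / MEAN_SURVIVAL)
    test = rng.uniform(0, MAX_DELAY)
    if death > test:  # only patients still alive at testing enter the cohort
        tested_at.append(test)
        died_at.append(death)

everyone_died = [True] * len(died_at)  # no censoring in this toy example
true_median = MEAN_SURVIVAL * math.log(2)
naive_median = km_median([0.0] * len(died_at), died_at, everyone_died)
adjusted_median = km_median(tested_at, died_at, everyone_died)
print(f"True median:     {true_median:.1f} months")
print(f"Naive median:    {naive_median:.1f} months")
print(f"Adjusted median: {adjusted_median:.1f} months")
```

In this setup the naive median lands far above the true median, while the delayed-entry estimate lands close to it. Real analyses would also need to handle censoring, and a biostatistician would typically use an established survival-analysis package rather than hand-rolled code like this.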

Now, do you need to understand every word I just wrote for this blog post to be valuable to you? No. In fact, the less sense it makes, the more important it is for you to follow the advice that Ivan Oransky, former president of AHCJ, has given at every AHCJ conference I’ve ever attended: Always keep a biostatistician in your back pocket. If you consult an outside biostatistician every time you report on a study involving survival analysis or combining genomic and clinical data, you’ll be fine.

Here are some sample questions to ask:

Did the researchers adjust for everything they needed to in order to get the most accurate results?
Are there any biases introduced in the study’s methods that could substantially affect its findings or the authors’ conclusions?
Did the authors’ conclusions follow accurately and logically from what they found?
Would a different type of statistical analysis or adjustment suggest that some findings were overestimated or underestimated? If so, would the difference be large enough to matter statistically or clinically?
Are there any statistical red flags here?

Here are some places to find biostatisticians:

STATS Check’s explicit purpose is to help journalists understand and accurately report on statistics
The American Statistical Association
The International Society for Clinical Biostatistics
The media offices of almost any university or medical school