The downside of using big data in medical research

A scientific scuffle played out in the pages of the Lancet recently. At issue was whether a team of scientists led by Dr. Damien Cruse at the University of Western Ontario had successfully used EEG, a test that measures the electrical activity of the brain, to detect awareness in brain-damaged patients who were in a vegetative state, a finding they first reported in 2011.

Other scientists who were working on the problem of awareness had been gobsmacked by the results.

Patients in vegetative states eventually open their eyes. They wake and sleep. But otherwise they have little awareness of what’s going on around them. Some of their reflexes may still be intact but, according to diagnostic criteria, they don’t respond to commands or understand language.

To show that 3 out of 16 of these patients were able to follow verbal instructions to imagine opening and closing a fist or wiggling their toes at the sound of a series of beeps was “pretty impressive,” says Andrew Goldfine, M.D., a neurologist at Weill Cornell Medical College and Burke Medical Research Institute in New York.

Goldfine would know. He’d had disappointing results trying a similar EEG test to detect awareness in minimally conscious patients. Minimally conscious patients are more highly functioning than those in vegetative states.

As it turns out, Goldfine and his colleagues are funded by the same grant as Cruse’s team. They’re part of an effort to find ways to detect awareness in, and ideally communicate with, severely brain-injured patients.

The Americans used their status as competitive partners to ask for the raw EEG data generated by the Canadian team, and they got it.  Such collaborations almost never happen in the highly competitive world of research, but they can be instructive, as this one turned out to be.

In looking at the raw EEG data, Goldfine says he saw very low levels of brain activity punctuated by artifacts, or electrical signals generated outside the brain. Artifacts can be created by tiny movements of the muscles in the face and head. Or they can happen when patients become tired or anxious during long experiments.

Even more unusual, though, was the method the Canadian team used to analyze the EEG data. They didn’t rely on standard statistical tests. Instead, they used a computer program based on machine learning to comb through tens of thousands of data points (a.k.a. “big data”), which in this case were the electrical fluctuations in the EEG.

Machine learning, Goldfine explains, is like the spam filter on email. It analyzes many features of an email to find patterns that distinguish one kind of message from another. The problem is that if you direct a computer to find differences, it will. But those differences may not be based on things that are truly meaningful, which is also why spam filters snag messages you actually want to read.
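
To make the spam-filter point concrete, here is a minimal sketch in Python. It is not the Canadian team's program: the nearest-centroid classifier, the trial counts and the feature counts are all assumptions chosen for illustration. The "EEG" here is pure random noise with arbitrary "task" and "rest" labels, so there is nothing real to find. The classifier still separates the trials it was trained on, and only fresh trials reveal that it learned nothing.

```python
import random

def make_noise_trials(n_trials, n_features, rng):
    """Build trials of pure Gaussian noise with alternating labels.

    Labels (0 = "rest", 1 = "task") are unrelated to the data,
    so any pattern a classifier finds is a false positive.
    """
    trials = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
              for _ in range(n_trials)]
    labels = [i % 2 for i in range(n_trials)]
    return trials, labels

def centroid(rows):
    """Average the rows feature by feature."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(trials, labels):
    """Nearest-centroid 'training': one average trial per label."""
    c0 = centroid([t for t, y in zip(trials, labels) if y == 0])
    c1 = centroid([t for t, y in zip(trials, labels) if y == 1])
    return c0, c1

def predict(model, trial):
    """Assign the label whose centroid is closer (squared distance)."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(trial, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(trial, c1))
    return 0 if d0 <= d1 else 1

def accuracy(model, trials, labels):
    hits = sum(predict(model, t) == y for t, y in zip(trials, labels))
    return hits / len(trials)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Few trials, many features: hypothetical sizes, not the study's.
train_x, train_y = make_noise_trials(40, 500, rng)
test_x, test_y = make_noise_trials(400, 500, rng)

model = fit(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)  # scored on data it has seen
test_acc = accuracy(model, test_x, test_y)     # scored on fresh noise

print(f"accuracy on training trials: {train_acc:.2f}")
print(f"accuracy on fresh trials:    {test_acc:.2f}")
```

With far more features than trials, the classifier can memorize quirks of the noise, so accuracy on the training trials looks impressive while accuracy on new trials hovers around a coin flip. Careful studies guard against this by scoring on held-out data, but that guard rests on its own statistical assumptions.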

“These tools don’t know what they’re doing. They’re taking any data, and they’re classifying it. They’re very subject to false positives,” he says.

Goldfine thinks that’s what happened here. When he reanalyzed the EEG data using standard statistical tests to determine whether the patients’ brain activity differed significantly between the task and rest periods, he found no significant difference in any of the vegetative state patients.
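
For readers wondering what a task-versus-rest comparison looks like in practice, here is a rough sketch using a permutation test. This is a stand-in chosen because it is easy to follow, not necessarily the test Goldfine ran, and the numbers are simulated rather than taken from the study. The idea: measure the gap between the average "task" and "rest" values, then shuffle the labels many times to see how often chance alone produces a gap that large.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(task, rest, n_perm, rng):
    """Two-sided permutation test on the difference in means.

    Returns the share of label shufflings whose task/rest gap is at
    least as large as the observed one (with the usual +1 correction
    so the p-value is never exactly zero).
    """
    observed = abs(mean(task) - mean(rest))
    pooled = list(task) + list(rest)
    n_task = len(task)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between value and label
        gap = abs(mean(pooled[:n_task]) - mean(pooled[n_task:]))
        if gap >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated per-trial measurements (hypothetical, arbitrary units).
rest = [rng.gauss(0.0, 1.0) for _ in range(30)]
task_noise = [rng.gauss(0.0, 1.0) for _ in range(30)]  # no real effect
task_shift = [rng.gauss(2.0, 1.0) for _ in range(30)]  # built-in effect

p_noise = permutation_p_value(task_noise, rest, 2000, rng)
p_shift = permutation_p_value(task_shift, rest, 2000, rng)

print(f"p-value, task is just noise:     {p_noise:.3f}")
print(f"p-value, task has a real effect: {p_shift:.4f}")
```

A small p-value means shuffled labels rarely match the observed gap, which is evidence of a real difference. When the built-in effect is large, as in the second case, the p-value is tiny. When the task data are just noise, the p-value can land anywhere between 0 and 1, and will fall below 0.05 only about one time in twenty.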

His re-analysis and a response by the Canadian team were published in January in the Lancet.

I asked Goldfine how health reporters might have spotted the flaw in the study when even the study’s reviewers did not.

He admits that would have been tough to do. The original data wasn’t published, so reporters couldn’t have run it by an independent expert.

But he thinks reporters and editors should have been more skeptical about the results, which got lots of coverage, despite the fact that the study was on just 16 patients.

“The fact that it was in Lancet implies to the press and public that it’s ready to go,” he says. “In this case, I don’t think it was ready to go.”

The good news is that even though the results were debunked, both research teams are continuing to collaborate on ways to detect awareness in brain-injured patients.  That makes this exercise, and the openness with which it was conducted, a beautiful example of the way science is supposed to work.

2 thoughts on “The downside of using big data in medical research”

  1. Stephen Beale

    Is this really an application of “big data”? I don’t think there’s a set definition of the term, but my understanding is that it refers to datasets that are so large or complex they cannot easily be analyzed using traditional data processing tools. We’re talking about datasets measured in terabytes or petabytes, or those with extremely complex internal relationships. I have a database with information on 8000 health-related organizations, 8000 hospitals, 3000 U.S. counties, 30,000 cities and 42,000 U.S. zip codes, all linked in various ways. The whole thing is less than 100MB and I can easily analyze that data using Microsoft Access on a desktop PC. You could fit 10,000 databases like that within a single terabyte. Once these buzzwords get into the lexicon, they’re easy to throw around, but I think we have to be careful here.

  2. Brenda Goodman

    Fair point, Stephen, thanks for the comment. I didn’t ask how big the data set covered in the study was, but should you or anyone else choose to weigh in here… what do you think? How big is big? And how should we judge that?

    I probably should have focused the title of this post around machine learning, and the idea that letting a computer pick up on “meaningful” patterns can backfire.
