How to find reliable COVID-19 data

One of the most challenging aspects of reporting on the pandemic has been accessing reliable, accurate data about COVID-19 and its impact on Americans. The need for trustworthy, real-time data has prompted a few journalism and nonprofit groups to create repositories that pull together data from varying sources.

A Thursday morning session at Health Journalism 2022 in Austin, “The quest for COVID-19 data: Where ‘official sources’ fell short and journalism stepped in,” focused on these efforts and provided journalists with a wealth of resources for up-to-date data related to the pandemic.

Most high-income countries have national health care systems, so data collection and collation are far more straightforward than in the federalized U.S. health care system, where a mix of private and public payers is governed by national and differing state laws. Without a national registry or centralized health care system, it’s been harder to track statistics on COVID cases, hospitalizations, deaths, vaccinations, and other relevant numbers.

Hence the creation of Documenting COVID-19, a public-records repository with nearly 300 record sets and more than 100 investigative stories published with different partners since March 2020. The project team includes journalism fellows and Columbia University journalism and data science researchers funded through grants from MuckRock and Columbia’s Brown Institute for Media Innovation. The Documenting COVID-19 project pulls together internal emails, memoranda and health metrics from local and state governments, especially health departments, school districts and governors’ offices, to create the repository.

One component of Documenting COVID-19 is the Uncounted project, focused on investigating underreported COVID-19 deaths and other excess mortality during the pandemic, collected in this repository.

“Excess deaths are the final COVID indicator,” said Derek Kravitz, a panelist at the session and investigations and data editor with MuckRock. “A death certificate is the final piece of info that a person leaves behind, but it can take weeks for deaths to make it into the public record.”

Officially, as of April 29, 2022, there have been 990,208 deaths from COVID-19, according to the CDC. Yet there have been 1,101,793 excess U.S. deaths from 2020 to the present, Kravitz said, including 217,744 deaths not attributed to COVID-19.

“When we say we’re approaching one million COVID deaths, we probably passed that number months ago,” said Betsy Ladyzhets, a panelist and a science, health and data journalist with MuckRock’s Documenting COVID-19 project.

But it’s impossible to disentangle how many of the excess deaths not attributed to COVID-19 are actually COVID-19 deaths that were not identified as such on death certificates and how many are deaths from other pandemic-related causes, such as overdoses or people unable to get care during hospital surges.
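For reporters who want to work with these numbers, excess deaths are generally calculated as observed deaths minus the deaths that would have been expected based on prior years. The following is a minimal sketch of that arithmetic, not the Uncounted project's method. The weekly counts and the function name `excess_deaths` are hypothetical; only the two national figures are the ones quoted in the session.

```python
def excess_deaths(observed, expected):
    """Return observed minus expected deaths for each period."""
    return [o - e for o, e in zip(observed, expected)]

# Hypothetical weekly death counts for one county (not real data).
observed = [120, 135, 160, 150]
# Hypothetical expected counts, e.g. an average of the same weeks in prior years.
expected = [100, 102, 101, 99]

weekly_excess = excess_deaths(observed, expected)
total_excess = sum(weekly_excess)

# National figures quoted by Kravitz in the session.
quoted_excess_total = 1_101_793
quoted_not_attributed = 217_744

# Share of excess deaths with no COVID-19 attribution.
share_not_attributed = quoted_not_attributed / quoted_excess_total

print(weekly_excess, total_excess)
print(f"{share_not_attributed:.1%} of excess deaths not attributed to COVID-19")
```

By this arithmetic, roughly one in five of the excess deaths Kravitz cited carried no COVID-19 attribution. How the expected baseline is modeled varies by source, so totals from different trackers will not match exactly.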

Wastewater data has emerged as one useful metric during the pandemic, but its use varies substantially across the U.S., and it’s been difficult to determine how to interpret the data. This pandemic marks the first time the U.S. has used wastewater surveillance for a disease, despite its use for diseases like polio in other parts of the world. It’s also the first time it’s been used for a respiratory disease (COVID-19). That means places that only began tracking wastewater data later in the pandemic, such as during the Omicron wave, are going to have different baseline levels than places that began tracking during Delta or earlier.

Another challenge with wastewater data interpretation involves the variations that occur, particularly in smaller towns where adjustments are needed for tourist bumps or migrant farmworker populations, for example. Effective, informative wastewater surveillance also requires comparison with case counts, even as the CDC has shifted its focus from cases to hospitalizations.
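One way to see why the start date of tracking matters is to scale each site's readings against its own history. The sketch below is an illustration only, not a method the panelists described. The function name `percent_of_site_peak` and all of the readings are hypothetical.

```python
def percent_of_site_peak(readings):
    """Express each reading as a percentage of the site's own highest reading."""
    peak = max(readings)
    if peak == 0:
        return [0.0 for _ in readings]
    return [round(100 * r / peak, 1) for r in readings]

# Hypothetical viral concentrations (copies per liter), not real data.
site_since_delta = [200, 800, 400, 1600]    # began tracking during Delta
site_since_omicron = [5000, 20000, 10000]   # began tracking during Omicron

delta_scaled = percent_of_site_peak(site_since_delta)
omicron_scaled = percent_of_site_peak(site_since_omicron)

print(delta_scaled)
print(omicron_scaled)
```

Each site's scale depends entirely on what it has observed so far. A site that started during Omicron has no earlier readings to anchor its baseline, so a reading of "50% of peak" there is not directly comparable to the same percentage at a site with a longer record.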

Long COVID represents another gap in data, Ladyzhets said. The United Kingdom began tracking how many of its residents had long COVID symptoms early in the pandemic, but the U.S. never began such tracking. One reason for this is the lack of clear criteria for what constitutes long COVID, but the U.S. still could have attempted data collection on specific symptoms that persisted after infection. The National Institutes of Health has initiated the RECOVER study for long COVID, but the lack of even fuzzy baseline data about U.S. prevalence of the condition means researchers don’t know how many people to recruit or what demographics to recruit, Ladyzhets said.

“We need national standards on data collection,” Nsikan Akpan, a panelist and a health and science editor with New York Public Radio, told attendees. But it’s not clear if or when that might occur. The most likely path to the kind of standards needed is a national health care system, which is politically unlikely at the moment. Until that becomes available, the panelists provided a COVID-19 data tip sheet for journalists. Here are the resources they highlighted during the session:

