Problems found in initial federal data on COVID-19 in nursing homes

Photo: The National Guard via Flickr

When Seema Verma, Centers for Medicare & Medicaid Services administrator, announced June 4 that she and the Centers for Disease Control and Prevention were unveiling COVID-19 data for all the nation’s nursing homes that get federal payment, I thought, “Wow!”

These days, how states report their nursing home COVID cases is varied and random. So this new “unprecedented” federal dataset “constitutes the backbone of a national COVID-19 virus surveillance system,” Verma said.

Nursing homes now are required to report this data directly to the CDC weekly, under penalty of fines starting at $1,000 per week of delay. The data does not go through states or counties, which have different reporting requirements and ways of counting cases, Verma said. The goal is to enable an apples-to-apples comparison of the data, and as of May 31 more than 88% of the nation’s 15,000 nursing facilities had submitted their numbers.

But there were clear problems with this inaugural data release. As the cells in my Excel sheet filled up with the download, there were 56 columns of data, including the number of suspected and confirmed COVID-19 cases and deaths in residents, and the same data for staff. The nursing facilities’ supplies of gloves, masks, ventilators and other equipment, and their access to laboratory testing, also are noted. There also is a column for the number of beds.

One can sort each column for the facilities with the highest numbers, for example “Residents Total COVID-19 Deaths,” which is what I and many other journalists did. I then checked back with the CMS database for each facility to make sure I had sorted the data accurately. And then, just in case, I took screenshots. The story ran on Friday.

That’s when the proverbial you-know-what hit the fan. Nursing home officials were shocked and appalled, and one representative wrote MedPage Today’s editors to ask how we could publish such “insanely wrong” data.

We obviously needed a second story. So I did what it appears CMS and CDC failed to do, and contacted as many outliers as I could ― especially facilities with the highest number of cases relative to their reported number of beds ― to ask whether the dataset was correct for their facilities.

No, no, no, no, no and no, came the replies. Many facility administrators or representatives said the data was not what they had submitted. Others said they hadn’t understood some of the instructions and inserted some numbers into the wrong columns.

The upshot of releasing the data so fast was to frighten residents, alarm family members, and take time away from patient care during a pandemic to answer reporters’ calls.

Paula Sanders, an attorney who represents LeadingAge and the American Health Care Association, said the dataset “has destroyed the trust between the facilities and the families because they’ve been reporting and telling the families, these are our numbers. Then these numbers come out and don’t make any sense at all. Unfortunately, some families are going to believe the government over the facilities.”

Sanders added the federal government was in too much of a rush to publish, apparently because someone had set an unrealistic deadline for rollout of the dataset for June 4, only a few days after the May 31 deadline for facilities to report.

To be fair, Verma warned during the press call that there would be some inaccuracies. She said that instead of inserting new cases for each period as they occurred, some nursing home facilities inserted a cumulative number, which the algorithm apparently then added into the running total as if it were a new count.
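One way to read the error Verma described is as double counting: if a facility enters a running total each week and the software treats every entry as new cases, the sum balloons. The sketch below illustrates that arithmetic with made-up numbers. The function names and the "never decreases" heuristic are my own assumptions for illustration, not a description of how the CDC or CMS actually process submissions.

```python
# Hypothetical illustration: all figures below are invented, not from the dataset.

def naive_total(weekly_entries):
    """Sum every weekly entry, assuming each one counts only NEW cases."""
    return sum(weekly_entries)


def possibly_cumulative(weekly_entries):
    """Heuristic flag: a series that never decreases MAY be a running total.

    A genuine outbreak can also rise week after week, so this only marks a
    facility for a follow-up call; it does not prove an error.
    """
    return len(weekly_entries) >= 2 and all(
        later >= earlier
        for earlier, later in zip(weekly_entries, weekly_entries[1:])
    )


# The same imaginary facility, reported two different ways.
new_cases_per_week = [2, 3, 1]        # what the instructions asked for
entered_as_running_total = [2, 5, 6]  # what some facilities apparently entered

print(naive_total(new_cases_per_week))                # 6
print(naive_total(entered_as_running_total))          # 13
print(possibly_cumulative(entered_as_running_total))  # True
print(possibly_cumulative(new_cases_per_week))        # False
```

In this invented case, six actual cases would show up as 13 in the published total.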

But the sense was that the data managers’ quality assurance process had caught and corrected those errors. A technical expert said during the call that CMS had even held back on reporting data for 3% of the facilities because of apparent errors.

So why didn’t they wait another few days to check the data further?

There appears to have been no alert system to highlight a nursing home with, say, two to eight times as many cases or deaths as beds, making it appear as if residents were dying of COVID-19 as soon as they were admitted. There also was no system by which the CDC or CMS would notify each facility about how its data was going to appear when the agency clicked “publish,” according to many facility representatives I interviewed.
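The kind of alert described above would not take much to build. Here is a minimal sketch of a beds-versus-counts check, assuming each facility's record has been reduced to a name, a bed count, and resident case and death totals. The field names, the sample facilities and the default threshold of two are hypothetical choices of mine, not columns or rules from the federal dataset.

```python
from dataclasses import dataclass


@dataclass
class FacilityReport:
    # Hypothetical, simplified record; the real file has 56 columns.
    name: str
    beds: int
    resident_cases: int
    resident_deaths: int


def flag_outliers(reports, max_ratio=2.0):
    """Return (name, reason) pairs for records that deserve a phone call.

    A record is flagged when its cases or deaths exceed max_ratio times its
    bed count, or when the bed count is missing or zero.
    """
    flagged = []
    for report in reports:
        if report.beds <= 0:
            flagged.append((report.name, "missing or zero bed count"))
            continue
        worst = max(report.resident_cases, report.resident_deaths)
        if worst > max_ratio * report.beds:
            reason = f"{worst} cases or deaths vs. {report.beds} beds"
            flagged.append((report.name, reason))
    return flagged


# Invented sample data for illustration only.
sample = [
    FacilityReport("Facility A", beds=100, resident_cases=12, resident_deaths=3),
    FacilityReport("Facility B", beds=40, resident_cases=55, resident_deaths=310),
    FacilityReport("Facility C", beds=0, resident_cases=5, resident_deaths=1),
]

for name, reason in flag_outliers(sample):
    print(f"{name}: {reason}")
```

A flag here is only a prompt to check with the facility before publishing. A home with heavy turnover could legitimately report more cases than beds over time, which is why the threshold is a multiple rather than a one-to-one comparison.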

I asked CMS how they could have gotten so much wrong, and got this response on Monday:

“As with any new reporting program, there can be data submission errors in the beginning. To be transparent, CMS made the data collected by the CDC public as quickly as possible, balancing transparency and speed against the potential of initial data errors.”

“CMS is advising nursing homes when their submitted data has not passed certain quality checks, so they can review the CDC submission instructions and their data submission for accuracy. As CMS continues to analyze the data going forward, we expect fewer errors as nursing home staff get used to these requirements and CMS has more time to quality check the data.”

Asked why CMS, at the very least, did not contact the highest outliers, for whom such large numbers of COVID-19 cases or deaths were highly unlikely because of their size, the spokesman did not respond.

Reporters at some publications said their editors held off reporting on this new transparency effort until the meaning and accuracy of it could be clarified.

Maybe that’s what CMS and the CDC should have done too.
