How USA Today took a deep data dive into the lead-contaminated water story

Date: 06/06/16

By Mark Nichols

Nearly everyone now knows how lead contamination in Flint, Michigan’s public water system has jeopardized the health of residents. But fewer people realize that elevated lead levels in public water systems are now a nationwide problem.

I reached that sobering conclusion in early February when I began delving into data from the Environmental Protection Agency’s Safe Drinking Water Information System (SDWIS) website.

In the aftermath of Flint’s water crisis, editors at USA Today asked me to develop and analyze SDWIS data for stories that would take a national look at lead contamination in water systems. Reporters already had begun to cobble together information about incidents of high lead levels in other parts of the country.

We wanted answers to some basic “big picture” questions: Just how extensive is the lead problem? Where are some of the worst cases? What types of water systems are impacted? What types of people are most affected?

SDWIS, an electronic collection of water system monitoring, inspection and violation reports based on data provided by the states, appeared to be the best data source to answer those questions. But after a quick vetting of the database, I realized this analysis was going to be anything but simple.

SDWIS, launched by EPA in the late 1990s, includes dozens of reports on more than 170,000 U.S. public water systems. Problems with the quality of the data, stemming in part from data entry or reporting errors, inconsistencies and outdated information, have been well chronicled in reports from the General Accounting Office and the EPA Office of Inspector General. Even the EPA has acknowledged and warned of data discrepancies on its SDWIS search site.

Armed with a better understanding of SDWIS’ dirty little secrets, I took my first swipe at the data monster.

I started by downloading reports from water systems that had reported “action level exceedances” (ALE) for lead contamination in tap sample tests over the past four years.

A system has exceeded the EPA’s lead standard when more than 10 percent of its tap water samples show lead levels above 15 parts per billion. It's called an "action level" because the water system is then required to take action to reduce contamination.
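As a rough illustration of that rule, here is a minimal Python sketch that applies it exactly as described above. The function name and the sample readings are hypothetical, and the regulatory calculation has details this sketch does not attempt to cover.

```python
ACTION_LEVEL_PPB = 15.0  # lead action level described above, in parts per billion


def exceeds_action_level(samples_ppb):
    """Return True when more than 10 percent of samples are above 15 ppb."""
    if not samples_ppb:
        return False
    above = sum(1 for ppb in samples_ppb if ppb > ACTION_LEVEL_PPB)
    # Integer comparison: above / total > 0.10 is the same as above * 10 > total
    return above * 10 > len(samples_ppb)


# Hypothetical tap sample readings, for illustration only
one_high = [2, 3, 1, 4, 5, 2, 3, 1, 20, 2]    # 1 of 10 samples above 15 ppb
two_high = [2, 3, 1, 4, 5, 2, 3, 18, 20, 2]   # 2 of 10 samples above 15 ppb

print(exceeds_action_level(one_high))
print(exceeds_action_level(two_high))
```

Note that exactly 10 percent of samples above the level does not count as an exceedance under this reading; the share has to be more than 10 percent.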

To set up the download, I had to maneuver through SDWIS’ multiple search filters on its advanced search page.

There’s no HELP link on the page to describe the filters, or guide you through the filter process. But the filters, for the most part, are pretty intuitive. So I followed the “golden rule” of filter searching – cast the net wide – and searched for all system reports containing ALE samples, for all states and EPA’s 10 regions. The SDWIS advanced search is designed to pull reports by quarter, so it took some time to compile a nationwide dataset spanning several years.

After exporting the online records into delimited text files, I used Microsoft Access to create my full dataset of action-level lead test results. (This story provides more detail about how we prepared the data, and some of the problems we found in the database.)
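The dataset described here was built in Microsoft Access. For readers who prefer a scripted approach, the same stacking step could be sketched in Python using only the standard library. This is an assumption-laden sketch, not the method used for the story: the folder name, file pattern and tab delimiter are hypothetical, and the SDWIS export layout may differ.

```python
import csv
from pathlib import Path


def combine_exports(paths, delimiter="\t"):
    """Read several delimited exports and return one list of row dicts.

    Exact duplicate rows (possible if quarterly pulls overlap) are dropped.
    """
    combined = []
    seen = set()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                # Build a hashable fingerprint of the row for duplicate detection
                key = tuple((name, str(value)) for name, value in row.items())
                if key not in seen:
                    seen.add(key)
                    combined.append(row)
    return combined


if __name__ == "__main__":
    # Hypothetical folder of quarterly exports; adjust to match your files
    files = sorted(Path("sdwis_exports").glob("*.txt"))
    rows = combine_exports(files)
    print(f"{len(rows)} unique rows from {len(files)} files")
```

Dropping only exact duplicates is a conservative choice. Near-duplicates, such as the same test entered twice with slightly different values, would still need manual review.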

Eventually, the analysis began. And the answers to some of our basic questions were startling. We found that:

  • Nearly 2,000 water systems across the nation had reported at least one test in which lead levels were above the EPA’s action-level standard in the past four years.

  • Those systems, combined, serve an estimated six million people.

  • Nearly a quarter of the systems had exceeded the standard multiple times.

  • About 18 percent of the systems were operated by school or day care facilities.

  • Eastern states appeared to have the most systems with lead problems. Fifteen states had 50 or more systems with a high lead level test; 12 of those states were east of the Mississippi River.

  • The largest portion of systems with lead issues served fewer than 500 people. Most of those systems were located in areas where median household income was at least $5,000 below the national median.
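Counts like the ones above come from grouping exceedance reports by water system and by state. A minimal sketch of that kind of tally, assuming each report has been reduced to a (system ID, state) pair, might look like the following. The IDs and records are invented for illustration and are not drawn from SDWIS.

```python
from collections import Counter


def summarize(records):
    """Summarize a list of (system_id, state) pairs, one pair per exceedance report."""
    reports_per_system = Counter(system_id for system_id, _ in records)
    total_systems = len(reports_per_system)
    # Systems that show up in more than one exceedance report
    repeat_systems = sum(1 for n in reports_per_system.values() if n > 1)
    # Count each system once per state, however many reports it filed
    state_of_system = {system_id: state for system_id, state in records}
    systems_per_state = Counter(state_of_system.values())
    return {
        "systems": total_systems,
        "repeat_systems": repeat_systems,
        "repeat_share": repeat_systems / total_systems if total_systems else 0.0,
        "systems_per_state": dict(systems_per_state),
    }


# Invented records, for illustration only
sample = [
    ("NY001", "NY"),
    ("NY001", "NY"),
    ("ME002", "ME"),
    ("OH003", "OH"),
]
print(summarize(sample))
```

Counting systems rather than reports matters here: a system with several exceedances should add one to the system total, not several.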

But while the database provided some story-worthy statistics, it played a more vital role – pointing us to people and places that brought a real-life perspective to the problem.

Places like the Ithaca, New York, elementary school we featured, where one lead sample tested at 5,000 parts per billion, which is more than 300 times the action-level standard and within the EPA’s threshold for hazardous waste. The shocking results led angry parents to demand answers – and changes – from school officials.

Or the trailer park in Corinna, Maine, where a mother and her 8-year-old daughter are drinking only bottled water because they’re unsure how bad the lead contamination is in their neighborhood. The manager of the trailer park where they live told them to ignore posted notices about elevated lead levels in the water supply.

Read more about our findings and the stories that evolved from them in this USA Today Network investigative report. The package includes an interactive map that displays a summarized version of the SDWIS data we collected.

In the end, I was able to get a lot of mileage out of SDWIS. But the road was pretty bumpy.

Mark Nichols began working in January 2016 as a data journalist on the national desk of the USA TODAY Network. He has worked as a data specialist for the digital reporting team at WCPO-TV in Cincinnati, Ohio, and was the computer-assisted reporting coordinator for the Indianapolis Star for nearly 20 years. He tweets at @nicholsmarkc.