Tag Archives: datasets

New data section highlights common large datasets used in studies

We are well into the age of Big Data, in which researchers may draw on databases or other datasets containing data from tens of thousands or even millions of individuals.

These massive datasets have many advantages, such as the ability to narrow down a specific population through inclusion or exclusion criteria, to include enough participants to achieve statistical power, to analyze and compare subgroups based on demographics or other differences, and to obtain diverse, representative populations.

Be wary of studies using big data: Follow these suggestions

Looking for p-hacking or other statistical red flags is challenging, particularly for journalists who don’t have training in statistics or medical research design or access to the complete datasets a researcher may be using. But that doesn’t mean you can’t learn a few tips on how to scrutinize studies that analyze huge datasets. In fact, three statistical editors of JAMA Surgery (Amy H. Kaji, M.D., Ph.D.; Alfred W. Rademaker, Ph.D.; and Terry Hyslop, Ph.D.) recently penned an editorial aimed at researchers from which journalists can benefit as well.

Google charts health data from CDC, World Bank

Google has removed another step between people and information with the release of its new Public Data Explorer. It’s a service through which Google links neat, tidy and reputable sets of data with a beefed-up version of its chart programs.

Right now it’s limited to 13 data sets, though Google implies that it will continue to expand those offerings based on demand. Those data sets include three that are powered by the CDC’s WONDER data delivery platform.

Data from the World Bank includes international numbers on things such as fertility rates, births attended by skilled health staff, rates of immunization against measles, prevalence of HIV, life expectancy and more. You also can find statistics on the U.S. population from the Census Bureau.

At present, the limited selection means that it probably won’t be useful for more than a handful of stories, but it’s something to keep an eye on as Google continues to add data and customization options.

Here’s a quick example mapping U.S. cancer rates (circle color) and number of cases (circle size) by state.

NOTE: If you can’t see the visualization, you’ll probably need to upgrade your browser.

(Hat tip to ReadWriteWeb)

HHS releases FOIA report in less-than-ideal format

Bob Garfield of WNYC’s “On the Media” talked to John Wonderlich, policy director at the Sunlight Foundation, about last week’s announcement of the Open Government Directive.

Wonderlich says the initiative “is the administration making a real commitment to systemic change within the government.” He also brings up the issue of how information will be made available, pointing out that spreadsheets and datasets are more valuable than paper records to journalists as well as other businesses.

He points out that government agencies report each year on how well they are responding to Freedom of Information Act requests and says that last week – for the first time – the Department of Justice released that information for 2008 in spreadsheets.

Unfortunately, that’s not quite the case. The reports from nearly all of the departments are in spreadsheet form, but a few, including the report from the Department of Health and Human Services, are in other formats that may be more difficult to analyze.

There is, however, a bit of good news. The 2007 report from HHS showed that there were more than 28,000 pending requests. The agency has made an effort to reduce its backlog, and the 2008 tally is just over 19,000.