Some sources of health data
Beginning in March 2008, the federal government began publicly reporting patient satisfaction measures for hospitals across the United States. They survey tool, known as HCAHPS is the first attempt by the U.S. Department of Health and Human Services to standardize survey results of patients' perspectives of hospital care. AHCJ has made it easier for journalists to compare hospitals in their regions by generating spreadsheet files from the HHS database, allowing members to compare more than a few hospitals at a time, using spreadsheet or database software. AHCJ provides key documentation and explanatory material to help you understand the data possibilities and limits.
2008 Kids Count, from the Annie E. Casey Foundation, is a national and state-by-state profile of the well-being of America’s children that ranks states on 10 key measures and provides data on children and families. Excel files are available.
Health, United States, the CDC's annual report on trends in health statistics, has trend tables available in Excel.
World Health Statistics 2008, from the WHO, has Excel files available for a variety of health indicators.
The Agency for Healthcare Research and Quality has Excel files to use to analyze the quality of health care in your state compared to all states and states in the same region.
The U.S. Census Bureau has data on health insurance, including a comparison of uninsured rates between states using three-year averages: 2004 to 2006.
Trends in health and aging , from the National Center for Health Statistics, has data on trends in the health of older Americans showing data by age, sex, race and Hispanic origin.
The CDC's WONDER database will let you select and export data in tab-delimited files that can then be imported into Excel (here's info to help with importing).
Get data on births from the National Center for Health Statistics' VitalStats Births site.
The Healthcare Cost and Utilization Project is a family of health care databases and related software tools and products sponsored by the Agency for Healthcare Research and Quality. HCUP databases bring together the data collection efforts of state data organizations, hospital associations, private data organizations, and the federal government to create a national information resource of patient-level health care data.
The Pan American Health Organization's Regional Core Health Data and Country Profile Initiative includes databases of basic indicators, country health profiles and the atlas of basic health indicators.
The federal Agency for Toxic Substances and Disease Registry has a number of databases and registries, as well as a mapping application.
The CDC's One-Stop Shop for Environmental Public Health Data provides a reference list of nationally funded data systems that have a relationship to environmental public health.
The National Center for Health Statistics offers downloadable public-use data files through the Centers for Disease Control and Prevention's FTP file server. Data sets, documentation and questionnaires from NCHS surveys and data collection systems are available. Among the data available:
• National Health and Nutrition Examination Survey
• National Health Care Survey
• National Ambulatory Medical Care Survey
• National Hospital Ambulatory Medical Care Survey
• National Hospital Discharge Survey
• National Survey of Ambulatory Surgery
• National Home and Hospice Care Survey
• National Employer Health Insurance Survey
• National Vital Statistics System
• Natality
• Mortality
• U.S. Census populations with bridged race categories
• Linked Birth/Infant Death
• Fetal Death
• National Mortality Followback Survey
• National Survey of Family Growth
• National Health Interview Survey
• National Immunization Survey
• Longitudinal Studies of Aging
••• Do you have suggestions of other sources of health care data? Send them to pia@healthjournalism.org. •••
By Jeff Porter
AHCJ Special Projects Director
This exercise is designed to give you an introduction on using spreadsheets to make some basic calculations to find patterns and trends in health data.
We won't get too technical here – after all, in the term computer-assisted reporting, the last word is the most important: REPORTING. It's important to remember that a spreadsheet can give you a lead, but it can't replace your journalistic decision-making process. But we will need to be able to navigate around spreadsheets, understand some basic references and learn some good practices in dealing with data.
Start by downloading the Excel files we'll be working with here. All of the files are available in this ZIP file: ExcelFiles.zip
Or you can download the individual files:
medicaid_by_county.xls
med_salaries.xls
insured.xls
hospital_cost_per_day.xls
death_rates.xls
cause_of_death.xls
Words used in this tip sheet:
CELL – Start up your spreadsheet software, and you'll see a field of little rectangles. Each individual rectangle is called a cell.
CELL REFERENCE – Consider this as an address, or a location on a paper map. Across the top, you've got letters; down the side, numbers. Combine the two, and you've got an address. Click once on any cell, and you'll find its address just above the first column. You will use cell references, or addresses, a lot.
COLUMN – The columns are the vertical stacks of grids. Each column is designated with a letter or a combo of letters – up to 256 of them in Excel 2003. In 2007, the limit is in the thousands.
CURSOR – In a word processor, it's usually the blinking vertical line waiting for you to type. In Excel, it can look a little different.
There's the Big Plus Sign cursor, a tool to help the user to highlight one or more cells.
Then there's the Skinny Plus Sign cursor, designed to help you copy your work.
Try to avoid this cursor, a collection of arrows. It will actually move the content of your cell, often with unintended consequences.
FORMULA – Spreadsheets were made to add, subtract, multiply and divide. To make those things happen, you'll need to type in a formula in an empty cell. It's pretty simple, but a couple of rules: first, a formula always starts with an equals sign (=). For most of your math, you'll use familiar symbols:
– and /. For multiplication, it's an asterisk (*).
FORMULA BAR – No, it's not a place where babies go after a hard day at the nursery. To the right of the box that shows your cell reference, there's an "fx" symbol, then a blank box. If you click on a cell where you typed a formula, you can review and edit your formula.
FUNCTION – Sometimes, a formula would take a really long time to write. So spreadsheet wizards made some special words to give you a shortcut for your work. Functions always have the same general rules: Always starts with the equal sign, then a special word. One example is SUM. Then inside parentheses, you feed information to the function to let it do its magic. Using the SUM function would look like this: =SUM(A2:A11). Translation: You're telling the function to total up all the numbers in the cells between A2 and A11, inclusive.
HANGING CHAD – When you use the Big Plus Sign cursor to highlight a single cell, there's a small dot or square in the lower left corner. It's where you can use the Skinny Plus Sign to copy formulas and other cell content.
ROW – Series of cells that go from left to right. They are numbered along the side of the worksheet. In Excel 2003, you're limited to 65,536. In Excel 2007, you can have a million.
WORKSHEET – An Excel file can contain several of these one-page collections of columns and rows. While you're working with a worksheet, check the bottom left corner and you'll likely see some additional tabs. Each tab represents a different worksheet, and a click on the tab will take you to that worksheet. It's like a stack of paper with tabs.
Starting with the basics
Open up the file called hospital_cost_per_day.xls. Notice that it's pretty simple: Names of states, then a couple of columns showing a year and the average cost per day during a hospital day by the year. Scroll down to the bottom and you'll find a blank line and then national totals.
Notice, too, that it's listed alphabetically – not very helpful in doing any comparison. So let's do some basic analysis. Before we proceed, one Best Practices principle is make a copy right away, so after opening the file, do a Save As… and give it a new name.
Calculating a dollar difference: Another key principle is that we won't change the original data. Everything we do will involve adding columns or rows or new worksheets. So let's add a column: Under D, let's calculate a basic dollar difference. In D1, give it a name, say, "Difference." Then in D2, the formula: =C2-B2 Then hit the Enter key.
That is, subtracting new minus old to give a difference:


Now, move your cursor over the Hanging Chad until it becomes the Skinny Plus Sign, then double-click. Your formula should copy itself down to the blank row. Notice, however, that it won't capture the United States row; that's because the blank row tells Excel to stop copying the formula. So, whenever you do a double-click like this, always, always check the last row to make sure it copied correctly.
Now, click on D52 and check the formula bar. It should be: =C52-B52. That means Excel is guessing correctly that if you want to copy down the list, the cell references should move as well. Good guess. Now, you can use the Skinny Plus Sign again to drag down to D54 and get a number for the United States. Go back to the cell D53, the blank row, and hit the Delete key. Now, it's all filled in.
Calculating a percentage change: Since each state has a different dollar amount, it's hard to compare the changes. You've got dollar differences, of course, but they all start and stop with different numbers.
One way to level the playing field to a large degree is calculating a percentage change. How? A longtime expert put it this way: Do journalists like math? The answer: NOO. That is, New minus Old divided by Old, or (New-Old)/Old – NOO. So let's add a label in E1 and call it "Pct Difference," then in E2 start a formula to calculate (New-Old)/Old. We already have the (New-Old) previously calculated in column D. So to divide that number by the Old number again, we can simply type in D2/B2. Hit enter and, like before, copy the formula down – you should see a lot of decimal numbers. To pretty up your numbers, click on the % symbol in the toolbar or go Format > Cells… and choose the tab Numbers. Pick Percentage and, for this example, tell it no decimal places. Say OK.
Let's sort: Like most government data, it's not sorted very handily. It's alphabetic, but one question we might want to answer is: What state has the highest percentage increase? You could scan the numbers and eventually find it, but what if you were looking at, say, 500 or even 5,000 rows? Not so easy, then.
So, let's highlight our data – and for the first time, we'll do it the hard way. Click once in cell A1, and hold. Then "paint" the data you want to sort, going down and left in a diagonal move. Be sure to NOT include the United States total.
To sort, use the menu item Data > Sort. Go through the dialog carefully – make sure it guesses correctly about whether the spreadsheet as a "header row" or not. And pick out the column to sort with. For this example, pick "Pct difference" and tell it Descending – that way, the largest number goes to the top. Tell it OK.
You've sorted the data to give a different picture: Which state has the highest percentage increase?

Now, for practice, sort it again. This time, sort by dollar difference and see how the results differ. For an extra tip, here's an easier way to highlight your data to sort: Click once anywhere on your set of words and numbers. Then hold down both your Shift and Ctrl keys and push the 8 key. That should highlight all the data that is contiguous. Note that it does not include the United States total – that 's because of our friend, the blank row.
And last important note: If you get this warning, you're not doing it right, and you're increasing your chance of error! Be sure to select all the data you want to sort before you start.
Fun with functions
So far, we've all our work horizontally – now, we'll do some sum totaling and other magic vertically. Open up the file med_salaries.xls and do a quick Save As… and make a copy.
Let's scroll down to D428 and start making some labels for calculations. Let's call this one Total. In E428, right below the long list of numbers, let's use our first function: SUM. It'll give us a grand total of all those salaries for 2008.
To do this function, you tell it to use the function, and tell it the range of cells to include, like this =SUM(E2:E427). The : (colon) indicates that you want to include every cell between the start and end addresses.
Hit enter after typing the formula, and you've got a total.
In D429, let's add another label called Average and in E429, let's do another function: AVERAGE. The function operates much like the SUM function: =AVERAGE(E2:E427). Cautionary note: Do not include your total in your average. Your stats would be way off. Hit enter and you've got your Average – basically, summing the numbers and dividing it by the number of entries.
Next, let's do a MEDIAN. Label it in D430 and calculate in E430: =MEDIAN(E2:E427). Test questions: What's the difference between an average and a median? And why do the specific average and median differ? Bonus question: When do you use an average, and when do you use a median?
And a bonus function: MODE. It's a function that tells you the value that appears most often in any given list. So in D431, let's label it as Mode, then in E431, calculate it the same way you calculated the others: =MODE(E2:E427)
Caveat: If no values are repeated, you'll get an odd error message, FYI. And most importantly, if there is a tie, Excel will just display the first one it finds.
Filtering to refine your results
Keeping the medical school salaries spreadsheet open, let's consider more comparison. For this example, we'll pose another question: What are the total, average and median salaries of just full professors? No assistant/associate, no emeritus, no deans – just professor.
Well, you could short, then copy and paste just that group, or you can become familiar with a powerful tool with spreadsheets: filtering.
Click just once inside anywhere among the numbers and letters. Go to the menu item Data and pick Filter > Auto Filter. Notice the little boxes appear in each of the labels on the first row.
Click on the box next to the word Title and you'll see a list of all the values, plus a few other options. To explore these tools, let's pick "Custom," and you'll get a dialog box. Let's keep it to "equal" and use the next dropdown to choose the value "PROFESSOR."
Now, tell it OK and you've got a list of just professors. Notice that the little box is now highlighted, and the numbers down the side are no longer sequential. The other rows have not been deleted – merely hidden. Notice, too, that all of your function work is hidden, too. So with any cell highlighted, do your keyboard trick of Shift Ctrl 8 and pick Edit > Copy. Click below where it says Sheet2, highlight A1 and choose Edit > Paste – and you've got it! On Sheet2 (which you can rename with a double-click) you've got just professors. Now, run through your SUM, AVERAGE, MEDIAN and MODE functions again.
More on filtering later, too.
Comparison tools of rates and ratios
What's a rate and what's a ratio? And why do we care?
In simple terms, ratios are used to compare similar units with each other. For example, if you want to count Bob's oranges vs. Joe's oranges, you'd divide the big number over the little number (to make your results easier for the human eye). Say Bob has six oranges and Joe has three. The formula 6/3 results in the number 2, or for every Joe's one orange, Bob has two.
A rate compares two different units. You see it all the time in crime rate stories. Take the number of murders, say, in St. Louis, and divide it by population. It would look something like 138 / 347,181 or 0.0003974. Not a thrilling number. Who would want to read on the air or print in the paper that the homicide rate is 0.0003974 per person? Often statisticians multiply the result to give them a nice, whole number. What to multiply is a matter of choice: the only rule is you use the same number each time. So multiply the number times 100,000, and you get a murder rate of about 40 per 100,000 people.
The big advantage of both, of course, is you can compare numbers more fairly. Instead of using raw numbers, you can use rates and ratios to, for example, compare St. Louis' murder rate with the rate in other cities with different populations. In that example, a rate takes into account both the number of murders and the number of people.
So let's get started: open up the file called death_rates.xls and you'll see three columns: county, population, number of stroke deaths in a given year. Now, let's calculate a rate. As always, do a Save As… and give it a different name. Then add a label in D1 and call it Rate.
Then let's calculate that rate: Deaths / Population is the way to start. So in D2, let's go =C2/B2 and hit enter. What do you get?
Let's tweak the formula until you get a nice whole number. Start with =(C2/B2)*100 and keep adding zeroes until it looks OK. The number 10,000 is the first one that gets a good number – so let's copy that formula down to see how it works. Make sure you get the formula in the Montana, Total row as well. One final item: let's reduce the decimal clutter. Click on the letter D once, and with that column highlighted, use Format > Cells… Make the format number, and tell it no decimal places. (After all, we're counting people.)
On your own, do a sort, just like we sorted earlier and answer: Which county has the highest rate of stroke death, per 10,000 people? And what problems do you see in just rushing this to print or on the air?
To combat one potential problem, let's go back to filter. After you turn on the filter (Data > Filter > Auto Filter), choose population and the Custom option. In the dialog box, pick "is greater than or equal to." Fill in 10000 (no comma) in the box and see how that changes your results.
Now, let's look at a ratio. Open up the file insured.xls. You've got two categories: insured (private or government) and uninsured. Let's investigate some numbers.
As usual, do a Save As… and give your copy a new name. Then in D1, we'll label a new column and call it Ratio. The calculation for a ratio is straightforward but isn't always clear. Remember what we want in our outcome: a single number, a whole number, representing how two numbers relate. In the example of Bob and Joe's oranges, we did the calculation as Big Number / Smaller Number, or 6/3, which turns out to be a 2. That would mean the ratio of Bob's oranges vs. Joe's would be 2:1. In a story about the orange industry, you could express the ratio in a variety of ways: "Bob has twice the orange inventory as Joe. " Or "For each of Joe's oranges, Bob has two." Or "Bob is outgunning Joe 2 to 1 in the orange war."
So looking at the insurance data, in D2, let's go with the same principle: =B2/C2.
Now, consider this question and sort accordingly: Which is worse, a small number or a large number? Based on your answer, do you sort Ascending or Descending?
Finally, if time permits, visit this file: Medicaid_by_county.xls and calculate more rates and ratios for practice.
Tricks and charts
Open the file cause_of_death.xls. Simple and small: General terms of cause, and how many deaths. Let's calculate a percentage of the total. Using our SUM function, let's label A13 as Total then calculate that total in B13.
Now, to calculate a percentage of the total, you simply take the number from the category and divide it by the total you just created. So in C1, let's make a label called Pct of total and in C2 make that simple calculation: =B2/B13 and hit enter. An easy one. Now, let's copy down the formula …

Oops!
Let's diagnose this problem. Click once on C3 and check what the formula says: =B3/B14. The problem is, of course, is that there is no number in B14. Excel blindly followed – for every line down your formula went, so did the assumed location of the total. Most of the time, Excel guesses correctly. This time, it did not. You need to have the first part of your formula, =B2… to move. But you need to tell the second part … /B13 to stay put. The answer: an anchor. We'll use a couple of dollar signs as anchor symbols (Microsoft's favorite symbol, no doubt). You place them in front of whatever you want to anchor, so it'll look like this: =B2/$B$13. You're telling Excel to keep the second part of the formula on Column B and keep it on line 13. So, use that formula in C2 and copy that one down – it should give you the right numbers!
You can use the Format > Cells… dialog box to pretty up your percentages, and check the last number: 1. Then, if you click on C13, you'll see that the total is dividing by itself – so 1 is a little test to make sure your formula is right.
Next, a chart, and these numbers are ideal for a pie chart. Let's carefully highlight just the text and numbers we want.
Start with "Heart disease" and highlight down to the total for "Other." You don't want the column labels, the total or the percentages.
There's a little Chart Wizard button around the middle of the series of icons at almost the top of the screen. Click once, and it'll give you a series of dialog boxes, even allowing you to preview your chart, which is not a bad idea.
Pick Pie for your type, and click Next. A couple of careful points to make sure about during the Wizard: you should always check and make sure it is highlight the correct part of your data, and don't ignore the tabs that let you change appearance. You can give your chart a title, decide whether to include a legend and display percentages aside your pie slices. The best way to learn this is to experiment.
One final note: My personal preference is to put the chart on a separate worksheet. The chart is bigger and it doesn't cover up my data. But try both as time permits.








