Finding patterns and trends in health data: Pivot tables in spreadsheets
How and why they can be useful
Jeff Porter and Michael Gibney
Journalists like to count things.
How many flu cases by state? How many deaths from Cause A vs. Cause B? How many violations in each local nursing home?
All three examples are cases of counting — that is, summarizing long lists to show at a glance which state has the highest number of flu cases, which cause of death is highest in number or which nursing home has the most violations. Once you find a pattern, you might have a story.
Open the file called stars_overall_il.xls. You'll see that you have a line for every nursing home in Illinois and several categories, including star ratings in text, the rating in a simple number, location of the nursing home and type of owner.
Now, you could easily find, just by sorting using the column titled AHCJ_rating, to find the nursing homes with the lowest or highest scores. You could further filter that sort to find nursing homes in particular cities, ZIP codes, or owner type. Instead, we're going to do deeper analysis and compare overall ratings for each type of ownership. The possible types are:
Those types are repeated throughout the list of nursing homes.
So to compare the types of ownership to see which types fare best in overall ratings, an ideal tool is a pivot table. It can show the categories and count the number of nursing homes in each kind of categories.
To start, do a single click anywhere in the data where there's text or a number in that spot. Then, in the menu item called Insert (for Excel 2007), pick the far left option — Pivot Table. For the older versions of Excel, choose Data > Pivot Table. The following images correspond to Excel 2007, but instructions are included where the older versions differ.
Now, before you click hurriedly through this, look behind the dialog box. You should see an indicator that your data are highlighted. It's a good idea to use the scroll bars to check all four corners — top left, bottom left, bottom right, top right - to make sure all the data are highlighted. Once you're satisfied, click OK.
You'll next get into an exercise of drag and drop. Pick a column — now called a "field" in the Pivot Table dialog box — and make them rows or columns. For this example, let's keep it simple.
Drag TypeOfOwnership down to the box called "Row Labels" and let go. In the older version of Excel, the box is labeled simply "Row."
Now, we've got to count something. Since each row represents one nursing home, let's keep that theme when we decide how to count the data. The unique identifier for each nursing home is a column called "ProvNum," or Provider Number. So now, drag that down to the box labeled Values (in the older version of Excel, it's called "Data"). Notice, too, that it doesn't simply say ProvNum, but "Count of ProvNum," indicating that it's counting. (Side note: You're not limited to counting; double-click on the Count of ProvNum and choose Value Field Settings and you get a host of options - if it's a number, you can Sum, calculate a Mean and so on. Since you can't sum nursing homes, counting is the right procedure in this exercise.)
When you finish your drag and drop, check out your pivot table. This tells you how many times each ownership type appears in the data. So with your pivot table, you've summarized 794 rows into this compact pair of columns showing that For-profit companies are the biggest owners of nursing homes in Illinois. A far second is the nonprofit corporation category.
Now, look back at the Pivot Table Field List box. (If you lose it, click anywhere inside your pivot table and if it still doesn't appear, choose the Options menu item and push the icon labeled "Field List;" for the older version of Excel, click Pivot Table > Pivot Table Wizard, and click "Layout" again.) Drag DomainStarRating into the Column Labels box — so now you've got rows of ownership type, and columns show the category of the number of stars:
The next-to-the-last column shows the number of nursing homes by category that are too new to rate. To include only nursing homes that are rated, then click next to the DomainStarRating box and uncheck the box by the category "Too new to rate." They vanish! (But not entirely; you can go back, check the box and they reappear.)
The 10 designated as "Too new to rate" are dropped out, and you now have 784 nursing home types to compare.
The counts represent raw numbers, which can be hard to compare between categories. So let's calculate a percentage of the lowest-rating nursing homes, for each category. (Haven't calculated a percentage? Use this exercise .) Before we calculate that percentage, we need to get out of the pivot table mode, though. So do this: Starting with the lower right corner - where you see the number 784, click and hold down the mouse. Be sure to use the cursor that looks like a large plus-sign and not the cursor with arrows. Then drag up and left until you highlight TypeOfOwnership. Let go of the mouse, then use the Home menu item and pick Copy. Then create a new worksheet by clicking on click on the little new worksheet icon at the bottom left. (In the previous version of Excel, use Insert > Worksheet.)
Now, click in the A1 cell once in the new worksheet and choose Home > Paste. (Previous version: Edit > Paste.) Now, we're ready to add a column. Highlight H1 and type in a label, say, "Pct lowest rating" and hit Enter. That will take you to H2, where you are ready to type in the formula for a percentage calculation: =B2/G2. Hit enter, make sure it works, then copy the formula down through row 12:
Next, hit the % symbol in the "ribbon" section to make the numbers prettier and easier to read. In the previous version, you'll go to Format > Cells and choose Percentage, no decimal places.
Now, use your mouse to highlight the column labels, all the nursing home ownership types and all the numbers with the exception of the last row of totals — you don't want to sort your totals. Once you've highlighted correctly, use Data > Sort and find the ownership type with the highest percentage of low-rated nursing homes. Do that by sorting from "Pct lowest rating," Descending. For a detailed tutorial on basic calculations, sorting and other tools, review AHCJ's basic Excel exercise).
As time permits, go to AHCJ's data resources, choose Medicare's Nursing Home Compare database page and download the entire stars_overall.xls file, filter the data for your own state — including Illinois, since the practice file might not be the latest — and see what you can find.
One advantages of using the pivot table tool is that the underlying data are not changed and that a journalist could make numerous pivot tables to find relevant patterns. For more practice, download and make pivot tables using this practice file: deficiencies_fl.xls.
Find the nursing home with the most violations included in the file. You'll need to learn one more trick to ease this process — consider it a lesson of understanding how the data are put together.
The goal is to make a pivot table with a column of nursing home names, and the count of violations. However, nursing home names are not necessarily unique - so for a computer, a Jones Nursing Home in Miami, for example, is the same as a Jones Nursing Home in Orlando. That would wreak havoc in counting violations by name only!
However, there is a distinct identification for each facility - the government's provider number, or in the spreadsheet, the column labeled ProvNum. But that's great for the computer, lousy for humans. So we'll use both. Since the ProvNum is unique, it'll be attached to only one name.
So when you get to the interface to add columns, rows and values (data, in the older version), let's add two rows: ProvNum and NursingHomeName and drop ProvNum again in Values, so it would count individual nursing homes.
But as you can tell, it doesn't look great (left). It does a subtotal for each ProvNum, which is simply a repeated number for the line above the subtotal.
So .... Choose Design under the Pivot Table Options and tell it to not show the subtotals:
In the older version of Excel, use the pivot table toolbar (View > Toolbars > Pivot Table). Click on the down-arrow item labeled "Pivot Table" and uncheck the item "Subtotals."
Now, in sorting the data to find the nursing home with the highest count, the first step is to click on the cell that says ProvNum (if you've followed along the instructions, it should be A4).
Go to Data and pick the Sort icon. You'll get a dialog box. Pick Descending (since you want to find the highest number) then Count of ProvNum (therefore sorting your data by the count). Tell it OK, and you'll get your list!
In the previous version, go to the pivot table toolbar, use the menu item and choose "Sort and Top 10 ... ." Choose Descending and use the field "Count of ProvNum" for your sort.
With these tools in hand, you can use filter tools (for example, by city, ZIP codes or level of severity), copying and pasting into new worksheets, performing other calculations in your analysis.