World of Pain: Prescription Drug Patterns


Presented by Frank Bass, director of computer investigations for the Associated Press, at the 2007 Urban Health Journalism Workshop. The panel was titled "Mapping Health Data for Local Stories."

Download an MP3 of Bass' presentation. 


Data is helpful, even essential. But often, it’s not enough.

Frequently, analysis is impossible without a visual representation of the data.

Take, for example, a recent Associated Press study of prescription painkillers.

For years, ugly reports had come out about a particularly potent pain medicine called OxyContin. It was so popular in Appalachia that it had earned the nickname “hillbilly heroin.” Aging cancer patients were selling pills to hollow-eyed addicts, who were robbing pharmacies and physician offices to support their habits.

There was little doubt that the drug was being widely abused. While we were reporting the story, OxyContin's manufacturer, the Connecticut-based Purdue Pharma, would eventually plead guilty and pay a whopping $600 million fine for misleading patients and doctors about the drug's addictiveness.

A search of the AP archive found hundreds, if not thousands, of brief stories about OxyContin abuse, or “diversion,” as federal drug enforcement authorities liked to call it. Not all had West Virginia datelines, and most went something like this:

LAYTON, Utah (AP) _ A Layton man arrested after an armed robbery at a Layton Shopko pharmacy is being investigated in many of the recent OxyContin robberies in Davis and Weber counties.

My former editor, John Solomon (now at the Washington Post), and I began talking about the story in early summer 2006. We theorized that OxyContin would behave like any other abused drug: Like heroin in the 1970s or cocaine in the 1980s, it would take hold somewhere and then spread. It was safe to say that Appalachia was the origin: Its population was older, and older people tended to be in more pain. But where had the drug gone since its introduction in 1996?

The Cleveland Plain Dealer had done an extraordinary (and, I thought, underappreciated) story in 2001 on Ritalin. Using DEA data, it had found that rates varied widely around the country. I remembered seeing the story and wondering if usage patterns for other drugs could be developed using the DEA database.

The initial figures were culled from a Drug Enforcement Administration database with the very catchy name of Automation of Reports and Consolidated Orders System (ARCOS). The system gives retail sales figures _ from hospitals, pharmacies and physician offices _ by quarter, for each of the 800 three-digit zip codes in the United States. It consolidates the reports into PDF files, available from 1997 on, for a series of Schedule II drugs. State and federal investigators often use the data to analyze suspicious patterns.

We began in summer 2006 by downloading more than 3,000 pages of PDF report files (http://www.deadiversion.usdoj.gov/arcos/retail_drug_summary/index.html) from the DEA website. The files looked like this:

[Screen shot: a page from an ARCOS retail drug summary PDF]

The first order of business was to put the data into a readable format for analysis. We used PDF2XL, a product from CogniView. Anyone who's ever tried to convert a PDF file to any other format knows how time-consuming and painful it can be. Take the above PDF file, copy it, and try pasting it into a spreadsheet. Here's the result:

[Screen shot: the PDF pasted into a spreadsheet, with columns jumbled]

It's possible to use spreadsheet functions to assign columns to the rows and rows of numbers, but there are two problems: One, it would require a lot of manual work or advanced scripting to fix two-word geographic labels, such as "STATE TOTALS," and two, it could only be done one page at a time. Given that each page would've taken about 10 minutes, we were looking at an extraordinary amount of work to get through 3,000 pages.
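The two-word-label problem can also be handled with a small script once the PDF text is extracted. Here is a sketch in Python (the AP used PDF2XL, not this approach; the sample rows are made up to resemble ARCOS output):

```python
import re

# Hypothetical lines resembling rows pasted from an ARCOS summary PDF:
# a geographic label (which may itself contain a space, like "STATE TOTALS")
# followed by numeric columns.
rows = [
    "248 1,409.25 1,512.80 1,377.13 1,498.44",
    "STATE TOTALS 5,797.62 6,014.91 5,880.37 6,102.55",
]

# Everything before the first decimal number is the label; the rest are
# the numeric columns.
pattern = re.compile(r"^(.*?)\s*((?:[\d,]+\.\d+\s*)+)$")

parsed = []
for row in rows:
    m = pattern.match(row)
    label = m.group(1)
    numbers = [float(tok.replace(",", "")) for tok in m.group(2).split()]
    parsed.append((label, numbers))
```

This still works one row at a time, of course, which is why a batch tool like PDF2XL was the better fit for 3,000 pages.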

For years, the Associated Press had used Monarch's Redwing product. Again, though, it was good for one page at a time. So we purchased a copy of PDF2XL, which allowed us to set templates for the columns and convert hundreds of pages in one fell swoop. The program looked like this:

[Screen shot: the PDF2XL column-template interface]

We didn't expect to use Excel for the analysis, but it served a useful purpose: creating sets of uniform files that we could put into SAS for the heavy lifting. Given that we would need a relational database product to join the three-digit zip codes to population figures before transferring the data to a mapping program, we used the spreadsheet as an intermediate step.

The output looked like this:

[Screen shot: the converted spreadsheet output]

I wrote a Visual Basic macro that cleaned up the data nicely, putting the name of the drug in a new column, eliminating extraneous blanks and quarterly data, and getting rid of drugs (such as buprenorphine) that we didn't feel we needed. Ultimately, we decided to expand the scope of the story to include the five most commonly used painkillers by weight: oxycodone, hydrocodone, morphine, codeine and meperidine.
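The macro's cleanup steps translate readily to other languages. A sketch of the same logic in Python, with hypothetical sample rows (the original was a Visual Basic macro; this is not that code):

```python
# Hypothetical rows as they might emerge from the PDF conversion:
# (zip_prefix, Q1, Q2, Q3, Q4, annual_total), with blank spacer rows mixed in.
raw = [
    ("248", 1409.25, 1512.80, 1377.13, 1498.44, 5797.62),
    ("", None, None, None, None, None),          # blank spacer row
    ("249", 980.10, 1011.55, 995.72, 1002.01, 3989.38),
]

def clean(rows, drug, excluded=("buprenorphine",)):
    """Drop blank rows and the quarterly columns, keep the annual total,
    and tag each row with the drug name -- mirroring the cleanup steps
    described above."""
    if drug.lower() in excluded:
        return []
    return [
        {"drug": drug, "zip3": zip3, "grams": total}
        for zip3, q1, q2, q3, q4, total in rows
        if zip3  # skip blank spacer rows
    ]

cleaned = clean(raw, "oxycodone")
```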

A finished spreadsheet for one substance looked something like this:

[Screen shot: a finished spreadsheet for one substance]

Of course, we had five substances for each of the 50 states and the District of Columbia, so that amounted to 255 spreadsheets for each year. And since we had data for eight years, we churned out more than 2,000 spreadsheets.

Our next step was to consolidate the spreadsheets. We could have written a script to pull them into another series of spreadsheets, or even a database. But we wanted to get the numbers into a single file as quickly as possible for a nationwide analysis. First, we imported the 2005 figures for each state into SAS:

%macro dostate(st);
PROC IMPORT OUT= PAIN.&st._OXY05
     DATAFILE= "J:\Painkillers\&st._oxy05.xls"
     DBMS=EXCEL2000 REPLACE;
     GETNAMES=YES;
RUN;
%mend dostate;
%dostate(AK);
%dostate(AL);
%dostate(AZ);
%dostate(AR);
%dostate(CA);
%dostate(CO);
%dostate(CT);
%dostate(DC);
%dostate(DE);
%dostate(FL);
%dostate(GA);
%dostate(HI);
%dostate(IA);
%dostate(ID);
%dostate(IL);
%dostate(IN);
%dostate(KY);
%dostate(KS);
%dostate(LA);
%dostate(MA);
%dostate(MD);
%dostate(ME);
%dostate(MI);
%dostate(MN);
%dostate(MO);
%dostate(MS);
%dostate(MT);
%dostate(NC);
%dostate(ND);
%dostate(NE);
%dostate(NH);
%dostate(NJ);
%dostate(NM);
%dostate(NV);
%dostate(NY);
%dostate(OK);
%dostate(OH);
%dostate(OR);
%dostate(PA);
%dostate(RI);
%dostate(SC);
%dostate(SD);
%dostate(TN);
%dostate(TX);
%dostate(UT);
%dostate(VA);
%dostate(VT);
%dostate(WA);
%dostate(WI);
%dostate(WV);
%dostate(WY);

Since we were working with 50 states, the SAS macro function was invaluable. I could have written a nested macro or two that would have shortened the process considerably, but I worried that I might miss important information by not doing the job sequentially, so I stuck to a single macro for each script. Plus, the single-macro script was easy to write and understand. I named it "dostate" and began with a statement defining the macro:

%macro dostate(st);

Next, I used SAS to import the Excel file. This can be done by point-and-click with an add-on SAS product called SAS/Access, but since I was running a macro, I needed to include the import procedure:

PROC IMPORT OUT= PAIN.&st._OXY05
     DATAFILE= "J:\Painkillers\&st._oxy05.xls"
     DBMS=EXCEL2000 REPLACE;
     GETNAMES=YES;
RUN;

Finally, I ended the macro with the obligatory statement:

%mend dostate;

and instructed SAS to run the macro for all 50 states and the District of Columbia:

%dostate(AK);
%dostate(AL);
%dostate(AZ);
%dostate(AR);
%dostate(CA);
...
%dostate(WA);
%dostate(WI);
%dostate(WV);
%dostate(WY);
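For comparison, the same batch pattern in a general-purpose language: one routine parameterized by state, called once per jurisdiction. This Python sketch only builds the dataset names and source paths (the J:\Painkillers\ layout is taken from the SAS example above; the actual file-reading step is omitted because the files aren't reproduced here):

```python
# A Python analogue of the SAS %dostate macro: loop over the 50 states
# plus the District of Columbia and build the dataset name and source
# path for each import.
STATES = (
    "AK AL AZ AR CA CO CT DC DE FL GA HI IA ID IL IN KY KS LA MA MD ME "
    "MI MN MO MS MT NC ND NE NH NJ NM NV NY OK OH OR PA RI SC SD TN TX "
    "UT VA VT WA WI WV WY"
).split()

def import_jobs(drug="oxy", year="05"):
    """Yield (dataset_name, source_path) pairs, one per jurisdiction."""
    for st in STATES:
        yield (f"{st}_{drug.upper()}{year}",
               rf"J:\Painkillers\{st}_{drug}{year}.xls")

jobs = list(import_jobs())
```

Either way, the point is the same: write the import logic once, and let a loop (or macro expansion) repeat it 51 times.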

 

That made very short work of an import process that otherwise would have taken days, if not weeks. Next, we consolidated each state's files for the five substances over the eight-year period.

libname pain 'j:\painkillers';
run;
%macro dostate(st);
data pain.&st._oxy;
set pain.&st._oxy97
pain.&st._oxy98
pain.&st._oxy99
pain.&st._oxy00
pain.&st._oxy01
pain.&st._oxy02
pain.&st._oxy03
pain.&st._oxy04
pain.&st._oxy05;
run;
%mend dostate;
%dostate(AK);
%dostate(AL);
%dostate(AZ);
%dostate(AR);
%dostate(CA);
%dostate(CO);
%dostate(CT);
%dostate(DC);
%dostate(DE);
%dostate(FL);
%dostate(GA);
%dostate(HI);
%dostate(IA);
%dostate(ID);
%dostate(IL);
%dostate(IN);
%dostate(KY);
%dostate(KS);
%dostate(LA);
%dostate(MA);
%dostate(MD);
%dostate(ME);
%dostate(MI);
%dostate(MN);
%dostate(MO);
%dostate(MS);
%dostate(MT);
%dostate(NC);
%dostate(ND);
%dostate(NE);
%dostate(NH);
%dostate(NJ);
%dostate(NM);
%dostate(NV);
%dostate(NY);
%dostate(OK);
%dostate(OH);
%dostate(OR);
%dostate(PA);
%dostate(RI);
%dostate(SC);
%dostate(SD);
%dostate(TN);
%dostate(TX);
%dostate(UT);
%dostate(VA);
%dostate(VT);
%dostate(WA);
%dostate(WI);
%dostate(WV);
%dostate(WY);

Again, it was a deceptively simple script. First, I gave my dataset a name:

data pain.&st._oxy;

Next, I told SAS to combine all of the state's yearly files into one dataset, using the "set" statement:

set pain.&st._oxy97
pain.&st._oxy98
pain.&st._oxy99
pain.&st._oxy00
pain.&st._oxy01
pain.&st._oxy02
pain.&st._oxy03
pain.&st._oxy04
pain.&st._oxy05;
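For readers more familiar with general-purpose languages, SAS's SET statement simply stacks datasets end to end. A toy Python equivalent, with made-up rows:

```python
# Two hypothetical yearly extracts for one state, as lists of dicts.
# SAS's SET statement stacks such tables end to end; with Python lists
# that is plain concatenation.
oxy97 = [{"zip3": "248", "grams": 5797.62, "year": 1997},
         {"zip3": "249", "grams": 3989.38, "year": 1997}]
oxy98 = [{"zip3": "248", "grams": 6120.04, "year": 1998},
         {"zip3": "249", "grams": 4101.90, "year": 1998}]

stacked = oxy97 + oxy98
```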

I used roughly the same procedure to stack tables together for all states, creating eight U.S. files (one for each year). Then I used a simple structured query language (SQL) statement in SAS to join the data for all eight years, and threw in some calculated fields to look at year-by-year percentage changes, as well as the amount of change between 1997 and 2005. This gave us our master table (at least for oxycodone). It was good, but not enough. We could run a quick query to figure out where the most oxycodone was sold.
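The join-and-compare step can be sketched the same way. Assuming hypothetical per-zip totals for the first and last years (not the AP's actual figures), the 1997-to-2005 change field might look like:

```python
# Hypothetical per-zip totals (grams) for two years; the real analysis
# joined eight years of tables on the three-digit zip code.
grams_1997 = {"248": 5797.62, "249": 3989.38}
grams_2005 = {"248": 14210.50, "249": 9877.01}

# Join on the zip prefix and compute the 1997-to-2005 percentage change,
# the kind of calculated field described above.
change = {
    zip3: round((grams_2005[zip3] - g97) / g97 * 100, 1)
    for zip3, g97 in grams_1997.items()
    if zip3 in grams_2005
}
```

In SAS, the same calculation would live in the SELECT list of the PROC SQL join.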

 

AHCJ Staff
