
Document management systems can help find the story in thousands of pages

Date: 04/14/16

By Audrey Dutton

After a number of media companies, including my paper, the Idaho Statesman, successfully sued for court files improperly sealed during a federal trial against a local hospital system, I went home with a thumb drive full of digitized documents. A beautiful vision greeted me when I plugged the drive into my computer: hundreds of internal memos, emails, text messages, board presentations and spreadsheets.

I knew these documents held secrets. I knew they'd help our readers better understand the inner workings of health care. But I didn't know how to organize and make sense of hundreds of PDFs, especially without the context lawyers provide in a courtroom when they explain why a document matters. So the pressing question I faced was this: How would I turn these documents into a relevant story or series of stories and still have a life?

The answer came in a web-based program for journalists called Overview. The program was originally developed at the Associated Press, and its creators are now working to make it a sustainable business. On their website, the journalists, engineers and software developers behind Overview describe it this way: "Overview helps you read and analyze thousands of documents super quickly. It includes full-text search, topic modeling, coding and tagging, visualizations and more. All in an easy-to-use, visual workflow." Many news organizations have used it to mine documents for all kinds of stories.

For me, it took just minutes to open an account and begin uploading my loot.

I tried to run Overview's sorting mechanism to drill down into the text for interesting nuggets.

Unfortunately, many of the documents I'd received were fuzzy photocopies, scrawled notebook pages or illustrations — as opposed to text that Overview could read and sort easily. So I worried I'd miss a lot of content this way.

But then I noticed: Overview has a life-saving tagging feature. I went through documents one by one. Whenever I noticed a theme, I added a tag.

This made it easier to find documents months later. And I could bookmark the "smoking gun" documents (with the ***** tag) that I knew I'd probably use in stories. I used an "ignore" tag to weed out documents that clearly had no news value.

As we prepared articles for publication, tagging even helped with creating graphics. I added a "data" tag to documents with insurer reimbursement tables. I later retrieved them to build charts showing the difference between charges and reimbursements at two hospital systems.

Overview lets you search, and it does a good job of finding words, as long as they're computer-readable. The combination of tagging and searching allowed me to quickly answer my editor's questions about who said what to whom and when.

The trial documents were released in a few batches, including a giant binder of hard copies. So I did some old-fashioned reading and note-taking, too. But this system made the whole project move a lot more quickly and smoothly than it would have otherwise.

Beginning on May 3, 2015, and running through Oct. 30, we published these six stories:

For posting the documents on our website, I used DocumentCloud, another tool for journalists.

Some final notes: Overview has several tools that I didn't use for this project. But if you're working on a documents-based project, they might help, so check out the website. The Overview team is constantly improving the site and adding tools. And even though Overview was free, the staff responded quickly by email whenever I had questions.

Audrey Dutton (@IDS_Audrey) is the business reporter/investigative coordinator for The Idaho Statesman in Boise.