Using DocumentCloud to shine a light on sources #ahcj13

Photo by Pia Christensen

“Turn documents into data” is the tagline for DocumentCloud, a free Web tool for journalists who want to search, analyze, annotate, publish and publicly share certain primary source documents.

On Thursday, the first day of Health Journalism 2013, Mark Horvit, executive director of Investigative Reporters & Editors, which operates DocumentCloud, gave an hour-long workshop on what the tool does and why it’s useful for journalists.

Basically, the site allows journalists to take PDFs – the largely unalterable file format of most government documents – and make them as easy to work with as a normal Microsoft Word document.

Horvit said the goal is to put public documents in the hands of everyday people in a useful way.

“We talk a lot about transparency as journalists,” he said. “DocumentCloud provides transparency in two different ways. First it allows you to help government and business be more transparent by taking their documents, posting them and making them available to your audience. But it also provides transparency for the work that you do. In other words, readers no longer have to just take your word for it. You can post all the source documents for your story, so that your audience has a chance to see what you got.”

Among the features of DocumentCloud that Horvit highlighted: 

  • Upload written documents in any type of file format, including court filings, hearing transcripts, testimony, legislation, reports, memos, meeting minutes, correspondence and more.
  • Search for key words within those documents.
  • Annotate and highlight sections of a document so you and your readers can easily navigate it.
  • Take dates within a document and plot them on a timeline.
  • Share documents publicly or keep them private.
  • Securely redact sensitive information.
  • Embed a link to the document on your website.

According to Horvit, the website boasts roughly 545,000 documents representing more than 7.7 million pages, all of which have been uploaded by some 750 organizations. Among those are major news outlets including The New York Times, ProPublica, the Los Angeles Times, the Chicago Tribune and USA Today.

Those media organizations have used DocumentCloud to publish the U.S. Supreme Court ruling on the Affordable Care Act and emails from former Alaska governor and vice presidential candidate Sarah Palin, among other things, Horvit said. Others have used it to privately store parts of the WikiLeaks trove of classified government documents.

Horvit said DocumentCloud was originally funded by a two-year grant from the Knight Foundation. It was first conceived by two editors at ProPublica and a New York Times reporter. In 2011, Investigative Reporters & Editors, a nonprofit journalism organization, took it over.

Only approved journalists and news outlets may upload to DocumentCloud and no one can post anonymously, he said. But anyone can browse and search the public document archive.

Later this year, Horvit said, DocumentCloud will begin letting members of the public annotate documents that journalists post publicly. The site will also start offering documents in other languages.

For more information visit or email Mark Horvit at
