Module JG130

Data-Driven Journalism

Module author

Curt Chandler

PennState University

Learning objectives

After studying this module, you will be able to:
  • Define and explain this journalistic genre;
  • Explain the reasons for this concept;
  • Give an overview of the historic development of the genre, including key persons who established this genre;
  • Reflect on this genre critically.
Study point 1
Reading extract Data-Driven Journalism


Why Open School of Journalism believes that Data-Driven Journalism is important to know

Data-driven journalism (DDJ) has been around for at least the last five years. Data-driven journalism is journalism that inductively draws upon data to inform news stories. 

Data-Driven Journalism: Why now? 

Open-source software and other advances in analytics and technology have enabled data-driven journalism to enter the mainstream. In addition, the popularity of celebrity journalists like Nate Silver, whose predictive analytics work on recent U.S. elections drew wide attention, has brought data-driven journalism more widespread exposure.

Why is data-driven journalism only now taking off? A large part of the equation is the set of open-source analytical tools that enable journalists to draw meaningful conclusions from huge swathes of publicly available data. The other part of the equation is the data itself: more open data, combined with new media, has resulted in more meaningful data stories from data-driven journalists around the country.

To distill this down to one simple equation, open-source tools combined with open data have given data-driven journalists more leverage to visualize and draw meaningful conclusions from previously complicated data sets.

Process of Data-Driven Journalism 

So, how does data-driven journalism work in practice? Data-driven journalists extract data from websites and then analyze it with open-source software. Freedom of Information laws, which establish a legal right to know, enable citizens and journalists to obtain data from government archives. There is more data available today than ever, and data-driven journalists have only recently begun to take advantage of this influx to create data-driven stories.
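
The extraction step can be sketched in a few lines of Python. This is only an illustration, assuming the data has already been downloaded as comma-separated text; the sample rows, the column names and the `load_rows` helper are invented for this example and do not come from any real data set.

```python
import csv
import io

# Invented sample standing in for a file downloaded from an open-data website.
SAMPLE_CSV = """district,year,incidents
North,2011,42
South,2011,17
East,2011,29
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Every value arrives as text at this point, which is why a separate cleaning step is usually needed before any analysis.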

After data-driven journalists collect the data from web sources, they clean it up for analysis using any number of freely available tools. Google Spreadsheets, for instance, is one tool at a data-driven journalist's disposal that can be used to clean up the data and arrange it into rows and columns. Journalists can then sort the data by fields such as age and look for correlations between variables within the data set.
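
The same cleaning, sorting and correlating steps can also be done in code rather than in a spreadsheet. The following is a minimal sketch, not a prescribed workflow: the records, the field names and the `clean` and `pearson` helpers are all invented for illustration, and the Pearson coefficient is just one common way to measure how two variables move together.

```python
from statistics import mean

# Invented raw records, as they might look after copying a table from a website.
raw = [
    {"name": " Alice ", "age": "34", "income": "52,000"},
    {"name": "Bob", "age": "n/a", "income": "41,000"},
    {"name": "Carla", "age": "51", "income": "68,500"},
    {"name": "Dev", "age": "27", "income": "39,000"},
]

def clean(records):
    """Strip whitespace, convert numbers, and drop rows with unusable values."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append({
                "name": rec["name"].strip(),
                "age": int(rec["age"]),
                "income": float(rec["income"].replace(",", "")),
            })
        except ValueError:
            continue  # skip rows whose numbers cannot be parsed, e.g. age "n/a"
    return cleaned

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Sort the cleaned rows by age, then correlate age with income.
rows = sorted(clean(raw), key=lambda r: r["age"])
r = pearson([row["age"] for row in rows], [row["income"] for row in rows])
print([row["name"] for row in rows], round(r, 3))
```

A coefficient near 1 or -1 suggests a strong linear relationship, but with so few rows it is at most a lead to investigate, and a correlation alone does not show that one variable causes the other.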

Since humans are visual creatures, data-driven journalists then strive to turn the cleaned-up data sets into meaningful graphs. Journalists use free web applications like Many Eyes and Yahoo! Pipes to create charts that help the public see the correlations the journalists want to draw attention to.
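
To illustrate the idea of turning numbers into a picture without relying on any particular charting service, here is a small sketch that draws a text bar chart using only the Python standard library. The `bar_chart` helper and the figures are invented for this example; a published story would normally use a proper charting tool.

```python
def bar_chart(data, width=30):
    """Return text lines, one horizontal bar per (label, value) pair.

    Assumes at least one value is greater than zero.
    """
    largest = max(value for _, value in data)
    label_width = max(len(label) for label, _ in data)
    lines = []
    for label, value in data:
        # Scale each bar so the largest value fills the full width.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines

# Invented figures, for illustration only.
incidents = [("North", 42), ("South", 17), ("East", 29)]
chart = bar_chart(incidents)
print("\n".join(chart))
```

Scaling every bar against the largest value keeps the proportions honest, which matters because a chart with a distorted scale can mislead readers as easily as a wrong number.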

Releasing the data-driven story 

Publishing the graphs and implied correlations from the cleaned-up data usually involves the journalist releasing a data story. These data stories are made public through content management systems and normally posted as a single page of charts and statistics. 

A famous example of a completed data-driven story came out of the WikiLeaks disclosures and Julian Assange's efforts to publish military reports previously kept from the public.

Outlets like the Guardian, a paper known for its data-driven journalism, covered the story and added an interactive map. Readers could navigate this map online to pinpoint the locations of over 10,000 improvised explosive devices recorded in the Afghan War Diary, the collection of military reports that WikiLeaks released.

Data-driven stories and other forms of journalism 

Data-driven journalists have teamed up with mobile journalists to document riots across Europe in real time using the fundamentals of data-driven journalism.

The 2011 "BlackBerry" riots in England, for instance, were reported in real time, with extensive use of new media to inform an international audience of the who, what and where of the events. The Guardian's data-driven journalists then used Google Fusion Tables to craft an interactive map that helped Londoners sidestep locations experiencing riots and civil unrest.

The above example is just one instance of how open data can be extracted and analyzed using open-source tools to create meaningful data stories.