Journalistic sources generate huge sets of data through the minute-by-minute updating of news and information in their publications, websites and mobile platforms. The words and pictures of the stories themselves, along with all of the “meta-data” about those stories (when and with what equipment a photo was taken, what part of a website a story was published on, how many versions of the story were created before it was published, etc.), are funneled into databases through the work that journalists do each day.
Journalists also make other contributors’ data sets accessible and understandable. For example, when the Centers for Disease Control and Prevention issues a data set about the frequency and location of cases of a particular illness, journalists use those data to write a story and put the data into context. School “report cards” are another kind of data generated by public-sector institutions that journalists take and make more accessible. Compare, for example, the school report card data from the Oregon Department of Education and the interface the Portland Oregonian published.
Journalists gather data generated by all the other information contributors as part of their reporting and interpretation work. For example, a news organization might gather public-sector data about drunk driving convictions in the state and cross-reference those data with airplane licensing data to see how many people who have been convicted of driving while intoxicated also have a license to fly a plane. When one news organization did this, it discovered an alarming amount of overlap, including several individuals who were licensed to fly commercial aircraft for the hometown-based major airline!
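The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (name, birth_date, certificate) and the choice to match on name plus birth date are all hypothetical, and real conviction and licensing files would need their own cleaning and matching rules.

```python
# Hypothetical sample records; real public data sets would use different
# fields and would be loaded from files rather than typed in.
dwi_convictions = [
    {"name": "Pat Doe", "birth_date": "1970-03-14"},
    {"name": "Lee Roe", "birth_date": "1982-11-02"},
    {"name": "Sam Poe", "birth_date": "1965-07-30"},
]
pilot_licenses = [
    {"name": "PAT DOE ", "birth_date": "1970-03-14", "certificate": "commercial"},
    {"name": "Kim Moe", "birth_date": "1990-01-21", "certificate": "private"},
    {"name": "Sam Poe", "birth_date": "1971-07-30", "certificate": "private"},
]


def match_key(record):
    """Build a comparison key: lowercased, whitespace-normalized name plus birth date."""
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])


def cross_reference(convictions, licenses):
    """Return the license records whose key also appears in the convictions."""
    convicted = {match_key(r) for r in convictions}
    return [lic for lic in licenses if match_key(lic) in convicted]


overlap = cross_reference(dwi_convictions, pilot_licenses)
for person in overlap:
    print(person["name"].strip(), "-", person["certificate"])
```

In this sketch only one record appears in both lists, because the second "Sam Poe" has a different birth date. A match like this is a lead to verify through further reporting, not proof, since two different people can share a name and a birth date.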
Journalists are also becoming expert at using large data sets to develop info-graphics or data visualizations to help make the data more understandable to the audience. For example, maps are always popular with readers/viewers. There are several easy-to-use interactive visualization tools that help journalists build interesting info-graphics for their digital sites from data they have gathered. Journalists have used data sets from a variety of contributors (but mostly public records) to create maps such as:
bikeshare rides in Boston
pedestrian injuries in San Francisco
green roofs in Chicago
rat sightings in New York City
tsunami sirens in Honolulu
dangerous dogs in Austin
So one source of data from journalistic sources is all of the news reports, investigative accounts and other types of interpretive work that journalists do with other organizations’ data. Using any of the search databases that incorporate news content to locate these types of reports will uncover such journalistic work. Most news organizations will highlight these types of reports on their own websites as well.
Web analytics from publishing platforms are the other type of data generated by journalistic sources. Every news organization with a digital platform for disseminating content uses the “back-end” system of that platform to gather second-by-second information about how that platform is being used, and by whom. As an individual news consumer moves through the site, there is a data trace of each click, every ad that is opened, each link that is followed, how long the reader stayed on the site, whether the user shared that information through a social networking site, and similar actions.
News organizations are learning how to use these data to fine-tune their content. For example, if a news editor sees that a particular story on the site is getting a lot of clicks and shares, the editor might move that story to the homepage or send a tweet to followers with the link to that story to generate more views. The advertising sales staff for those news organizations refer to the user data when trying to convince an advertiser to buy an ad on the news website.
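As a rough illustration of how an editor's dashboard might surface the story that is getting the most clicks and shares, the following Python sketch tallies a small event log. The event format, the story identifiers and the weighting of shares over clicks are all assumptions made for this example, not a description of any particular analytics system.

```python
from collections import Counter

# Hypothetical event log: one (story_id, action) pair per reader action.
events = [
    ("city-budget", "click"),
    ("city-budget", "share"),
    ("rat-map", "click"),
    ("rat-map", "click"),
    ("rat-map", "share"),
    ("rat-map", "share"),
    ("school-report", "click"),
]

# Assumption: a share counts three times as much as a click.
SHARE_WEIGHT = 3


def engagement_scores(event_log, share_weight=SHARE_WEIGHT):
    """Add up a simple engagement score per story from clicks and shares."""
    scores = Counter()
    for story_id, action in event_log:
        if action == "click":
            scores[story_id] += 1
        elif action == "share":
            scores[story_id] += share_weight
    return scores


scores = engagement_scores(events)
top_story, top_score = scores.most_common(1)[0]
print(top_story, top_score)
```

Here the hypothetical "rat-map" story scores highest (two clicks plus two weighted shares), so it would be the candidate for the homepage or a promotional post. The weighting is an editorial choice and would differ from one newsroom to another.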
While some traditionalists bemoan the use of these types of data to make news judgments, the reality is that doing so is now commonplace, and any news organization that isn’t using these “back-end” data is missing a chance to improve the reach of its news content.