Computational Journalism in the 21st Century
By the late 2000s, new areas of specialization were emerging. These include automated journalism (having machines produce news content from data with limited human supervision), conversational journalism (communicating news via automated, dialogic interfaces like chatbots), data journalism (using data to report, analyze, write, and visualize stories), sensor journalism (using electronic sensors to collect and analyze new data for journalistic purposes), and structured journalism (publishing news as data).
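Structured journalism, for instance, treats a news event as a set of machine-readable fields rather than a block of free text, so the same facts can later be filtered, aggregated, or rendered in different forms. A minimal sketch of the idea (the field names and values below are illustrative, not any standard news schema):

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class NewsItem:
    """One structured news item: each fact is a field, not a sentence."""
    headline: str
    date: str                       # ISO 8601 date of the event
    location: str
    topics: list = field(default_factory=list)

item = NewsItem(
    headline="Hospital sues patients over unpaid bills",
    date="2019-06-14",
    location="Charlottesville, VA",
    topics=["health care", "debt collection"],
)

# Published "as data": the item serializes cleanly, and prose can be
# generated from it later rather than being the only record of the facts.
print(json.dumps(asdict(item), indent=2))
```

Because the facts live in named fields, a newsroom can query hundreds of such items ("all debt-collection stories since 2018") in a way that is impossible with articles stored only as prose.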
While some of those specializations emerged relatively independently of one another, they are all centered on interpreting the world through data, and they generally rely on computational processes to translate knowledge into data and data into knowledge. As such, they are fundamentally computational forms of journalism, regardless of how much technical skill they actually require.
Computational journalism also aims to blend logics and processes spanning multiple disciplines, such as journalism, computer science, information retrieval, and visual design. With regard to journalism, it involves a significant shift away from the traditional focus on nuance (in reporting), individualism (in subject or focus), and creativity (in writing). Instead, it orients itself toward standardization (in reporting), scale (in subject or focus), and efficiency (in writing). These differences in logics and approaches often make it difficult for editorial and technical actors to work together on computational journalism projects. In fact, researchers have found that when computational journalism projects fizzle or fail, it is often due to the philosophical and procedural differences among members of the team.
Nevertheless, computational forms of journalism have been used to produce highly impactful work in recent years, both in terms of journalistic content and new tools for producing journalism. Several computational journalists (who don’t always self-identify as such) have won prestigious awards for their computational journalism. For example, Jay Hancock and Elizabeth Lucas of Kaiser Health News won a Pulitzer Prize in 2020 for exposing predatory bill collection by the University of Virginia Health System, which had forced many low-income patients into bankruptcy. Hancock and Lucas worked with an open data advocate to collect and analyze millions of civil court records in Virginia — far more than a human journalist could inspect manually. Their reporting prompted the non-profit, state-run hospital to change its practices.
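Analyses at that scale are scripted rather than manual. A hedged sketch of the general approach — counting filings per plaintiff so the heaviest filers surface automatically — using invented column names and toy data, not the actual Virginia court records:

```python
import csv
from collections import Counter
from io import StringIO

# Toy stand-in for a scrape of civil court records; the real dataset
# ran to millions of rows.
SAMPLE = """plaintiff,case_type,year
UVA Health System,debt collection,2018
UVA Health System,debt collection,2018
Acme Credit LLC,debt collection,2019
UVA Health System,garnishment,2019
"""

def suits_by_plaintiff(csv_text):
    """Count filings per plaintiff -- an aggregate no one could tally by hand."""
    reader = csv.DictReader(StringIO(csv_text))
    return Counter(row["plaintiff"] for row in reader)

counts = suits_by_plaintiff(SAMPLE)
print(counts.most_common(1))  # the most frequent filer, with its count
```

The journalism is in choosing the question (who is suing whom, and how often?); the code simply makes it answerable over a corpus too large to read.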
On the software side, journalists have worked alongside software development teams to create technologies like DocumentCloud, an all-in-one platform designed to help journalists (and teams of journalists working across multiple journalistic outlets) to upload, organize, analyze, annotate, search, and embed documents. The project brings together existing tools from disciplines like computational linguistics into an interface that is accessible to many journalists. Similarly, MuckRock has made it easier for journalists to make several Freedom of Information Act requests at the same time, write news stories from them, and share the data with other journalists.
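The batch-request pattern that MuckRock streamlines can be approximated with a simple template loop: one identically worded request is generated per agency, leaving only filing and tracking to be managed. The agency names and request wording below are placeholders, not MuckRock's actual interface:

```python
from string import Template

# Boilerplate request language; $agency, $records, and $year are filled per request.
FOIA_TEMPLATE = Template(
    "Pursuant to the Freedom of Information Act, I request copies of "
    "$records held by $agency for calendar year $year."
)

def draft_requests(agencies, records, year):
    """Generate one identically worded FOIA request per agency."""
    return [
        FOIA_TEMPLATE.substitute(agency=a, records=records, year=year)
        for a in agencies
    ]

# Placeholder agency names for illustration.
letters = draft_requests(
    ["Department of Example A", "Department of Example B"],
    "civil asset forfeiture logs",
    2023,
)
print(len(letters), "requests drafted")
```

Services like MuckRock add what this sketch omits: submitting the requests, tracking deadlines and appeals, and publishing the returned documents for other journalists to reuse.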
Computational journalism demands the same high ethical standards as traditional journalism to ensure that the process of gathering, analyzing, and disseminating information to the public is truthful, independent, and inclusive. However, computational forms of journalism do not always have a distinct code of ethics. This can be challenging because computational journalists tend to place a greater premium on transparency and openness than traditional journalists, which can introduce ethical tensions. For example, some computational journalists have been criticized as naive for posting unredacted datasets (which placed unwitting individuals at risk) or for failing to review automated stories (which included misinformation).
Computational journalism is expected to keep growing in the coming years. For example, The New York Times launched a short course to teach its journalists data skills, and later published the course materials online under an open license. Journalistic outlets like BuzzFeed News, FiveThirtyEight, The Marshall Project, and The Washington Post sometimes post the code powering their computational journalism on the code-sharing platform GitHub to promote their craft. Moreover, as computers become more powerful and intelligent, automation is likely to become more commonplace — as will the tasks involved in translating the natural world into structured data.