Open Government Data: The Book

By Joshua Tauberer. Second Edition: 2014.

Open Government, Big Data, and Mediators

Big data

As technologists in the early 2000s were getting involved in politics and creating added value on top of digital government services, a much broader technological change was happening in other fields: the advent of Big Data. danah boyd and Kate Crawford (2011)1 described Big Data:

Big Data not only refers to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research (Burkholder 1992). Just as Ford changed the way we made cars — and then transformed work itself — Big Data has emerged a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community . . .

It re-frames key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality. Just as du Gay and Pryke note that ‘accounting tools…do not simply aid the measurement of economic activity, they shape the reality they measure’ (2002, pp. 12-13), so Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life.

In other words, Big Data has two parts: 1) Big Data is data at scale, with millions of records and gigabytes of data, and 2) Big Data changes the way we think about the subject of the data in a significant way.

Open government data, you might say, is the Big Data concept applied to open government. First, it is the application of government records at scale. Open government data applications draw on whole datasets to give a comprehensive picture: not one record but the whole database, not one agency’s rule-makings but the whole Federal Register, not the weather in your neighborhood but the weather anywhere in the world. A larger database with a wider range of information makes an application useful to a wider range of users, and it provides something for the long tail of individuals with fringe interests who might not otherwise be served. There are thousands of bills being considered in Congress at any given time, and there is something for everyone, from agriculture to medicine and hundreds of issue areas in between. Data at scale also gives perspective. When a journalist reports that a certain Member of Congress has missed 10% of votes, is that a lot or a little? When plain-language advocates call for simplified language in laws, how can you know whether it makes sense without being able to survey a wide cross-section of law?
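The missed-votes question can be made concrete with a small sketch. The numbers below are invented for illustration and are not real voting records, and the helper name percentile_rank is hypothetical. The point is only that a single figure such as 10% becomes meaningful once it is placed against the whole dataset.

```python
from bisect import bisect_left


def percentile_rank(value, population):
    """Return the percentage of the population strictly below `value`."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)  # count of entries less than value
    return 100.0 * below / len(ordered)


# Hypothetical missed-vote rates (in percent) for ten members.
# These figures are made up for the example, not taken from any real source.
missed_vote_rates = [0.5, 1.0, 1.8, 2.1, 2.4, 3.0, 3.9, 5.2, 10.0, 22.0]

rank = percentile_rank(10.0, missed_vote_rates)
print(f"A 10% missed-vote rate is higher than {rank:.0f}% of members in this sample.")
```

With one record in hand, the 10% figure says little. With every member's record available, the same figure can be ranked, which is the kind of perspective that only data at scale provides.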

And then there is the second part of the definition of Big Data. Open government data differs from conventional open government policies in the same way that “data” differs from “information” or “knowledge.” The open government movement of the second half of the 20th century relied on the disclosure of records, such as who is paying whom, who is meeting with whom, and records of government decisions and findings. The Freedom of Information Act (FOIA) and its state-level counterparts (typically called Freedom of Information Laws, or FOIL) grant the public access to these sorts of government records, often on paper or an electronic equivalent.


Think of this in the context of a journalist who distills wider-reaching knowledge for their information consumers, their readers. Journalists are mediators. Take the case of the 2002 winner of the Pulitzer Prize for Investigative Reporting. The 14-part series on the deaths of children neglected by D.C. social services was a transformation of thousands of government records into a new form more useful and informative for The Washington Post’s readers.2 The series could not have been told without access to government records at scale. Not one record at a time, but all the records. And conversely, the value of those government records came from reporters’ skills in turning the records, and of course interviews, into something pointed, understandable, and actionable for their readers. Put another way, the knowledge that the Post’s readers gained from the 14-part series could not have been FOIA’d from the government directly. The knowledge came from skilled synthesis by mediators who took a large quantity of raw data and produced a completely different information product for consumers.

The primary user of open government data is the mediator. I think we tend to forget that mediators have always played a central role in the dissemination of information. The iconic mediators of the 20th century were the radio and television anchors. Before that was the penny press, the one-cent newspapers that started in 1830s New York and began the modern sort of advertising-fueled and politically neutral journalism, and earlier still the advocacy journalism leading up to the U.S. Revolutionary War. Today’s mediators include traditional journalists, but also issue advocates, organizers, and app builders (not just programmers, but statisticians, designers, and entrepreneurs) who make information actionable.

Mediators need wide swaths of information that cross-cut individual events in time. The mediator analyzes the information for trends, distills the information into key points, and presents something useful to the information consumer that is very different from the source materials. FOIA, on the other hand, provides access to narrow windows into government records. FOIA is also an adversarial process: agencies would rather find an exemption to deny a request, in order to save money or save face, than fulfill it; the requester may appeal or file a lawsuit; and so on. FOIA is outdated and outmoded.

The second part of the definition of Big Data is that the scale changes the way the subject of the data is understood, and that is true here. Open government data has changed the way the open government movement operates and the way individuals interface with government. It has broadened the set of professions that can participate in open government to any profession that can tell a story by transforming raw data into something new. And it has engaged more lay individuals in government transparency, and in government and civics more broadly, through the novel applications these professionals have built to make government more accessible and engaging.

It’s just technology

For all of these promises about the utility of open data and technology, we should remember to treat technology as mere technology. Governance is a social problem, not a technological problem. Michael Schudson, the journalism professor, offered some perspective on the role of technology:

There is reason to be suspicious of the notion of technological revolutions. The printing press did not usher in democracy — or, if it did, it took its good-natured time! … Later, the telegraph was said to have been the center of a communications revolution. But at first the telegraph — that is, the electronic telegraph as we know it — was a relatively minor advance on the ‘optical telegraph,’ versions of which had existed for two thousand years. [I]t required the spirit of entrepreneurship at the new penny papers … to take advantage of the telegraph for news transmission. … One needs not only technologies for a revolution, but also people who can recognize their worth.3

We’re not going to see technology usher in some new form of direct government by the people. Nor would we necessarily want it. Technology doesn’t make direct democracy any more practical now than in ancient times — think about how you would feel if after your long work week your civic homework was to read a 100-page bill proposed by a stranger three states over. Not fun, and that’s exactly why we elect people to do that work for us. So as Schudson wrote, there is something democratic about technology, but it is no silver bullet. It takes persistence and creativity to put technology to work in our civic lives.

Movements are guided by principles. Our principles are that data is a public good, that value comes from transformation, that government is a platform, and that process is a legitimate policy question. Those are principles about how the world should be. There are also principles that help us understand how the world is now, and they tend to turn into buzzwords: Big Data, Web 2.0, Gov 2.0, mediation, transformation, open, participation, and collaboration. Buzzwords or not, these principles highlight differences we didn’t notice before so that we can better draw analogies, and from there make better decisions. Is your website’s goal to “democratize data” or to “shine light” on corruption? If it’s the former, you may learn from past examples of democratizing data, but if it’s the latter you may want to follow in the paths of other projects. And so the remaining chapters are made up of all sorts of new terminology: these are all of the principles of the open government data movement.

  1. danah boyd and Kate Crawford. 2011. Six Provocations for Big Data. Presented at the Oxford Internet Institute’s “A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society,” September 21, 2011.

  2. Sari Horwitz, Scott Higham, and Sarah Cohen. 2001. ‘Protected’ Children Died as Government Did Little (and subsequent articles). The Washington Post.

  3. Michael Schudson. 2010. Political observatories, databases & news in the emerging ecology of public information. Dædalus.