Open Government Data: The Book

By Joshua Tauberer. Second Edition: 2014.
Also available as a Paperback and for Kindle. Tweet me at @JoshData.

14 Principles of Open Government Data

Is open government data the synthesis of “open government” and “data”, of “open” and “government data”, or something else entirely?1

If “open government data” means data about “open government”, then open government data would be for accountability, innovation, participatory policy-making, access to the law, and so on. (I discussed the meaning of “open government” in History of the Movement.) This is part of what is meant by open government data. A big-tent approach to open government data is the only way to accurately describe the open government data community, and the best strategy to strengthen it.2 But this approach to understanding open government data is still missing an essential element: open government data must also be “open.”

So is open government data instead the application of “open” to “government data”? In this view, we might include any data held by the government so long as it is available under terms that permit copying and reuse, the traditional cornerstone of “open” from the open source software and open access movements.

Harlan Yu and David Robinson suggested3 mapping the realm of government data along two dimensions. The first dimension ranges from transparency (i.e., data about government) to service delivery (including data from the government). The second dimension ranges from “inert” data such as PDFs to “adaptable” data, by which they mean machine-processable data and APIs. Classic open government (e.g., the Freedom of Information Act) would fall mostly in Yu and Robinson’s transparency–inert quadrant, while Beth Noveck’s open innovation goals (e.g., Peer-to-Patent; see Consumer Products) would fall mostly in their service delivery–adaptable quadrant. All of this might be open government data.

This is a start, but I don’t think this is going to lead us to the right definition of open government data, either. I have, and I think we all should have, higher standards for government openness than openness elsewhere, such as the requirements for open source software in the private sector. The Open Knowledge Foundation’s definition of open — the Open Definition4 reproduced in Open Knowledge Definition — is simply too lax for government. And while Yu and Robinson’s “inert” data may be available, it is not open for analysis.

The purpose of this chapter is to outline the essential qualities of that higher standard: what makes government data open government data. In short, open government data is four “A”s: accessible,5 accurate, analyzable, and authentic.6

There have been many attempts to define these essential qualities, and I have consolidated those attempts into 14 distinct principles. In the next several sections, these qualities of open government data are laid out in detail. A definition of data quality is provided to explore further the meaning of machine processability. I also discuss a maturity model which can guide the development of open government data programs.

The recommendations in this chapter address how to make public government data open, starting not with what should be open but with what it means for data to be open and how to do it well. Government records that are not public by law, and records rightly restricted on account of privacy, security, or copyright, are simply outside the scope of these recommendations.

  1. For this I hat-tip Justin Grimes.

  2. Jeremy Weinstein and Joshua Goldstein. 2012. The Benefits of a Big Tent: Opening Up Government in Developing Countries. UCLA Law Review Discourse.

  3. Harlan Yu and David G. Robinson. February 28, 2012. The New Ambiguity of “Open Government.”

  4. opendefinition.org

  5. For more on access, see Greg Michener and Katherine Bersch. 2011. Conceptualizing the Quality of Transparency, presented at the 1st Global Conference on Transparency Research, Rutgers University-Newark, May 2011.

  6. The three “A”s of accessible, authentic, and accurate were suggested by Reynold Schweickhardt, the House Committee on House Administration’s director of technology policy, at the committee’s 2012 conference on legislative data and transparency.