5. Principles of Open Government Data
The House Committee on House Administration held a conference in early 2012 on legislative data and transparency. Reynold Schweickhardt, the committee’s director of technology policy, made an interesting observation at the start of the day that policy for public information often is framed in terms of 3 A’s:
accessibility, accuracy, and authenticity. [127. For more on access, see Michener, Greg and Katherine Bersch. 2011. Conceptualizing the Quality of Transparency. Presented at the 1st Global Conference on Transparency Research, Rutgers University-Newark, May 2011.]
They are good principles. And yet we data geeks and entrepreneurs so often find ourselves having to start from scratch explaining why clean data is so important. It seems contradictory: if accuracy is a concept that practitioners in government understand, and if ‘clean’ is a type of accuracy, then there must be a communications failure somewhere if we are having a hard time explaining open data to government agencies. What other word do we need to add to those 3 A’s to work open data in? Some possibilities are precise, analyzable, and reusable [128. Association for Computing Machinery. February 2009. Recommendation on Open Government. http://www.acm.org/public-policy/open-government]; automatable; adaptable (see below); or normalized and queryable [129. Suggested by Javier Muniz.]. Being precise about what “open government data” means helps us formulate our asks when we approach governments and gives insight into what this new field is all about.
An open government working group convened by Carl Malamud in November 2007 was the first to attempt this. Its 8 Principles of Open Government Data [130. Which I helped write; included in full in section 7.1 and online at opengovdata.org.] specified a working definition of what it means for public government data to be open. In 2006, the Open Knowledge Foundation wrote the Open Knowledge Definition (OKD), which adapted a definition of open source software to the sharing of data. It is published at opendefinition.org and reproduced in section 7.2.
Open government data might simply be the application of “open,” in the sense of the OKD, to data held by the government. I find this too weak to be a definition of open government data. For instance, the OKD allows governments to require attribution on reuses of their data, which I believe makes that data not open (more on that later). Or, open government data might be the synthesis of “open government” and “data,” in which case it refers to data that is relevant to government transparency, innovation, and public-private collaboration. But perhaps the open government data movement cannot be decomposed according to its words. Justin Grimes has pointed out to me that, historically, the movement has come out of three very distinct communities: classic open government advocates, whose focus has typically been on freedom of information and money in politics; open source software and open scholarly data advocates; and open innovation entrepreneurs (who might include both Gov 2.0 entrepreneurs and government staff looking to the public for expertise, such as in Peer to Patent). To each group, “open” means something different.
When three communities use the same word for three different purposes, confusion is inevitable. Yu and Robinson (2012) described the consequences:
The shift has real-world consequences, for good and for ill: Policies that encourage open government now promote a broader range of good developments, while policies that require open government have become more permissive. A government’s commitment to be more “open” can now be fulfilled in a wider variety of ways, which makes such a promise less concrete than it used to be. . . . A government could commit to an open data program for economic reasons—creating, say, a new online clearinghouse for public contracting opportunities. [131. Yu, Harlan and David G. Robinson. February 28, 2012. The New Ambiguity of “Open Government.”]
Beth Noveck, a professor and the former U.S. deputy chief technology officer for open government, wrote in a 2011 blog post about the trouble this ambiguity created for the goals she had brought to the White House:
In retrospect, ‘open government’ was a bad choice. It has generated too much confusion. Many people, even in the White House, still assume that open government means transparency about government. . . . The aim of open government is to take advantage of the know-how and entrepreneurial spirit of those outside government institutions to work together with those inside government to solve problems. [132. Noveck, Beth Simone. April 7, 2011. What’s in a Name? Open Gov and Good Gov. The Huffington Post.]
Yu and Robinson suggested breaking open government data down not into three parts but into four, using two dimensions. The first dimension ranges from transparency (data about government) to service delivery (including data from the government). The second ranges from “inert” data, such as PDFs, to “adaptable” data, by which they mean precise, machine-processable data and APIs. Grimes’s “classic open government” would fall mostly in Yu and Robinson’s transparency–inert quadrant, while open innovation would fall mostly in their service delivery–adaptable quadrant.
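Because Yu and Robinson’s scheme consists of two independent axes, it can be modeled as a pair of two-valued attributes whose combinations give exactly four quadrants. The Python sketch below only illustrates that structure; it is not something Yu and Robinson published, and the enum names, the `quadrant` and `all_quadrants` helpers, and the two sample releases are all hypothetical.

```python
from enum import Enum
from itertools import product


class Purpose(Enum):
    """First dimension: what the data is for (hypothetical labels)."""
    TRANSPARENCY = "transparency"          # data about government
    SERVICE_DELIVERY = "service delivery"  # data from government, used to deliver services


class Technology(Enum):
    """Second dimension: how readily machines can process the data."""
    INERT = "inert"          # e.g. a PDF
    ADAPTABLE = "adaptable"  # precise, machine-processable data and APIs


def quadrant(purpose: Purpose, technology: Technology) -> str:
    """Name the quadrant that one combination of the two dimensions falls in."""
    return f"{purpose.value} / {technology.value}"


def all_quadrants() -> list[str]:
    """Two dimensions with two values each yield four quadrants."""
    return [quadrant(p, t) for p, t in product(Purpose, Technology)]


if __name__ == "__main__":
    # Hypothetical releases, placed where the text says each community's work mostly falls.
    releases = {
        "PDF of campaign finance reports (classic open government)":
            (Purpose.TRANSPARENCY, Technology.INERT),
        "transit schedule API (open innovation)":
            (Purpose.SERVICE_DELIVERY, Technology.ADAPTABLE),
    }
    for name, (p, t) in releases.items():
        print(f"{name}: {quadrant(p, t)}")
    print(all_quadrants())
```

Treating the dimensions as separate attributes, rather than as one spectrum from “bad” to “good,” is what makes it possible for a release to be highly adaptable yet have nothing to do with transparency, which is exactly the ambiguity Yu and Robinson describe.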
The confusion is not likely to be resolved by choosing one definition over another, but rather by practitioners being clearer about their personal goals. My goal, and the theme of this book, is to treat open government data as more than just the sum of its parts: it is “Big Data” applied to Open Government. That means a definition must draw not only from open data (i.e. the OKD) and open government (transparency, innovation, and collaboration) but also from the qualities of Big Data. In the definition of Big Data that I adopted in Chapter 1, Big Data has two parts: 1) it is data at scale, and 2) it allows us to think about the subject of the data in a new way. Big Data is data that is amenable to automated analysis and transformation into novel applications. If we are to add another A-word, it would be “analyzable.”
To summarize the rest of this chapter, open government data has the following defining qualities:
“Open” or “Accessible”:
Data must be online and available for free, in bulk, with no discrimination, and without the need to agree to a license that waives any rights the user might otherwise have.
“Big Data” or “Analyzable”:
The complexity of today’s governments necessitates the use of automation in any serious application or analysis of government data, such as to search, sort, or transform the data. Data must be machine-processable, following the general guiding principle of making choices that promote analysis and reuse.
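To make “machine-processable” concrete, here is a minimal Python sketch of the three operations named above (search, sort, and transform) run over a bulk CSV file using only the standard library. The file contents, column names, and function names are invented for illustration. The point is that each operation takes a few lines of code once the data is structured, whereas the same records in a scanned document would first have to be transcribed.

```python
import csv
import io

# An invented bulk file standing in for a government data release.
BULK_CSV = """agency,vendor,amount_usd
Agency A,Vendor X,12000
Agency B,Vendor Y,450000
Agency A,Vendor Z,78000
"""


def load(text: str) -> list[dict]:
    """Parse the CSV into one dict per record, converting amounts to integers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount_usd"] = int(row["amount_usd"])
    return rows


def search(rows: list[dict], agency: str) -> list[dict]:
    """Search: keep only one agency's records."""
    return [row for row in rows if row["agency"] == agency]


def sort_by_amount(rows: list[dict]) -> list[dict]:
    """Sort: largest amounts first."""
    return sorted(rows, key=lambda row: row["amount_usd"], reverse=True)


def total_by_agency(rows: list[dict]) -> dict[str, int]:
    """Transform: aggregate the records into per-agency totals."""
    totals: dict[str, int] = {}
    for row in rows:
        totals[row["agency"]] = totals.get(row["agency"], 0) + row["amount_usd"]
    return totals


if __name__ == "__main__":
    rows = load(BULK_CSV)
    print(len(search(rows, "Agency A")))      # 2
    print(sort_by_amount(rows)[0]["vendor"])  # Vendor Y
    print(total_by_agency(rows))              # {'Agency A': 90000, 'Agency B': 450000}
```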
Properly implemented open government data also has these desired qualities:
“Open” or “Accessible”:
Data should use non-proprietary file formats appropriate for the intended use of the data, be documented, be posted permanently, and use safe file formats.
“Accurate” and other aspects of data quality:
Governments should provide the lowest-level granular data and should make data interoperable through coordination. Data should also maximize accuracy and precision at a reasonable cost to the data user.
“Authentic” and questions of process:
This category of principles concerns how a data release should meet human needs such as relevance and trust. The principles include timeliness, digital provenance, the use of public input, the need for public review, the dangers of endorsements, and general priorities for government agencies.
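The principle listed under “Accurate,” that governments should provide the lowest-level granular data, can be illustrated in a few lines: any summary can be computed from record-level data, but a summary cannot be turned back into records. In this hypothetical Python sketch, two different record-level releases produce identical regional totals yet different district totals, so a user given only the regional totals could never tell them apart. The field names, figures, and the `aggregate` helper are invented.

```python
from collections import defaultdict

# Two invented record-level releases (e.g. one row per grant).
release_a = [
    {"region": "North", "district": "D1", "amount": 100},
    {"region": "North", "district": "D2", "amount": 50},
]
release_b = [
    {"region": "North", "district": "D1", "amount": 20},
    {"region": "North", "district": "D2", "amount": 130},
]


def aggregate(records: list[dict], key: str) -> dict:
    """Sum the amount field over records grouped by the given key."""
    totals: defaultdict = defaultdict(int)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Coarse view: both releases look the same.
    print(aggregate(release_a, "region"))    # {'North': 150}
    print(aggregate(release_b, "region"))    # {'North': 150}
    # Finer view: they differ, which the coarse totals alone cannot reveal.
    print(aggregate(release_a, "district"))  # {'D1': 100, 'D2': 50}
    print(aggregate(release_b, "district"))  # {'D1': 20, 'D2': 130}
```

Publishing the records rather than the totals costs the data user nothing, since the totals are one function call away, while publishing only the totals permanently discards the finer view.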
In the next several sections, the defining and desired qualities of open government data are laid out in detail. This chapter wraps up with a definition of data quality and case studies of the principles applied in practice.