3. Principles 6–8: Universality of Use

The remaining principles from the 8 Principles of Open Government Data are as follows. Data should be:

6. “Non-discriminatory: Data are available to anyone, with no requirement of registration.” Anonymous access to the data must be allowed for public data. This principle is also related to the OKD’s “no discrimination” requirements.

7. “Non-proprietary: Data are available in a format over which no entity has exclusive control.” Proprietary formats add unnecessary restrictions over who can use the data, how it can be used and shared, and whether the data will be usable in the future.

While there is nothing wrong in principle with exclusive control over a data format, proprietary formats are troublesome for open data because data is not open if it is not open to all. A document released for the word processing application Pages will be open only to individuals who can afford to own a Macintosh computer. It can be used only in the ways the Pages program supports. And looking ahead, since government documents should be archivable, it can be opened only so long as Apple still exists and continues to support its Pages application. Proprietary formats create practical restrictions that open, non-proprietary formats do not. Non-proprietary formats are often supported by a wider range of applications, and therefore support a wider range of uses and users.

Writing in 2012, Kevin Webb of OpenPlans described a problem he encountered using nominally open geospatial (GIS) data from the U.S. Geological Survey. He wrote:

Several weeks back I needed to make a map for a big chunk [of] the Pacific Northwest. I leveraged all kinds of useful open data (OSM for streets, Lidar from local governments, etc.) but above all else I needed really good stream and river data. Lucky for me the USGS maintains a detailed data set that maps every stream and pond in the entire U.S., even the tiny intermittent ones!

I’ve been working with GIS tools and data in a professional capacity for going on fifteen years and I consider myself pretty savvy. However, over the last decade all of my work has come to depend on open source GIS tools—my ArcGIS license and the parallel port dongle it required stayed behind when I left university. So while I can tell you all about spatial indexes and encoding formats for transmitting geometric primitives, I missed the memo on ESRI’s new File Geodatabase format; the format now being used to manage and disseminate data at the USGS.[142]

The new Geodatabase format has become the standard data format for GIS information, replacing the open Shapefile format, Webb wrote. Unfortunately, the only software capable of opening Geodatabase files is produced by the company that created the format, ESRI, which sells its software for $1,500. There is nothing wrong with ESRI keeping its formats proprietary to induce potential buyers to pick up its software. But the USGS’s choice of a proprietary format substantially reduced the value of the data to the public.

Use of proprietary formats may also constitute a form of endorsement, which can create a conflict of interest. While some proprietary formats are nearly ubiquitous, it is nevertheless not acceptable to use only proprietary formats, especially closed proprietary formats. On the other hand, the relevant non-proprietary formats may not reach a wide audience. In these cases, it may be necessary to make the data available in multiple formats.

Commonly used proprietary formats include Microsoft Office documents through version 6, the audio format MP3, and the video format WMV. These data formats should be avoided. Although the PDF format was originally proprietary, it has since been taken over by a standards body, making it an open, non-proprietary format, though PDF may not satisfy the machine processable principle. The current Microsoft Office formats are open and nominally non-proprietary.

CSV, OpenOffice documents, XHTML, most XML, and Ogg are all non-proprietary formats.

8. “License-free.” Dissemination of the data is not limited by intellectual property law such as copyright, patents, or trademarks, contractual terms, or other arbitrary restrictions. While privacy, security, and other concerns as governed by existing law may reasonably limit the dissemination of some government data, that data simply does not meet the standards of openness. Only data not subject to a license is open. Every effort should be made to make non-restricted portions of otherwise restricted documents available under these principles. This principle is a stronger version of the OKD’s “redistribution” and “reuse” requirements.

Just as with what constitutes appropriate fees, appropriate license terms vary from culture to culture. This principle, too, may be biased toward U.S. culture. In the United States, the ideal of “free speech” considerably restricts the government from using the law to prevent the dissemination of information, especially information related to the government. For instance, documents produced by the federal government are generally excluded from copyright protection.[143]

Still, the principle is rarely executed correctly. Data.gov, which is a catalog of government datasets, imposes a terms-of-use agreement on all its datasets. It reads, “By accessing the data catalogs, you agree to the Data Policy,”[144] and the Data Policy requires users of the data to include a disclaimer in their applications: “Finally, users must clearly state that ‘Data.gov and the Federal Government cannot vouch for the data or analyses derived from these data after the data have been retrieved from Data.gov.’ ”[145] This is the only requirement the Data Policy places on data users. (It is buried within eight other paragraphs setting out expectations for the agencies submitting the data, but of course those other paragraphs are eviscerated of any legal force by a disclaimer in the final paragraph.) A disclaimer is relatively innocuous, and yet putting words into the mouths of those disseminating government information is a free speech issue. In the Citizens United case, the Supreme Court noted that “[d]isclaimer and disclosure requirements may burden the ability to speak” (though it upheld the electioneering disclaimer requirements in question on the grounds that they keep voters informed).[146] In any case, the disclaimer requirement is enough to violate the license-free principle of open data.

The EU PSI Directive notes that licenses covering government data may consider “liability, the proper use of documents, guaranteeing non-alteration and the acknowledgment of source.” Any of these provisions would violate the license-free principle stated here.

In many European countries and at the state level in the United States, the government holds a copyright over works it produces, though commonly with exceptions for the law itself.[147] In jurisdictions that impose a government copyright (such as crown copyright), open government data should be explicitly dedicated to the public domain. Creative Commons CC0 is a universal legal instrument that is appropriate for waiving intellectual property rights in government works.[148] In these cases, the “license-free” principle is perhaps misworded, since a license may be needed to undo the restrictions imposed by copyright law.

The license covering the catalogue of Data.gov.uk, the U.K. equivalent of our Data.gov, may serve as model language for granting permissive use of government data. The license grants the right to:

  • “copy, publish, distribute and transmit the Information;

  • adapt the Information;

  • exploit the Information commercially for example, by combining it with other Information, or by including it in your own product or application.”

But the license also requires attribution, a link back to the license, and, well, truthfulness: the license requires that the user not “suggest[...] any official status” or “mislead others or misrepresent the Information.”[149] (The trouble with a legal requirement of truthfulness is that truth is often subject to interpretation. Governments have no business regulating truth for truth’s sake. Other regulations of truth, such as in commerce and defamation, involve some actual harm.) So the license falls short of waiving all intellectual property protections.

The New Zealand Government Open Access and Licensing Framework (NZGOAL), approved in August 2010, recommends a different Creative Commons license, one that requires the data user to attribute the data back to the government:

State Services agencies should make their copyright works which are or may be of interest or use to people available for re-use on the most open of licensing terms available within NZGOAL (the Open Licensing Principle). To the greatest extent practicable, such works should be made available online. The most open of licensing terms available within NZGOAL is the Creative Commons Attribution (BY) licence.[150]

In 2011, a cooperation of the federal and local governments in Austria endorsed the same Creative Commons Attribution license for government data,[151] a license that requires attribution (and nothing else) for reuse of the data. On a strict reading of the license-free principle, any such restriction would make the data not open. Pragmatically, the fewer restrictions the better.

