Open Government Data: The Book

By Joshua Tauberer. Second Edition: 2014.

No Discrimination and License-Free (Principles 6 and 8)

The remaining two principles1 from the 8 Principles of Open Government Data state that the government may not restrict use of the data capriciously. The first of the two principles addresses this on the substance:

(6) “Non-discriminatory: Data are available to anyone, with no requirement of registration.”

This principle is also related to the Open Definition’s “no discrimination” requirements. The UK Open Data Whitepaper (2012) states the principle more clearly:

(10) Public data will be freely available to use in any lawful way. . . . Applications are able to use the data in any lawful way without having to inform or obtain the permission of the public body concerned. . . . (11) Public data will be available without application or registration, and without requiring details of the user.

Anonymous access to the data must be allowed for public data. A requirement of registration puts data users — often government watchdogs — at risk for retaliation by the government. The Washington, DC data catalog unfortunately requires data users to “[n]otify the District of Columbia via email” about uses of DC government data.2

Access terms that violate the principle

Discriminatory practices around data are common. One dataset posted by the U.S. Substance Abuse & Mental Health Services Administration (SAMHSA) requires users to agree first:

To use these datasets solely for research or statistical purposes and not for re-identification of specific RESEARCH SUBJECTS. . . . If SAMHSA or ICPSR determines that this terms of use agreement has been violated, then possible sanctions could include . . .3

While the re-identification of research subjects may be unethical, and probably should be illegal, by narrowing permitted use to “research or statistical purposes” the agency is imposing a restriction on use considerably narrower than any lawful purpose.

U.S. agencies are also beginning to open a door to discrimination in API terms of service agreements — contractual agreements one must agree to before accessing a live data service. For instance, to use the U.S. Energy Information Administration’s beta API a data user must agree that

The EIA reserves the right (though not the obligation) to: (1) refuse to provide the API to you, if it is the EIA’s opinion that use violates any EIA policy; or, (2) terminate or deny you access to and use of all or part of the API at any time for any other reason in its sole discretion.4

This API is potentially discriminatory if the EIA can deny access “for any other reason in its sole discretion.” In this case the same data is also available as a bulk data download to which these terms do not apply, but that wasn’t always the case. When data is available only under terms such as these, it is not open, because data users’ right to access it depends on an agency’s internal policies and discretion.

Similarly, the Seattle, Washington data catalog’s terms allow the city to shut down any use of its data for any reason! Here is the relevant part of its terms of use agreement:

reserves the right to . . . require the termination of any and all displaying, distributing or otherwise using any or all of the data for any reason5

This is not open data!

The second of the two principles addresses a particular way that governments can exert control over the use of public data:

(8) “License-free.” Dissemination of the data is not limited by intellectual property law such as copyright, patents, or trademarks, contractual terms, or other arbitrary restrictions.

While privacy, security, and other concerns as governed by existing law may reasonably — and rightly — limit the dissemination of some government data, that data is not open government data. Only data not subject to a license is open. This principle is a stronger version of the Open Definition’s “redistribution” and “reuse” requirements.

Sunlight Foundation’s Open Data Policy Guidelines (2014) says to “(11) Mandate Data Be Explicitly License-Free.”

This principle is also reinforced in a joint statement I wrote with Eric Mill (Sunlight Foundation), Jonathan Gray (Open Knowledge Foundation), Parker Higgins (Electronic Frontier Foundation), Michael Weinberg (Public Knowledge), and Timothy Vollmer (Creative Commons), and signed onto by a number of organizations: “Best-Practices Language for Making Data ‘License-Free.’ ”

A license is a contract that a data user agrees to in exchange for either a) access, or b) the right to make copies. When a work is copyrighted, a license is required to undo or partially undo the all-rights-reserved default state. This is usually the case for works created in the private sector, and it is how open source software works. Without a license, disseminating a work would be copyright infringement.

But in the United States, where most federal government data is not subject to copyright6, there is no need for a license to make data open — this data is “born” into the public domain. Any use of a license for public domain data violates the principle.

In many European countries and at the state level in the United States, the government holds a copyright over works it produces7, though commonly with exceptions for the law itself.8 In jurisdictions in which government data is “born” copyrighted, a license is needed to make government data open (contrary to the phrasing of the principle). But in these cases the license should put the data in the public domain and not restrict access or use.

Licenses that violate the principle

Still, even in the United States, government data is remarkably restricted. Many datasets posted by U.S. government agencies, especially when posted as an API, are locked behind terms of use agreements (which are essentially licenses by another name). Data.gov, which is a catalog of U.S. government datasets, imposed a terms-of-use agreement on all its datasets. It read, “By accessing the data catalogs, you agree to the Data Policy,”9 and the Data Policy required users of the data to include a disclaimer in their applications: “Finally, users must clearly state that ‘Data.gov and the Federal Government cannot vouch for the data or analyses derived from these data after the data have been retrieved from Data.gov.’ ”10

The SAMHSA agreement mentioned above also states:

You agree to reference the recommended bibliographic citation in any of your publications that use SAMHSA data.

And a dataset from the Centers for Medicare & Medicaid Services (CMS) requires data users to agree to the following:

The user may not present data that has been altered in any way as CMS data.11

Attribution, non-modification, and waiving the right to sue the government are extremely common requirements.

The 2003 EU PSI Directive said that licenses covering government data in the European Union may consider “liability, the proper use of documents, guaranteeing non-alteration and the acknowledgment of source.” The UK Open Data Whitepaper (2012) requires government data to use an “open license which enables re-use, including commercial re-use,” but the UK Open Government License12 requires data users to “acknowledge the source of the Information [and] provide a link to this licence.” In the first version of the license users were required to not “mislead others”13 (but this requirement was removed from the license in 201314).

The New Zealand Government Open Access and Licensing Framework, approved in August 2010, recommends a Creative Commons license for government works that requires the data user to attribute the data back to the government:

State Services agencies should make their copyright works which are or may be of interest or use to people available for re-use on the most open of licensing terms available within NZGOAL (the Open Licensing Principle). To the greatest extent practicable, such works should be made available online. The most open of licensing terms available within NZGOAL is the Creative Commons Attribution (BY) licence.15

A collaboration between the federal and local governments in Austria in 2011 endorsed the Creative Commons Attribution License for government data as well.16
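The kinds of terms catalogued above can be summarized as a rough checklist. The sketch below is only an illustration of that reasoning: the `AccessTerms` fields and the `violated_principles` function are hypothetical names invented here, and the mapping of each term to Principle 6 or Principle 8 follows how the examples are grouped in this chapter, not any official test.

```python
from dataclasses import dataclass


@dataclass
class AccessTerms:
    """Hypothetical summary of a dataset's terms of use (illustrative fields only)."""
    requires_registration: bool = False      # e.g. sign-up before download
    requires_notifying_agency: bool = False  # e.g. emailing the agency about uses
    limits_purpose: bool = False             # e.g. "research or statistical purposes" only
    revocable_at_discretion: bool = False    # e.g. access denied "for any other reason"
    requires_attribution: bool = False       # e.g. mandatory citation of the source
    forbids_modification: bool = False       # e.g. non-alteration clauses
    requires_disclaimer: bool = False        # e.g. a mandated disclaimer in applications


def violated_principles(terms: AccessTerms) -> list[int]:
    """Return the principle numbers (6 and/or 8) that the terms appear to violate."""
    violations = []
    # Principle 6 (non-discriminatory): anyone may use the data, anonymously,
    # for any lawful purpose, without the agency being able to cut them off.
    if (terms.requires_registration or terms.requires_notifying_agency
            or terms.limits_purpose or terms.revocable_at_discretion):
        violations.append(6)
    # Principle 8 (license-free): no contractual conditions attached to reuse.
    if (terms.requires_attribution or terms.forbids_modification
            or terms.requires_disclaimer):
        violations.append(8)
    return violations
```

Under these assumptions, terms that both limit the purpose of use and require a citation, as in the SAMHSA agreement, would be flagged under both principles, while terms with no such conditions would be flagged under neither.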

Rationale

Requirements for attribution and data integrity, though innocuous sounding at first, create a lever (a civil penalty arising out of violation of a contract) by which the government can control speech. Data users are deemed, by virtue of reading the terms and downloading the data, to have agreed to the terms. When applied to medical research data, as in the SAMHSA data, this lever may seem reasonable.

But imagine these requirements on government spending data, on agency decisionmaking records, or government conflict-of-interest disclosure data. Would newspapers be subject to litigation for failing to attribute the government properly if the government didn’t like how the data was portrayed? Or would the public be liable for making corrections to errors in government data (altering it) before sharing it with others? What is misleading is often subject to interpretation and this sort of requirement creates a grey area for government litigation.

Here in the U.S. we have unusually, if not uniquely, strong norms about the government not interfering with public knowledge. Propaganda is illegal. Freedom of the press is incredibly strong. Requiring attribution to the government, which might sound reasonable elsewhere, would be a major policy shift with significant legal implications for the press here. “No restrictions on use” is our baseline, and transparency is impeded if accessing the information comes with restrictions on speech. The Supreme Court noted in Citizens United that “[d]isclaimer and disclosure requirements may burden the ability to speak.”17

Making data license-free

In the joint statement “Best-Practices Language for Making Data ‘License-Free.’ ”18, my co-authors and I recommended that governments adopt specific language to put works into the world-wide public domain. For works created by federal government contractors, for instance, we recommended the following language be applied to the data:

This work was created through a government contract which assigned copyright to [the United States Government or Agency name]. [Agency Name] waives copyright and related rights in the work worldwide through the CC0 1.0 Universal Public Domain Dedication (which can be found at http://creativecommons.org/publicdomain/zero/1.0/).

The Creative Commons CC0 is a universal legal instrument that can be used to waive world-wide intellectual property rights in a work. Unlike the Creative Commons Attribution license, called CC-BY, the CC0 is not a license but rather a waiver of copyright and related rights. In jurisdictions in which a waiver is not possible, the CC0 acts as a license that grants unlimited rights in perpetuity.
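For a publisher that releases many datasets, the recommended statement for contractor-created works could be generated from a template. This is a minimal sketch, not part of the joint statement: the function name `contractor_work_statement` is a hypothetical helper, and the wording is copied from the language quoted above with the agency name filled into both bracketed slots.

```python
# URL of the CC0 1.0 dedication, as given in the recommended language above.
CC0_URL = "http://creativecommons.org/publicdomain/zero/1.0/"


def contractor_work_statement(agency_name: str) -> str:
    """Fill the agency name into the recommended statement for contractor works.

    Hypothetical helper for illustration; the template text follows the
    language quoted in this chapter.
    """
    return (
        "This work was created through a government contract which assigned "
        f"copyright to {agency_name}. {agency_name} waives copyright and related "
        "rights in the work worldwide through the CC0 1.0 Universal Public "
        f"Domain Dedication (which can be found at {CC0_URL})."
    )
```

Calling `contractor_work_statement("Example Agency")` would yield a notice that can be placed alongside the data, for instance in a README or a metadata record. Federal government works, primary legal materials, and mixed-authorship works call for different language, which this sketch does not cover.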

Our recommendations also cover federal government works, primary legal materials, and works that mix government and non-government authors. Our suggested language has been used by the Department of Health and Human Services’s ckanext-datajson project19, the Consumer Financial Protection Bureau’s qu project20, and a White House report21. OpenFDA’s API terms of use agreement at https://open.fda.gov/terms/ is a rare example of an agreement that does not capriciously add restrictions on use.

  1. Principle 7 was noted in Analyzable Data in Open Formats (Principles 5 and 7).

  2. http://data.dc.gov/TermsOfUse.aspx, accessed August 2014. Phil Ashlock says this has been unchanged since 2010 and pointed this case out to me.

  3. http://www.icpsr.umich.edu/cgi-bin/terms?path=SAMHDA&study=34898&bundle=ascii&ds=1&dups=yes, for a dataset named “1992 through 2010 Treatment Episode Data Set - Admissions (TEDS-A)”, accessed May 2, 2014

  4. http://www.eia.gov/beta/api/tos.cfm, accessed May 2, 2014.

  5. https://data.seattle.gov/data-policy, accessed August 2014. Phil Ashlock says this has been unchanged since 2010 and pointed this case out to me.

  6. The National Institute of Standards and Technology in the Department of Commerce is exempt from the no-government-copyright provision, as often are works produced by government contractors.

  7. Data, in general, is not copyrightable as such, and neither are facts, though particular compilations of data or facts may be.

  8. C.J. Angelopoulos writing to the OKFN’s open-government mailing list on Feb. 7, 2011.

  9. http://explore.data.gov/catalog/raw/, accessed July 7, 2011

  10. http://www.data.gov/datapolicy, accessed July 7, 2011.

  11. The “CMS Data Disclaimer - User Agreement” linked from the dataset named “Part B National Summary Data File - CY2001”, accessed May 2, 2014.

  12. http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/

  13. http://www.nationalarchives.gov.uk/doc/open-government-licence/version/1/

  14. http://blog.okfn.org/2013/07/01/uk-open-government-license-is-now-compliant-with-the-open-definition/

  15. http://www.ict.govt.nz/programme/opening-government-data-and-information/nzgoal/read-nzgoal

  16. http://blog.okfn.org/2011/08/15/austria-adopts-ckan-and-cc-by-as-nation-wide-defaults/

  17. http://press.take88.com/wp-content/uploads/2010/03/08-205.pdf, page 6. Unfortunately the Court dismissed the burden, reasoning that electioneering disclosures would be reasonable.

  18. http://theunitedstates.io/licensing/

  19. Because I was the contractor for HHS who built that project! https://github.com/HHS/ckanext-datajson

  20. https://github.com/cfpb/qu

  21. http://www.whitehouse.gov/sites/default/files/microsites/ostp/us_open_data_action_plan.pdf