6. Other Practical Guidelines for Web Pages and Databases
The recommendations in this section address narrower concerns about websites and databases and should be taken up only after the preceding principles have been applied.
Google has made several recommendations from the point of view of web search [157. Marsh, Jennifer. Our recommendations for increasing citizens’ access to government information. Google Public Policy Blog. June 22, 2009. http://googlepublicpolicy.blogspot.com/2009/06/our-recommendations-for-increasing.html]. The public’s ability to find government information is a crucial part of that information being open. Google’s first recommendation is to use the Sitemaps protocol, which helps search engines crawl websites more deeply and efficiently. Its second recommendation is to review whether search engines are blocked from parts of an agency’s website by a robots.txt file, which describes the agency’s policy regarding automated access to its website. A robots.txt file should be used sparingly so as not to limit the public’s ability to gather data from the agency or about the agency. As noted by Webcontent.gov, restricting access with a robots.txt file may be contrary to an Office of Management and Budget memorandum in the United States [158. http://www.usa.gov/webcontent/technology/search/robotstxt.shtml].
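As a rough illustration of both recommendations, the following Python sketch builds a minimal Sitemaps-protocol file and reports which pages a given robots.txt would hide from crawlers. The function names, the sample robots.txt rules, and the youragency.gov URLs are assumptions made for the example, not part of Google's guidance.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal Sitemaps-protocol XML document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the robots.txt text forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical agency pages and robots.txt rules, for illustration only.
PAGES = [
    "http://www.youragency.gov/index.html",
    "http://www.youragency.gov/reports/2009.html",
]
ROBOTS_TXT = "User-agent: *\nDisallow: /reports/\n"

sitemap_xml = build_sitemap(PAGES)
hidden = blocked_urls(ROBOTS_TXT, PAGES, user_agent="Googlebot")
```

Here `hidden` lists the report page, which is the kind of result an agency would want to review: a page it publishes but that search engines are told not to index.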
Permanent web addresses (discussed earlier) are part of a larger picture of using globally unique identifiers (GUIDs). The idea is that any document, resource, data record, or entity mentioned in a database (some would say every paragraph in a document) should have a unique identifier that others can use to point to it or cite it elsewhere. A web address is a globally unique identifier. A web address refers to one document and nothing else, and this reliability promotes the dissemination of the document because it provides a means to refer to it and direct people to it. GUIDs that persist across database versions allow users of the database to process changes more easily. If two datasets use a common set of GUIDs to refer to entities, such as campaign donors, then the value of the two datasets becomes more than just the sum of their parts. The connections between the databases add great value to how they can be used. An easy (and accepted) way to choose GUIDs is to piggy-back off of your agency’s web domain, which provides a space of IDs for you to choose from that won’t clash with anyone else’s IDs. For instance, you may coin a verbose GUID for an entity, such as "http://www.youragency.gov/guids/john_smith", rather than a simple, opaque numeric ID, such as "12345", that is not globally unique. Such GUIDs are a form of URI (uniform resource identifier), but the important part is that they are simply unique identifiers.
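The sketch below illustrates the two points made above: coining GUIDs under an agency's own web domain, and combining two datasets that share those GUIDs. It is a minimal example under stated assumptions. The helper names and the sample donor and lobbying records are hypothetical, and a real scheme would need a rule for telling apart two people with the same name.

```python
from urllib.parse import quote

# Example namespace taken from the text; each agency would use its own domain.
GUID_BASE = "http://www.youragency.gov/guids/"


def mint_guid(local_name):
    """Coin a URI-style GUID from a human-readable name (illustrative only)."""
    slug = "_".join(local_name.lower().split())
    return GUID_BASE + quote(slug)


def join_on_guid(left, right):
    """Merge two {guid: record} datasets, keeping entities present in both."""
    shared = left.keys() & right.keys()
    return {guid: {**left[guid], **right[guid]} for guid in shared}


# Two hypothetical datasets that refer to the same donor by the same GUID.
donor = mint_guid("John Smith")
donations = {donor: {"donated": 500}}
lobbying = {
    donor: {"registered_lobbyist": True},
    mint_guid("Jane Doe"): {"registered_lobbyist": False},
}

combined = join_on_guid(donations, lobbying)
```

Because both datasets key their records on the same GUID, the join needs no name matching or guesswork, which is the added value the shared identifiers provide.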
The use of GUIDs in the form of URIs is part of a technological movement called Linked Open Data (LOD, see linkeddata.org). Promoted by the creator of the World Wide Web, Tim Berners-Lee [159. Berners-Lee, Tim. June 24, 2009. Putting Government Data Online.], the LOD method for publishing databases achieves data openness in a standard format and the potential for interconnectivity with other databases without the expense of wide agreement on unified inter-agency or global data standards. LOD is a practical implementation of Semantic Web ideas, and several tools exist to expose legacy databases and spreadsheets in the LOD method. I have been writing about the uses of the Semantic Web for government data [160. Tauberer, Joshua. August 2009. Building a Civic Semantic Web. Nodalities.] for as long as I’ve been publishing legislative data, but it has not caught on in the United States. It has, however, become a core part of Data.gov.uk and is a recommendation of the Australian Governments Open Access and Licensing Framework [161. http://www.ausgoal.gov.au/ausgoal-qualities-of-open-data, accessed July 10, 2011.].
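To make the LOD idea concrete, here is a minimal sketch that writes one database record as N-Triples, one of the standard RDF text formats, using only the Python standard library. The vocabulary namespace, the property names, and the otheragency.gov identifier are hypothetical; a real deployment would more likely reuse published vocabularies and an RDF library.

```python
# Hypothetical vocabulary namespace for property names, for illustration only.
VOCAB = "http://www.youragency.gov/vocab/"


def literal(value):
    """Quote a value as an N-Triples string literal, escaping special characters."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject, literals, links=None, vocab=VOCAB):
    """Serialize one entity as N-Triples lines.

    literals maps property names to plain values; links maps property
    names to the GUIDs (URIs) of other entities, possibly in other databases.
    """
    lines = [f"<{subject}> <{vocab}{p}> {literal(v)} ." for p, v in literals.items()]
    lines += [f"<{subject}> <{vocab}{p}> <{o}> ." for p, o in (links or {}).items()]
    return "\n".join(lines)


triples = to_ntriples(
    "http://www.youragency.gov/guids/john_smith",
    {"name": "John Smith"},
    {"donatedTo": "http://www.otheragency.gov/guids/committee_42"},
)
```

The second triple points at an identifier coined by a different (hypothetical) agency, which shows how two databases can interconnect without first agreeing on a shared schema.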
The W3C working draft Publishing Open Government Data [162. Bennett, Daniel and Adam Harvey. September 8, 2009. Publishing Open Government Data (W3C Working Draft).] and the Linked Data Cookbook published by the W3C Government Linked Data committee [163. Hyland, Bernadette, Boris Villazón Terrazas and Sarven Capadisli. 2011. Linked Data Cookbook.] provide additional best practices with regard to GUIDs and Linked Open Data.