2. Democratizing Legal Information

Not everything is about corruption. Many open government projects, including one of my own, are about creating access to primary legal documents. Two factors distinguish projects that democratize legal information from the sort discussed previously that aim to be a “disinfectant.” First, the projects in this section do not presume that anything in particular is wrong with government. No judgement is made. Second, the value of these projects to their users is more direct than the value of the projects discussed in the previous section — an idea I’ll return to later.

The legal materials that U.S. projects have focused on include congressional bills (e.g. my own GovTrack.us and WashingtonWatch.com by Jim Harper), state legislation (Richmond Sunlight by Waldo Jaquith, Knowledge As Power by Sarah Schacht, and the Open States Project of the Sunlight Foundation), administrative law (Federal Register 2.0), statutory law (Cornell University’s Legal Information Institute and Virginia Decoded by Waldo Jaquith), and case law (RECAP out of Princeton University), among many others. What these projects all have in common is that they dig deeply into a particular aspect of law, generally make its text available in a way it was not before, and often provide additional tools to track changes to the law.

GovTrack.us, my first open government project (see Chapter 1), was the first non-subscription website that presented a unified account of what Congress was doing along with tools to track future legislative activity. The site includes voting records, biographical information on Members of Congress, the status and complete text of legislation, and other information collected from congressional sources. There is no official downloadable database of most of this information. There are websites that display all of the information, but few databases that programmers like myself can transform into new applications. Putting all of this information together opens up new possibilities: once the data is in one place, it becomes possible to crunch the numbers and compute novel statistics about the performance of Members of Congress.

For instance, GovTrack computes leadership and ideology scores for Members of Congress based on their patterns of sponsorship of bills, as shown in Figure 14. Charts like these help visitors put the information they see in context. In GovTrack’s leadership-ideology charts, leadership is shown on the vertical axis (higher means more of a leader) and ideology on the horizontal axis. In the figure, Sen. Harry Reid, the Senate majority leader, is marked with a triangle. If you did not know he was the majority leader, his position at the top of the chart would give you a clue. The figure also highlights Sen. Susan Collins. Collins is well known as a moderate Republican, and her ideology score reflects just how moderate she is: she is more liberal than some Democrats.

Figure 14. GovTrack computes leadership and ideology scores for Members of Congress based on their patterns of sponsorship of bills.

The statistical analysis doesn’t look at the content of bills, the party affiliation of the Members of Congress it is analyzing, or anything else about them, yet it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like left-right ideology. To compute the ideology scores, I form a square matrix in which both the rows and the columns represent senators. I put a 1 in each cell where the senator for the column cosponsored any bill sponsored by the senator for the row, and a 0 everywhere else. Then I use a statistics package to perform a principal component analysis on the matrix (in this case by way of a singular value decomposition), and what comes back turns out to be ideology scores.[77]
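The matrix construction and the principal-component step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GovTrack’s actual code (note [77] describes the real procedure). Instead of calling a statistics package, it uses plain power iteration to find the leading principal component. That component is the top right singular vector of the row-centered matrix, which is the first thing an SVD would return. The six senators and their cosponsorships are made up. Whether the first component or a later one lines up with ideology on real data is something to check rather than assume.

```python
import math


def cosponsorship_matrix(senators, cosponsorships):
    """Square 0/1 matrix: rows are sponsors, columns are cosponsors.

    Cell [row][col] is 1 if the column senator cosponsored any bill
    sponsored by the row senator.
    """
    index = {name: i for i, name in enumerate(senators)}
    n = len(senators)
    matrix = [[0.0] * n for _ in range(n)]
    for sponsor, cosponsor in cosponsorships:
        matrix[index[sponsor]][index[cosponsor]] = 1.0
    return matrix


def first_principal_component(matrix, iterations=200):
    """One score per column (senator), found by power iteration."""
    n = len(matrix)
    # Center each row so every senator's column is compared with the average column.
    centered = [[x - sum(row) / n for x in row] for row in matrix]
    # Gram matrix between columns: gram[j][k] is column j dotted with column k.
    gram = [
        [sum(centered[r][j] * centered[r][k] for r in range(n)) for k in range(n)]
        for j in range(n)
    ]
    # A constant start vector lies in the null space of the centered matrix and
    # would collapse to zero, so start from a non-constant one.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical Senate: two blocs whose members only cosponsor each other's bills.
senators = ["L1", "L2", "L3", "R1", "R2", "R3"]
blocs = [senators[:3], senators[3:]]
pairs = [(s, c) for bloc in blocs for s in bloc for c in bloc if s != c]
scores = first_principal_component(cosponsorship_matrix(senators, pairs))
for name, score in zip(senators, scores):
    print(f"{name}: {score:+.3f}")
```

The two blocs land on opposite sides of zero. In messier real data, senators who cosponsor across the divide would tend to land somewhere in between.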

The leadership scores are based on Google’s PageRank algorithm. The basic idea behind Google’s algorithm for ranking pages is widely known: the more links your page gets, the higher it is ranked, but links from highly ranked pages count for even more. In Congress we can look at the network of who is cosponsoring whose bills in a similar way. When a representative cosponsors a bill, it is a vote of confidence not only in that bill but also a vote of confidence in, or loyalty to, the bill’s sponsor. Imagine each Member of Congress as a “web page,” and each time a Member cosponsors another Member’s bill as a link from the first “web page” to the second. The PageRank algorithm should then reveal a ranking of these implicit loyalties directly from the public, official behavior of the Members of Congress. And it does.[78]
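A minimal PageRank over the cosponsorship network might look like the Python below. It follows the textbook formulation with a damping factor of 0.85, a commonly used default. The damping value, the treatment of members who cosponsor nothing, and the four-member example are all assumptions for illustration, not GovTrack’s settings (see note [78]).

```python
def pagerank(members, cosponsorships, damping=0.85, iterations=100):
    """Rank members; each (sponsor, cosponsor) pair is a link cosponsor -> sponsor."""
    n = len(members)
    links = {m: set() for m in members}
    for sponsor, cosponsor in cosponsorships:
        if sponsor != cosponsor:
            links[cosponsor].add(sponsor)
    rank = {m: 1.0 / n for m in members}
    for _ in range(iterations):
        new_rank = {m: (1.0 - damping) / n for m in members}
        for m in members:
            if links[m]:
                # Split this member's rank evenly among the sponsors they backed.
                share = damping * rank[m] / len(links[m])
                for target in links[m]:
                    new_rank[target] += share
            else:
                # A member who cosponsored nothing spreads their rank over everyone.
                for target in members:
                    new_rank[target] += damping * rank[m] / n
        rank = new_rank
    return rank


# Hypothetical chamber: A, B, and C back the leader's bills; the leader backs A's.
members = ["Leader", "A", "B", "C"]
pairs = [("Leader", "A"), ("Leader", "B"), ("Leader", "C"), ("A", "Leader")]
ranks = pagerank(members, pairs)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

The leader comes out on top because three members link to them. A comes next, even though only one member backs A, because that one link comes from the top-ranked member.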

Statistics are one way to give context. Another is to use personal geography. By overlaying Census geographical data on Google Maps, GovTrack made it possible to reliably determine your congressional district by zooming to street level. That is crucial if you live either near a district boundary or in a metropolitan area. Address databases are only about 95% accurate, but maps are almost always right. See Figure 15. Another form of context is to show changes over time. In Figure 16, the deletion of eight lines from a House appropriations bill is highlighted with automatic red-lining by a bill text comparison tool that I developed for POPVOX.
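Red-lining of this kind can be sketched with Python’s standard difflib module. It aligns two versions of a text and reports which lines were kept, deleted, or inserted. This is a generic line-level diff, not the POPVOX comparison algorithm, and the sample bill text is invented.

```python
import difflib


def redline(old_text, new_text):
    """Return (status, line) pairs, where status is 'keep', 'delete', or 'insert'."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    marked = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.extend(("keep", line) for line in old_lines[i1:i2])
        else:
            # 'delete' and 'replace' strike old lines; 'insert' and 'replace' add new ones.
            marked.extend(("delete", line) for line in old_lines[i1:i2])
            marked.extend(("insert", line) for line in new_lines[j1:j2])
    return marked


# Invented bill text: the committee version drops subsection (b).
introduced = """SEC. 101. APPROPRIATIONS.
(a) For salaries and expenses, $1,000,000.
(b) For travel, $50,000.
SEC. 102. AVAILABILITY."""
reported = """SEC. 101. APPROPRIATIONS.
(a) For salaries and expenses, $1,000,000.
SEC. 102. AVAILABILITY."""

markers = {"keep": "  ", "delete": "- ", "insert": "+ "}
for status, line in redline(introduced, reported):
    print(markers[status] + line)
```

A real tool would also compare words within changed lines and render deletions as struck-through text. The line-level alignment shown here is the starting point.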

Figure 15. GovTrack combines GIS data from the Census with Google Maps to create zoomable congressional district maps.

Figure 16. GovTrack and POPVOX.com show changes to bill text made during the committee and amendment processes. Here the deletion of eight lines in a House appropriations bill is highlighted, based on the bill text comparison algorithm I developed for POPVOX. http://www.govtrack.us/congress/bills/112/hr2112/text

The best part of GovTrack is that the site runs itself. I’ve programmed the site to periodically go out to government websites and fetch the information they have on Congress. It scans for new bill status, votes, and other information in a completely automated way. This process is called screen scraping: programmatically loading web pages, looking at their HTML source, and extracting information using simple pattern matching. It’s not interesting programming work, and screen scrapers are easily confused because unstructured information can be displayed in so many different ways. For instance, several years after I finished the bill status screen scraper, it started crashing. That is how I learned that a bill can be sponsored not just by a person but by a committee, or can even have no sponsor at all (as has been the case with debt-limit-raising bills, because no one wants to take responsibility for them). I hadn’t anticipated these cases, and unanticipated cases cause problems, sometimes leading to incorrect information being shown on the site.
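Screen scraping of this sort can be sketched with Python’s re module. The HTML snippets and the “Committee on” check below are hypothetical stand-ins for whatever a real congressional page contains. The point is that the parser decides explicitly what to do when the sponsor is a committee or is missing, rather than assuming every bill has a person as its sponsor.

```python
import re

# Hypothetical markup: real congressional pages used different HTML.
SPONSOR_PATTERN = re.compile(r"<b>Sponsor:</b>\s*([^<]+)")


def scrape_sponsor(html):
    """Classify a bill's sponsor as a person, a committee, or absent."""
    match = SPONSOR_PATTERN.search(html)
    if match is None:
        # Some bills have no sponsor at all; report that instead of crashing.
        return ("none", None)
    name = match.group(1).strip()
    # Hypothetical rule for spotting committee sponsors.
    if name.startswith("Committee on"):
        return ("committee", name)
    return ("person", name)


samples = [
    "<p><b>Sponsor:</b> Rep. Jane Doe [XX-1]</p>",
    "<p><b>Sponsor:</b> Committee on Ways and Means</p>",
    "<p>Latest action: Referred to committee.</p>",
]
for html in samples:
    print(scrape_sponsor(html))
```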

GovTrack reaches about half a million people each month directly, and well over a million if you count visitors to websites and mobile apps built by others on top of GovTrack’s legislative database. When I opened up the source data that powers GovTrack, a collection of mostly XML files, others started to see the potential for building other tools that shed light on government processes in new ways. The three biggest reusers of the data are OpenCongress.org by the Participatory Politics Foundation (PPF); MAPLight.org, which puts a new spin on the connections between money and politics (see section 6.2 for more thoughts on MAPLight); and the mobile apps and APIs created by the Sunlight Foundation. (Both OpenCongress and MAPLight have been funded by the Sunlight Foundation.) Another interesting use of GovTrack’s data is IBM ManyBills, a visualization tool for congressional legislation named after IBM’s ManyEyes project. At least two dozen websites relying on GovTrack data have popped up, all trying to give the public a new way to get a grasp of their government, and I’m sure there are many I’m not even aware of.

Now, truth be told, I started working on GovTrack.us in the early 2000s because I thought the sort of transparency GovTrack would create could empower voters to make better decisions. That’s typical disinfectant-speech. But ten years later, I have never heard of a case in which information found on GovTrack, whether a voting record or the text of a bill, changed anyone’s mind about whom to vote for in an election. When I began building the site I hadn’t yet voted in an election myself, and it was grossly naive to think the site would change votes. The reason is simple. At least one candidate in any election is not the incumbent, and if the challenger has not served in Congress, then GovTrack has nothing to say about whether you should prefer that candidate. The incumbent’s legislative record alone doesn’t help much in that decision.

Today I view the goal as something more basic, along the lines of civic education. Through greater understanding, I hope to reduce cynicism and mistrust in the cases where they are not really called for. Sometimes they are called for. But not always. For instance, many bill titles end with “and for other purposes.” I have been asked many times how one could support a bill so vague that it does not even say what it will do, and whether members of Congress are trying to pull one over on us by granting themselves indefinite authority for “other purposes.” The reality is that bills often address too many issues to list them all in a succinct title. So in the end, it is just a title. The full text of the bill, which everyone can read, always spells out the details, often in the most rigorous lawyer-speak. Without this understanding of how Congress works, it is easy to be cynical. But this cynicism does no good.

*   *   *

Carl Malamud has been leading an effort to fill in the gaps where primary legal materials are not (freely) available to the public at all. Some of the gaps are state codes. Much of the gap consists of judicial decisions and related court documents, which end up behind paywalls run by private companies (Westlaw, LexisNexis), associations (the American Bar Association), and the courts themselves. I don’t think Malamud would fault private companies for selling the value they add to public documents, but he does criticize the courts and academia for not living up to a higher standard:

Our law schools and our law libraries are not active in maintaining the corpus of primary legal materials. We’ve outsourced this important function, and as a consequence, America is not being well served . . . Today, law libraries risk becoming a 7-11, where one vendor comes in and fills up the donut case, another stocks the ATM, and your job is all about managing vendors and answering an occasional query from a customer.[79]

Malamud’s project, which goes by the name Law.Gov (though its website is law.resource.org), points to many practical benefits of broad access to the law: improved civic education in schools, deeper research in universities, innovation in the legal information market, savings to the government, lower costs for small businesses to maintain legal compliance, and greater access to justice. Free public access to legal materials isn’t necessarily intended to replace the expensive subscription services used by legal professionals, but rather to open up legal materials to a new audience.

These benefits of public access to the law are what I meant when I said the value of these projects is more direct. A website that aims to reform government has indirect value to the public: first the public has to use the information to elect better policymakers, then the policymakers hopefully make better policy, and decades later the public benefits from the new policy. In the case of Law.Gov, the benefit is direct and immediate. Reduced costs for small businesses are reduced costs now.

It was very disconcerting the first time I came to grips with the fact that the law is so hard to find. There are both theoretical and practical reasons for this. On the theoretical side, federal statutory law works in such a way that, for most of the law, no document is ever produced that you could point to as the definitive law. The law comes about piecemeal through actions of government. The law is the culmination of those actions, regardless of whether that culmination is written down anywhere. For instance, let’s say a bill called the Puppies Are Cute Act reads, “Puppies are cute.” The bill is enacted. Then a second bill amends the law by reading, “Strike the first word of the previous law and insert in its place ‘Cats.’ ” Nowhere is the current law “Cats are cute” actually written, but that is the law. In a sense, statutory law is the hypothetical document that would result if you tried to put all of the enacted bills together.
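The Puppies Are Cute example can be made concrete with a small Python sketch. It treats statutory law as the result of replaying enacted amendments in order. The amendment representation (a word position plus replacement text) is a hypothetical simplification. Real amendments are written in legal prose and are far less regular.

```python
from dataclasses import dataclass


@dataclass
class StrikeAndInsert:
    """Hypothetical amendment: replace the word at a 0-based position."""
    position: int
    replacement: str


def current_law(original_text, amendments):
    """Replay amendments in order; the result appears in no single enacted bill."""
    words = original_text.split()
    for amendment in amendments:
        words[amendment.position] = amendment.replacement
    return " ".join(words)


enacted = "Puppies are cute."
print(current_law(enacted, [StrikeAndInsert(position=0, replacement="Cats")]))
```

Running this replay over every bill in the Statutes at Large is, in spirit, what compiling a code involves, though in practice it is done by hand and with far messier inputs.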

When bills are enacted they are printed one after another in the Statutes at Large. The Statutes at Large define federal statutory law. To know the current law, taking into account additions, revisions, and repeals, one would have to read them from the start (in 1789) and assemble one’s own account of the text of current law. The U.S. House of Representatives’ Office of the Law Revision Counsel does this work to produce the United States Code. However, the Law Revision Counsel has no authority to change the law. Thus, the U.S. Code is (in general) not the actual law. (It is called “prima facie evidence” of the law.) If the Law Revision Counsel made a mistake, that mistake would not be part of the actual law. You are responsible for knowing the actual law, not the Law Revision Counsel’s best guess at it, even if the actual law is not written anywhere. (Congress occasionally uses a sleight of hand to get everyone on the same page about what the law actually is. On these occasions Congress passes a bill that repeals various past laws and enacts, essentially in their place, parts of the U.S. Code deemed current and reliable enough to turn into law. Titles of the U.S. Code that have been re-enacted into law this way are called positive law titles.) A similar situation exists for administrative law, which is the law created by executive-branch agencies through power delegated by the legislative branch. U.S. administrative law is created through the publication of rule changes in the Federal Register. The compilation of those rules forms the Code of Federal Regulations.

Though these documents don’t capture the complete law, on a practical level they are at least accessible to the public at large. Some of them have been posted online (and free) since the mid-1990s. But being online doesn’t always make a document useful. The most useful place to read the U.S. Code has been the website of the Cornell University Legal Information Institute (LII), at http://www.law.cornell.edu. Since 1992 LII has run the most effective browse and search interface for the Code and other primary legal documents. One of LII’s innovations has been creating permalinks to particular paragraphs within legal documents. The Government Printing Office began publishing the Federal Register and Code of Federal Regulations in XML in 2009, and the Law Revision Counsel currently publishes the United States Code in XHTML. However, errors in how those formats were applied to the documents have slowed LII’s progress in using those files to create a more richly functioning website[80] (though the files have been used to create the Federal Register applications discussed in Chapter 1).
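Paragraph-level permalinks depend on turning a citation into a stable identifier. The Python sketch below parses a U.S. Code citation such as 26 U.S.C. § 501(c)(3) into a title, a section, and a paragraph path, and then builds a URL from them. The citation grammar is deliberately simplified, and the URL pattern and the example.org host are hypothetical. They are not LII’s actual scheme.

```python
import re

# Simplified, hypothetical grammar: title, "U.S.C.", optional section sign,
# section number, then zero or more parenthesized paragraph labels.
CITATION = re.compile(r"^(\d+)\s+U\.S\.C\.\s+§?\s*(\w+)((?:\(\w+\))*)$")


def permalink(citation, base="https://example.org/uscode"):
    """Build a paragraph-level URL like https://example.org/uscode/26/501#c_3."""
    match = CITATION.match(citation.strip())
    if match is None:
        raise ValueError(f"unrecognized citation: {citation!r}")
    title, section, paragraphs = match.groups()
    # "(c)(3)" -> ["c", "3"], which becomes the fragment "c_3".
    path = re.findall(r"\((\w+)\)", paragraphs)
    url = f"{base}/{title}/{section}"
    if path:
        url += "#" + "_".join(path)
    return url


print(permalink("26 U.S.C. § 501(c)(3)"))
print(permalink("5 U.S.C. 552"))
```

The key property is that the same citation always yields the same link, so other sites can point readers directly at a single paragraph.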

The judicial branch has no such compilation. Case law can only be determined by reading and interpreting court-issued opinions. Bills, the Statutes at Large, the Federal Register, the United States Code, and the Code of Federal Regulations have been available free and online for a long time now. Court opinions and the documents in the dockets surrounding them, however, are held in two tightly guarded electronic systems. One is called PACER and is run by the Administrative Office of the United States Courts. The other is a collection of private-sector databases including Westlaw and LexisNexis. These create practical barriers to access. Everyone pays to access these databases (which makes a joke out of PACER’s full name, Public Access to Court Electronic Records). The courts subscribe to Westlaw to have access to their own opinions. Other government agencies subscribe to Westlaw and to PACER, shifting money around the government to access the government’s own record of the law. Lawyers in the private sector subscribe. But of course the general public is left out of the equation. It is a bit of a farce.

Since these documents are generally not subject to copyright or other legal restrictions on redistribution, giving the public access to them would be legal, if only the documents could be obtained. After Aaron Swartz downloaded 19,856,160 pages from PACER through a free trial (saving himself $1.5 million), all free trials were quickly suspended.[81] RECAP, a project of the Princeton University Center for Information Technology Policy at recapthelaw.org, attempts to create a public repository of court documents by asking lawyers to contribute the PACER documents they have paid for. RECAP is a web-browser extension that automates the process of uploading PACER documents to the repository. It works not because of any technological breakthrough in uploading but because of its human interface design: it gives lawyers a method that is easy to use. (RECAP is PACER spelled backwards.)[82]
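The idea behind RECAP can be sketched in a few lines of Python: a shared archive that grows each time a lawyer pays for a document. Everything here is hypothetical. The in-memory dictionaries stand in for PACER and for the public repository, and the $0.10 page fee is an illustrative parameter, not a statement of PACER’s pricing.

```python
from dataclasses import dataclass, field


@dataclass
class SharedArchive:
    """Hypothetical stand-in for a public repository of court documents."""
    documents: dict = field(default_factory=dict)

    def lookup(self, doc_id):
        return self.documents.get(doc_id)

    def upload(self, doc_id, text):
        self.documents[doc_id] = text


def fetch(doc_id, pages, archive, paywalled_source, fee_per_page=0.10):
    """Return (text, cost): use the free copy if one exists, else pay and share."""
    cached = archive.lookup(doc_id)
    if cached is not None:
        return cached, 0.0
    text = paywalled_source[doc_id]
    # After paying, contribute the document so the next reader gets it free.
    archive.upload(doc_id, text)
    return text, pages * fee_per_page


pacer = {"case-123/doc-4": "Memorandum opinion (placeholder text)"}
archive = SharedArchive()
print(fetch("case-123/doc-4", 10, archive, pacer))  # first reader pays
print(fetch("case-123/doc-4", 10, archive, pacer))  # second reader does not
```

The extension’s real contribution, as the paragraph notes, is making the “upload after paying” step happen without any extra effort from the lawyer.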

Figure 17. Virginia Decoded (vacode.org) is the first state launched in Waldo Jaquith’s State Decoded project. The website shows the state’s laws with tools including pop-up definitions of terms from other parts of the code, suggested citation text, history, cross references, and a link to the corresponding page of Virginia’s official website for each part of the code.

A lot can be done with technology to make the law more accessible. Waldo Jaquith described the goal as “display[ing] local laws and court decisions in a way that provides clarity and context” using “embedded definitions, cross-referencing links, helpful explanations, commenting, tagging, decent design, and humane typography.”[83] Virginia Decoded (vacode.org) is the first state Jaquith has launched in his State Decoded project. Figure 17 shows the site’s pop-up definitions of terms (sourced from other parts of the code), suggested citation text, and other tools that help readers understand and make use of the law. (Jaquith previously created Richmond Sunlight, a legislative tracking tool for the Virginia state legislature similar to GovTrack.)
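Embedded definitions like Virginia Decoded’s pop-ups can be approximated in two steps. First, harvest terms from definition sections. Then mark each use of those terms elsewhere in the code. The Python sketch below assumes, hypothetically, that definitions follow the common statutory pattern “"Term" means [definition].” with straight quotes. Real codes vary widely, and this is not State Decoded’s actual implementation.

```python
import html
import re

# Hypothetical convention: a definition reads "Term" means <definition>.
DEFINITION = re.compile(r'"([^"]+)" means ([^.]+)\.')


def extract_definitions(definitions_text):
    """Map each defined term to the text of its definition."""
    return {term: meaning.strip() for term, meaning in DEFINITION.findall(definitions_text)}


def annotate(text, definitions):
    """Wrap each use of a defined term in a span whose title holds the definition."""
    if not definitions:
        return text
    by_lower = {term.lower(): meaning for term, meaning in definitions.items()}
    # Longest terms first: regex alternation takes the first alternative that
    # matches, so "motor vehicle" must be tried before "vehicle".
    terms = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)

    def wrap(match):
        word = match.group(0)
        meaning = html.escape(by_lower[word.lower()], quote=True)
        return f'<span class="definition" title="{meaning}">{word}</span>'

    return pattern.sub(wrap, text)


definitions = extract_definitions(
    '"Vehicle" means a device for transport. '
    '"Motor vehicle" means a self-propelled vehicle.'
)
print(annotate("No motor vehicle may be parked here.", definitions))
```

A browser script could then show each span’s title as a pop-up, which is the effect shown in Figure 17.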

If you like this book, please consider buying a copy:

Support independent publishing: Buy this book on Lulu.
