Revision control: A fiefdom and a federation

This is a comparison of two web sites that are experiments in self-governance: Wikipedia and GitHub. As producers of content, both are immense successes, but the governance models (i.e., the rules and process by which content is written and kept) have grown to be very different. To a great extent, we can trace most of the differences in governance to a key difference in setup: one allows forking and the other doesn’t.

Changes and branches

Before I get to all that, we’ve got to start by thinking about documents the way computer geeks think about documents: as a series of changes. You started with a blank screen, then

  • added a first sentence,
  • then added a second,
  • then went back and changed the wording of the first sentence,
  • then added your name,
  • then started writing the first sentence of the abstract,
  • then changed your name to boldface,
  • then added two more sentences to the abstract,

et cetera.

With enough changes, you wound up with the document you have today, and we can think of the document in its current state as the bundle of changes that occurred to make it look like it does.
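To make the "bundle of changes" idea concrete, here is a minimal sketch in Python. The `Change` class, the `replay` function, and the sample sentences are all hypothetical names invented for illustration; real revision control systems store changes quite differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Change:
    """One edit: a description plus a function from old lines to new lines."""
    description: str
    apply: Callable[[List[str]], List[str]]


def replay(changes: List[Change]) -> List[str]:
    """Start from a blank screen and apply every change in order."""
    doc: List[str] = []
    for change in changes:
        doc = change.apply(doc)
    return doc


# Illustrative history only; the text is made up.
changes = [
    Change("add a first sentence", lambda d: d + ["First sentence."]),
    Change("add a second", lambda d: d + ["Second sentence."]),
    Change("reword the first sentence",
           lambda d: ["A better first sentence."] + d[1:]),
    Change("add your name", lambda d: d + ["by A. Author"]),
]

document = replay(changes)
print(document)
```

Replaying only a prefix of the list, such as `replay(changes[:2])`, gives the document as it stood at that earlier point, which is all a history view needs.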

Legislation works like this, because any modern legislation is a modification of existing law. It’s easy to find bills or new rules that are explicitly written in the form “such-and-such section of the US Code shall be revised as follows….”

We could have multiple threads. In one thread, you made changes to produce a section about the implications of your thesis for the future; in another thread, that didn’t happen and your changes added historical background. Or you sent a copy of the paper to a colleague, and you made your changes and your colleague made theirs; if your colleague were Vladimir Nabokov, he would add a parenthetical before tweaking line 404: [And here time forked.] Time can un-fork, because you can grab some or all of the changes your colleague made and make them in your copy, thus merging the parts of your colleague’s work that you like into your own version. Or maybe you and your coauthor break up, and now you have papers that started from the same root, had a number of changes in common, but eventually branched into two distinct streams of changes.
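Forking and merging can be sketched the same way. This is a toy model, assuming a history is just an ordered list of change labels; `fork`, `merge`, and the labels are hypothetical, and git itself works differently (it records snapshots linked to their parents rather than a flat list).

```python
from typing import Iterable, List, Optional, Set


def fork(history: Iterable[str]) -> List[str]:
    """A fork is an independent copy of the history so far."""
    return list(history)


def merge(mine: List[str], theirs: List[str],
          wanted: Optional[Set[str]] = None) -> List[str]:
    """Append the changes from `theirs` that `mine` lacks.

    If `wanted` is given, take only those changes (the parts you like).
    """
    merged = list(mine)
    for change in theirs:
        if change in merged:
            continue  # already shared, e.g. from the common root
        if wanted is None or change in wanted:
            merged.append(change)
    return merged


root = ["add first sentence", "add second sentence"]

# Two threads that start from the same root and then diverge.
mine = fork(root) + ["add section on future implications"]
theirs = fork(root) + ["add historical background", "tweak line 404"]

# Time un-forks: take only the change I like from the other copy.
merged = merge(mine, theirs, wanted={"add historical background"})
print(merged)
```

Calling `merge(mine, theirs)` with no `wanted` set takes everything from the other copy. Skipping the merge entirely leaves the coauthor-breakup case: two histories with a common root and nothing else shared.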

Managing all these changes and merging them across forks is typically a logistical pain, but there’s some stellar software to make this happen. The favorite of late is git, which was originally written for managing the Linux kernel. If I may make a book plug, I cover git in a chapter of my latest book. Or ask Google for git tutorials. In fact, git is so popular that there’s a web site built on it, github.com, which is currently hosting on the order of half a million projects.

Most of the projects are code, but if you are writing a document (in plain text or LaTeX), you can collaboratively change it into something beautiful on GitHub. The Sandy Hook Project uses GitHub to disseminate data on guns in the USA.

Wikipedia

Wikipedia, which needs no introduction, is also a revision control system with a web site. Every page, like the one on Anarchy, is the sum of a series of changes, which you can view from the history tab of the given page. You can see exactly who made what tweaks when.

If you use RSS, you can subscribe to a page, and get real-time notifications of what changes are being made. The notifications are pretty legible, showing the old and new versions in a neat two-column format.

These notifications are an essential part of Wikipedia, because Wikipedia is the encyclopedia that anybody can vandalize. The history of any reasonably popular page is filled with regular incidents where somebody deleted the entire page and replaced it with a commentary on penises, posted joke photos, blew spam everywhere, and so on. Those individuals who are following the page can immediately respond and revert to a prior version. If it weren’t for these people who watch the changes on a page, Wikipedia would become a useless pile of slop within a day.
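The watch-and-revert loop is simple enough to sketch. The `Page` class below is hypothetical, and it assumes a revert is recorded as one more revision that restores older text, so that the vandalism stays visible in the history; I believe that matches Wikipedia’s behavior, but treat it as an assumption.

```python
from typing import List, Tuple


class Page:
    """A page as an append-only list of (editor, text) revisions."""

    def __init__(self, creator: str, text: str) -> None:
        self.history: List[Tuple[str, str]] = [(creator, text)]

    @property
    def current(self) -> str:
        """The text readers see is whatever the latest revision says."""
        return self.history[-1][1]

    def edit(self, editor: str, text: str) -> None:
        self.history.append((editor, text))

    def revert(self, editor: str, steps: int = 1) -> None:
        """Restore the text from `steps` revisions back as a new revision."""
        _, old_text = self.history[-1 - steps]
        self.history.append((editor, old_text))


# Illustrative editors and text only.
page = Page("author", "Original article text.")
page.edit("vandal", "spam spam spam")
page.revert("watcher")  # somebody following the page undoes the damage
print(page.current)
```

Nothing in this sketch stops the watcher from reverting a good-faith edit in exactly the same way.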

In fact, we can expect that the people watching a page will evaluate every change for whether it is one they approve of. At that point, the system begins to look rather feudal: there is nobody telling Wikipedians at large what to write, no ruler of all pages, but on any given page, there is a small committee of invested parties who have veto power over any change to what is written.

Timothy B Lee describes Wikipedia as “a community of editors. As a reader, you’re just an invited guest.” Some page-watchers are pretty loose and only revert clear vandalism; others have strong opinions, and each will revert any change that doesn’t fit his, her, or its world view (we can expect that any given corporation has somebody watching the corporation’s Wikipage). It’s easy to find stories of knowledgeable people who made good-faith changes and saw them reverted. There is only one page for a given topic, so if you disagree with the most active editors, then you’re stuck, because there is no mechanism by which you can fork the page.

[Wikipedia does, in a sense, have multiple versions of the same document: different languages. For example, the Arabic page on Israel has some snippy side comments which would never survive on the English version of the page. Maybe see also this wiki-summary of a paper on collaboration in wikis.]

GitHub

I have a project on GitHub I’ve been working on for a few years. You can’t change my version of the project, but you can fork it, making a copy of the sequence of revisions up to today, and then make new changes on your copy as you see fit. Some projects have hundreds of forks of the same code base. If I like some set of your changes, then I can apply them to my copy. We thus get collaboration done by making changes to our personal forks and then copying changes across forks where it makes sense to do so.

[Did you buy my book since the plug up there? You can fork the exercises from two repositories.]

Which version of the project is authoritative? Who knows. The person who started the project typically has more social clout, and there may be some links pointing to the first version, but that’s about it: otherwise, every fork is largely on equal footing with every other.

The physical metaphor here would be lots of people in a room working on their own drafts of a document. Some people are constantly checking on each other and suggesting changes and reconciliations; some are working alone. At the end of the day, there are several versions of the document in the room, and the group somehow decides on a mechanism to reconcile them all into a final document.

G v W

The physical metaphor for a Wikipedia page would be a committee in a single room, all discussing a single document. For every change to the document, the committee arrives at a consensus. The implications are as above: things become a mess unless somebody is watching at all times, fiefdoms (benevolent or otherwise) are largely inevitable, and changes can be slow and difficult to make, because they require the consensus of a potentially large number of veto-holding actors.

If you’re reading this, you’ve spent some time asking questions about the process by which legal code is written:

  • Does the process foster or discourage a tyranny of the majority?
  • If there are surprising changes in conditions, or even a slow trend from one prevalent worldview to another, can the document change easily?
  • Is the social structure of the revision system, like who is involved and how, explicit and transparent?
  • Is it efficient, or do debates go in circles and changes get made/unmade/remade repeatedly?
  • If there are multiple directions in which a document could go, are all fully explored, or does one vision take an early lead, closing off alternate avenues?
  • Do incumbents have an unfair advantage simply because they were there first?

I don’t think Wikipedia rates well on these scales. As a document, Wikipedia is a miracle of modern times, and I use it (and trust its content) daily; as a governance framework for generating documents, I don’t know if I’d want to emulate it elsewhere.

GitHub’s fork-before-modifying model can’t have a lot of Wikipedia’s problems: it’s hard to game, spam, micromanage, or bully. It pushes diversity, and lets authors fully explore different perspectives on the same document. But the obvious difference is that we wind up with several different branched versions of a document, rather than a single canonical version. If we need to have exactly one final version, we still need to take a final vote or have a reconciliation committee to do a final merge.
