Musings on Wikipedia and Open Source
Recently, I have been using Wikipedia quite a bit. First off, let me say that Wikipedia really is a blast. It is informative, like a regular encyclopedia, but because anybody can contribute, there is a lot of funky stuff there that would not be found in a conventional encyclopedia. Surfing Wikipedia is just a whole lot of fun!
Of course, a naysayer would likely interject at this point that if you want to have some fun, go to Wikipedia, but if you want complete, reliable information, look at Britannica. I have to admit that this would also be my instinctive reaction. However, much to my surprise (and doubtless that of many other people), a recent study that appeared in the journal Nature found that the accuracy of scientific articles in Wikipedia was not significantly different from that of Encyclopedia Britannica. Here is an article on that.
Now, aside from a few very specific topics, like Java template engines, or preflop strategy in Omaha Hold'em poker, and maybe a couple of other things, I am not really qualified to judge the quality of information in Wikipedia. On the other hand, I have noticed that most Wikipedia articles that I come across are fairly well written. I am sensitive to this and consider myself to be a fairly good judge of it.
I have been thinking about various things regarding open source. It seems to me that Wikipedia's strengths (and weaknesses) as compared to a conventional encyclopedia are pretty much those of the open source development model as compared to conventional software development.
One of the revolutionary aspects of free software is that it drastically reduces barriers to entry. Anybody who is interested and motivated can hack the source code. Similarly, anybody can contribute material to Wikipedia. It seems likely that a conventionally published reference book would have some advantage in quality over the Wikipedia model. After all, there is a paid editorial staff that does systematic fact checking, some needed line editing, and so on. However, the advantage of the wiki model is how flexible it is, and how quickly it can admit new material, updates, and improvements. For example, if a major scientific discovery is made in a field that invalidates previous theories, it is likely that a wiki-based encyclopedia would incorporate this information much more quickly than a conventional publication. So there is a clear trade-off. In a field as rapidly moving as Java software, for example, you might well prefer an article about Java development tools that is up-to-date but may contain some errors and bits of sloppy prose over a more polished article on the subject that is a couple of years out of date.
Not long ago, I wrote a blog entry about the issues involved in (very) hypothetically joining Jakarta. Though I answered by way of a simile involving King Arthur and the Round Table, I think that the reasons I gave would be clear to anybody reading it. However, an aspect of this that I did not mention there was that I have gradually come to the conclusion that ASF's entire vision of the open source process is incorrect. Certainly, there are reasons to have severe doubts about it. In recent private correspondence, a Java developer commented that, once one got beyond surface impressions and actually rooted around, one could see that over 90% of ASF projects were in some kind of state of hibernation or even severe abandonment. He worried that an open source project that he liked and used quite a bit was on the road to becoming an ASF project, and that this would likely be the kiss of death.
I am not familiar with that many ASF projects, so I cannot vouch for the 90% figure this person gave. However, I think it's clear that there is some kind of systemic problem.
My considered view is that the root of the problem is that ASF wants to project a certain elitist idea, that becoming a committer on an ASF project is some kind of great honor. If you lurk on a given project's mailing list, you will on occasion see them announce, to great fanfare, something like: "John Jones has been accepted as a FooBar committer." This kind of thing has always caused me to roll my eyes. The subtext is a bit like so-and-so has been admitted as a high priest who may now enter the inner sanctum and touch the holy of holies (which is presumably the code repository).
So, until they admit you to the holy of holies, you are basically in some kind of supplicant position: "Please sirs, will you look at my patch?" And, of course, since the patch in question is typically something that only that person needs at this moment (or maybe other people need it but none of the committers do), what with one thing and another, the committers typically don't get around to looking at the guy's patch.
Certainly, the FreeMarker project is not run this way. If somebody expresses some interest in hacking the code, I pretty much immediately offer to add them as a developer so that they can commit code. They are added with no great announcement or votes or fanfare. Note that no vetting has occurred here. I will typically have no objective proof of the person's abilities. It is enough for them to say that they are interested in doing something for me to provide CVS access. Basically, we simply assume that somebody is competent until proven otherwise.
Now, as a practical matter, there is not really much problem with people then turning around and committing poor-quality code willy-nilly. Actually, nine times out of ten, somebody expresses interest in doing something and you give them r/w access to CVS and they just never do anything -- good or bad.
So, people committing all kinds of poor-quality code is not a common real-world problem. But it can occur. However, even when it does occur, how much of a problem is it? Somebody does something and you can see what they did and modify their work or completely roll it back. This is, in fact, the whole point of a version control repository, is it not? Since it is fairly easy to roll back the code to some previous known state, why should one be so conservative about letting people commit code?
But again, I think the core problem with ASF is this underlying elitist idea. And I think it's wrong; open source is not elitist by nature. It's more like: "If you can do something, then roll up your sleeves and get to it, let's see what you can do." In other words, a person is assumed to be competent until proven otherwise. The ASF approach seems, on the other hand, to assume that you are not competent to collaborate until somehow proven otherwise. What makes matters worse, though, is that it is not really obvious what people who become committers have done to prove their competence. It often seems to be more of a popularity contest with people voting +1 and so on.
So, admittedly, another take on this is that the problem is perhaps not elitism per se, but elitism that is arbitrarily applied. If you're going to be elitist, you should at least have some objective criteria.
The other aspect of this that I think is quite worthy of comment and some analysis is that, as far as I can see, the projects that line up to join ASF are not doing so because they really believe that there is any technical value in it. It is purely to leverage the Apache brand name and thus take advantage of the placement and visibility that come with it. Now, if you look at the pages relating to the Apache incubator, they state the supposed technical advantages of getting in with ASF: access to world experts in specific domains, experts in running open source projects, et cetera. But again, I do not believe that the OSS projects that want to get on apache.org believe any of this. It's purely for the visibility. For example, when the Struts/WebWork merger was discussed on opensymphony.com forums, the argument used was that by merging with Struts, they would get WebWork's technical superiority along with Struts's "community". I parsed "community" in this case to be code for the marketing advantages. (In general, "community" is a term that ASF people use frequently in an odd and somewhat mystical way.)
I do not recall that anybody suggested that the Struts people had anything to offer technically. Zero, zilch, squat. Of course, similarly, when Leos Literak asked me about the possibility of joining ASF, it was all about publicity; he never at any moment in the discussion suggested that ASF had anything to offer us on a technical level.
Well, all of this does introduce a real element of moral hazard. If you join ASF purely because so many people believe in this "Apache mystique" (which you yourself do not believe in), then you now have a vested interest in perpetuating said mystique, since, after all, your whole strategy is based on its continuation.
As a final note on this "Apache mystique": if what my correspondent said is true, that over 90% of ASF projects are in a sad state of neglect, then a great gap has opened up between hype and reality. A very large gap indeed. Is such a situation sustainable long-term? You know, it may be analogous to what happens in a financial boom, where something like internet stocks, say, gets priced at a level completely out of line with whatever real economic value these things have. Such booms ultimately lead to a day of reckoning, a crash. Exactly when such a crash occurs is a matter for the theory of tipping points and the like. Or maybe there won't be any such "crash". Still, my sense of things is that this Apache mystique will ultimately end up being deflated significantly.
Well, nothing is more humbling than trying to predict the future. I have no crystal ball. We shall see.