Citizendium revisited

Just spotted an amazing article on how Citizendium built better infrastructure than Wikipedia’s. There are lots of fascinating details there, like…

They went with PostgreSQL for a number of reasons, including better scalability. PostgreSQL is an MVCC database. Unlike Wikipedia, Citizendium never has to lock the database for reads and writes. MySQL can do a lot of things quick and replicate them to slave servers, but PostgreSQL excels at complex functions and full features like JOINs and can do complicated categories and full text searches faster than Wikipedia.

If PG can function without locks, it must definitely be more scalable. InnoDB uses mutexes, spinlocks, etc – and that internal locking can be a bottleneck in many cases. Additionally, when a row is updated, a lock on the record is acquired. It is still a question how PG maintains ACID without any locks; I’ve got to research that more.
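
As a rough illustration of the MVCC idea, here is a toy Python sketch of how a reader can see a consistent snapshot without taking a read lock: every write appends a new row version, and a reader only looks at versions visible to its snapshot. The class and method names (`ToyMVCCStore`, `snapshot`, `read`, `write`) are hypothetical, and only the `xmin`/`xmax` naming is borrowed from PostgreSQL's system columns. It assumes every write commits immediately and ignores concurrent writers, which in a real MVCC database (PostgreSQL included) still take row-level locks against each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that superseded it, if any


class ToyMVCCStore:
    """Hypothetical single-threaded model; every write auto-commits."""

    def __init__(self):
        self.versions = {}   # key -> list of RowVersion, oldest first
        self.next_txid = 1

    def write(self, key, value):
        txid = self.next_txid
        self.next_txid += 1
        chain = self.versions.setdefault(key, [])
        if chain:
            chain[-1].xmax = txid  # old version stays readable for old snapshots
        chain.append(RowVersion(value, xmin=txid))
        return txid

    def snapshot(self):
        # Highest transaction id committed so far.
        return self.next_txid - 1

    def read(self, key, snapshot):
        # No lock taken: pick the version that was live at the snapshot.
        for version in self.versions.get(key, []):
            created = version.xmin <= snapshot
            still_live = version.xmax is None or version.xmax > snapshot
            if created and still_live:
                return version.value
        return None


store = ToyMVCCStore()
store.write("Main_Page", "rev 1")
snap = store.snapshot()             # a reader starts here
store.write("Main_Page", "rev 2")   # a later write does not disturb that reader
old_view = store.read("Main_Page", snap)
new_view = store.read("Main_Page", store.snapshot())
print(old_view, "|", new_view)
```

The price of this scheme is that superseded versions pile up and have to be cleaned out later, which is the job of vacuuming in PostgreSQL.
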
I’m aware that MySQL isn’t the best at full-text search out there – but Wikipedia uses Lucene for full-text search, so it is somewhat strange to hear that the Citizendium platform is faster in that regard. And… I’m not sure where JOIN performance is really faster there – especially when we do lots of covering-index based joins. Probably the key word there is ‘complex’, though I’m not sure what that means :-)
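
To make "covering-index based join" concrete, here is a small sketch using Python's built-in `sqlite3` module rather than MySQL, purely so it runs anywhere. The table and index names are modeled loosely on MediaWiki's `page` and `pagelinks` tables and should be treated as assumptions, not the production schema. The point is that the join only needs columns that already live in the index, so the database never has to touch the table rows themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL
);
-- In SQLite every index entry also carries the rowid (page_id here),
-- so this index alone can answer the join below.
CREATE INDEX name_title ON page (page_namespace, page_title);

CREATE TABLE pagelinks (
    pl_from      INTEGER NOT NULL,
    pl_namespace INTEGER NOT NULL,
    pl_title     TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, 0, "Citizendium"), (2, 0, "Wikipedia"), (3, 0, "PostgreSQL")],
)
conn.executemany(
    "INSERT INTO pagelinks VALUES (?, ?, ?)",
    [(1, 0, "Wikipedia"), (1, 0, "PostgreSQL"), (2, 0, "Citizendium")],
)

# Which existing pages does page 1 link to?
QUERY = """
SELECT page_id
FROM pagelinks
JOIN page ON page_namespace = pl_namespace AND page_title = pl_title
WHERE pl_from = ?
"""

linked_ids = sorted(row[0] for row in conn.execute(QUERY, (1,)))
# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,))]

print(linked_ids)
for step in plan:
    print(step)
```

On the SQLite builds I would expect, the plan should mention a `COVERING INDEX` lookup on `name_title`. MySQL's `EXPLAIN` reports the equivalent situation as "Using index" in the Extra column.
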
The first reason not to use MySQL was:

First, to be different from Wikipedia.

Indeed, I always support critical thinking! Though this one:

Finally, we felt from reading various mailing lists over mediawiki development that mediawiki was hitting the ceiling of the features MySQL can provide as a backend.

IIRC that came from a single post on a single mailing list from someone who is not running the Wikipedia backend. Mhm.
Of course, their monthly traffic is equal to a single minute of ours, so some views might differ…

.. some thoughts on Citizendium

Open-source communities have quite a lot of antagonism against their open-source ‘rivals’, instead of seeing them as partners against Greater Evils. I imagine that bootstrapping a project like Citizendium is a huge task, so I followed some of the discussions in their forums:

  • It’s a nightmare. Only Mozilla Thunderbird gets more disrespect from me. – Lead developer Jason describes the software they use, mediawiki.
  • Can we like anonymize some stuff and submit Mediawiki to WorseThanFailure? – Technical liasion [sic] Zachary suggests.

Of course, being forced to run an open-source package from your greatest ‘rival’ is pain, oh pain. The Citizendium team even forked the software and called it CaesarWiki.
This is how improvements to the fork are described:

Well, hypothetically, we can do whatever we want in terms of improving MediaWiki, including working on the difference engine.
However, I think that it’s more likely that any changes in that area will filter down from work done by the MediaWiki team.
They have a lot more developer time (in developer-hours/month) and a lot more expertise with MediaWiki.

Of course, having a paid lead developer who doesn’t understand the core principles of how the software functions (disrespect, remember?) doesn’t help with real improvements. Of course, half a year ago, big work was ahead:

Ideally, I would like to rewrite mediawiki from the ground up in OO style. Since that may not work well, the best way is to wrap it in a bow and let the “present” develop into something pretty over time.

Xoops was given as an example of a package that scales, has some security, and even uses caching, so integrating it with MediaWiki would make it scale. That’s a sure way forward. Of course, one of the biggest mistakes the Wikipedia folks have made is the choice of LAMP:

To not box ourself in like Wikipedia has done with Mediawiki, PHP and MySQL, we need to pursue modular, easy to use and easy to maintain and update solutions. No one needs network and system admins spinning dinner plates on sticks all day.

It is quite difficult to understand how people who never talked to us know that much about our operations. Back when this was written, Wikipedia had one full-time employee working on the system; a few others did the work whenever they (we) wished, and that usually was creative (of course, sometimes artistic) work. Anyway, to run away from evil MySQL to PG, this set of arguments was used:

Disadvantage: no years of heavy use to test it. Advantage: fewer workarounds, easier to scale overall, incredibly knowledgeable community ready to help out.

Of course, at Wikipedia we failed to scale. Now, what made me slightly envious is the discussion about security and operating personnel – having a pool of developers scattered around the world, with a floating 24/7 schedule, is priceless, and we really can’t afford that at Wikipedia – at one moment all of us were in Europe, now just Brion is sitting in Florida (which is not that far away either).

Anyway, though I believe in Wikipedia evolution more than in Citizendium revolution, I wouldn’t reject advice – the project may be quite interesting, and if the content can be reused on other projects, it just adds value to the Web. Probably we’re rookies in software engineering, but it has been a long path to build the Wikipedia platform. Some of us learnt the technologies used specifically for the project. I’m not sure we earned the disrespect we’re getting, but I still think that the antagonism is harming Citizendium, not us.
