Citizendium revisited

Just spotted an amazing article on how Citizendium built better infrastructure than Wikipedia’s. There are lots of fascinating details there, like…

They went with PostgreSQL for a number of reasons, including better scalability. PostgreSQL is an MVCC database. Unlike Wikipedia, Citizendium never has to lock the database for reads and writes. MySQL can do a lot of things quick and replicate them to slave servers, but PostgreSQL excels at complex functions and full features like JOINs and can do complicated categories and full text searches faster than Wikipedia.

If PG can function without locks, it must definitely be more scalable. InnoDB uses mutexes, spinlocks, etc – and that internal locking can be a bottleneck in many cases. Additionally, if a row is updated, a lock on the record is acquired. It is still a question how PG maintains ACID without any locks; I’ve got to research that more.
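For the record, the usual answer is that MVCC does not mean "no locks". Readers see an older version of a row instead of waiting, while writers still lock the row they change. Here is a toy sketch of that idea in Python. It is an illustration only, not PostgreSQL's or InnoDB's actual implementation, and `ToyMVCCRow`, `Version` and every other name in it are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Version:
    value: str
    created_by: int                    # transaction id that wrote this version
    deleted_by: Optional[int] = None   # transaction id that superseded it


class ToyMVCCRow:
    """One row that keeps every version ever written (hypothetical toy model)."""

    def __init__(self, value: str, txid: int) -> None:
        self.versions = [Version(value, txid)]
        self.write_lock_owner: Optional[int] = None

    def read(self, snapshot: Set[int]) -> Optional[str]:
        """Return the newest version visible to `snapshot`.

        `snapshot` is the set of transaction ids that had committed when
        the reader started. Reading never takes or waits for a lock.
        """
        for v in reversed(self.versions):
            created = v.created_by in snapshot
            deleted = v.deleted_by is not None and v.deleted_by in snapshot
            if created and not deleted:
                return v.value
        return None

    def update(self, value: str, txid: int) -> None:
        """Write a new version; writers still need the row lock."""
        if self.write_lock_owner not in (None, txid):
            raise RuntimeError("row is locked by another writer")
        self.write_lock_owner = txid
        self.versions[-1].deleted_by = txid
        self.versions.append(Version(value, txid))

    def commit(self, txid: int) -> None:
        """Release the row lock held by `txid`, if any."""
        if self.write_lock_owner == txid:
            self.write_lock_owner = None


row = ToyMVCCRow("old text", txid=1)
row.update("new text", txid=2)        # transaction 2 has not committed yet
before_commit = row.read({1})         # reader is not blocked, sees old version
row.commit(2)
after_commit = row.read({1, 2})       # a later reader sees the new version
```

So reads never block on writes, but two writers on the same row still conflict, which is exactly a row-level lock.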
I’m aware that MySQL isn’t the best at full-text search out there – but Wikipedia uses Lucene for full-text search, so it is somewhat strange to hear that the Citizendium platform is faster in that regard. And… I’m not sure where JOIN performance is really faster there – especially when we do lots of covering-index based joins. Probably the key word there is ‘complex’, though I’m not sure what that means :-)
The first reason not to use MySQL was:

First, to be different from Wikipedia.

Indeed, I always support critical thinking! Though this one:

Finally, we felt from reading various mailing lists over mediawiki development that mediawiki was hitting the ceiling of the features MySQL can provide as a backend.

IIRC that came from a single post on a single mailing list from someone who is not running the Wikipedia backend. Mhm.
Of course, their monthly traffic is equal to our traffic in a single minute, so some views might differ…

6 thoughts on “Citizendium revisited”

  1. Hey – this is Brian Boyko – the guy who wrote the article. First, thanks for linking to it. Glad you liked it.

    Second, I was wondering if you’d also like to talk about your development at Wikipedia and MySQL for the same blog? Feel free to e-mail me at the address I’ve provided.

    Thanks!

    — Brian Boyko

  2. Domas,

    I’m not sure in which sense you use “amazing” here; if it is the equivalent of lame, I can’t agree more. It is lame from the first lines, saying that unlike MySQL, PostgreSQL is an MVCC database.

    I would expect that to be a paid-for or friendly article mainly focused on advertising the project.

    Come on – how can you compare architectures of projects so vastly different in traffic? 100,000 visitors _per month_ is probably something Wikipedia beats within an hour.

  3. I don’t mean to be disrespectful to another free-knowledge project, but since their launch was laden with bile about Wikipedia, my conscience is eased somewhat. Just because something works on Citizendium does not mean that it would work better than what other people use in their situations. For example: a sharpened stick will fend off a rat. That does not necessarily mean that the same sharpened stick will fend off a tank better than the rocket launchers that soldiers use.

  4. No database can provide MVCC without locking or versioning. I think the comment was explicitly about “locking the database”, and MyISAM simply locks at table level for writing, to ensure consistency. PostgreSQL supports row-level locks, like InnoDB, so I guess that’s the difference.

    Where PostgreSQL really excels is its index management, which is apparently faster than MySQL’s, especially for GiST indexes, but the important point is that it’s only faster for some configurations.

    In simple terms: as soon as you have to scale out by splitting tables and partitioning your data effectively, the difference between MySQL and PostgreSQL is directly related to the skill of your administrative team and has no direct relation to the database system you’re using. Whoever wrote this article has no idea what he’s talking about.

  5. Brian, had a good laugh. And I do describe the architecture here already :)
    Peter, ‘amazing’ was probably sarcasm. And as I wrote already, their monthly traffic is a minute for Wikipedia. I’m not comparing architectures; it’s just that the design choices are strange.
    David, of course different projects have different needs. The problem with CZ is that they talk against many things Wikipedia uses, but they have absolutely no clue about them.
    Joh, I’m completely aware of the implementation details in PG, MyISAM and InnoDB – and of course I agree, someone had no idea what he was talking about. Wikipedia is a fully InnoDB shop, so talking about locking the way the article does is complete bullshit.
