Wikipedia: site internals, etc (the workbook)

There are still details (and even complete subsystems) to be documented more thoroughly, but this is the workbook I presented at the MySQL Conference. I’ve heard comments that the book has ‘hypothetical’ situations, but generally every bit in there represents real practice of ours.

The talk drifted to some bits of information that may not be in the book (or went much deeper), and the book in turn covers some parts of operations that were not discussed at the tutorial talk. Anyway, I hope both the session and the tutorial have some value.

Here’s the file: Wikipedia: Site internals, configuration, code examples and management issues (the workbook). For now it is a PDF, but I hope to transform it into a properly evolving public document.

Update: Another good presentation on Wikipedia technology by Mark: Wikimedia Architecture

11 thoughts on “Wikipedia: site internals, etc (the workbook)”

  1. Great – thanks very much for posting this extremely valuable information on scalable architecture. It’s precisely the kind of information I’ve been looking for, and I’m sure it will be invaluable to many others too.

    I’m interested to see that external storage of files is handled by small MySQL replication setups on the application servers – was this chosen over some kind of distributed filesystem for any reasons other than simplicity?

  2. External storage of media is not handled by MySQL replication setups, just revision texts – they have to be accessed by the application, so using common layers (Database class, LoadBalancer, etc.) is easier than using a ‘distributed file system’.

    The biggest problem with distributed file systems is that they don’t really exist in the open-source world (and MogileFS isn’t that distributed either, nor easy to maintain/extend).

    If there were an efficient distributed storage, we would use it.
    And every Wikipedia tech person knows how to handle MySQL replication ;-)

  3. > And every Wikipedia tech person knows how to handle MySQL replication ;-)

    I don’t!

  4. Hi,

    The PDF is useful! Thanks!

    I have some questions,

    1. Will Wikipedia’s config files such as httpd.conf, my.cnf, squid.conf be shared in the future? (Of course, sensitive info such as IPs can be masked!)
    I really want to look into them to learn the optimizations being used.

    2. Is there any URL where I can download the internal scripts used in Wikipedia?

    Thanks.

  5. > The biggest problem with distributed file systems is that they don’t really exist in the open-source world (and MogileFS isn’t that distributed either, nor easy to maintain/extend).

    What about Lustre?
