Conference notes

Cool ideas from the conference:

  1. Having the same binary logs, with the same positions, on slaves as on masters. This makes switching the database master a DNS-change (or IP-takeover) operation. — Google
  2. Reading replication logs and prefetching data in parallel on slaves before the replication events get executed. This way data is preheated for upcoming changes and the serial update thread works much faster. Especially useful on slightly delayed slaves (there’s a rough sketch after this list). — Youtube
  3. For generating unique IDs in a distributed environment, use a separate set of servers, without any replication but with auto-increment offsets. This can be kept separate from the core database and is quite efficient (also sketched below). — Flickr
  4. Let’s get some beer. — Colleagues at MySQL AB
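
Here is a minimal sketch of how the prefetching idea (item 2) could look. It assumes statement-based replication, that the mysqlbinlog tool is available to decode the relay log, and a made-up relay log path; the statement parsing is deliberately naive and is only meant to illustrate turning upcoming UPDATEs into cache-warming SELECTs.

    import re
    import subprocess

    # Illustrative only: assumes statement-based replication and that mysqlbinlog
    # can decode the relay log on this slave.
    UPDATE_RE = re.compile(
        r"UPDATE\s+(?P<table>\S+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+)",
        re.IGNORECASE | re.DOTALL,
    )

    def to_prefetch_select(statement):
        """Rewrite an UPDATE into a SELECT that touches the same rows."""
        match = UPDATE_RE.match(statement.strip())
        if not match:
            return None
        return "SELECT 1 FROM {0} WHERE {1}".format(
            match.group("table"), match.group("where")
        )

    def relay_log_statements(relay_log_path):
        """Yield SQL statements decoded from a relay log (very naive splitting)."""
        output = subprocess.run(
            ["mysqlbinlog", relay_log_path],
            capture_output=True, text=True, check=True,
        ).stdout
        for chunk in output.split(";\n"):
            yield chunk

    if __name__ == "__main__":
        # Hypothetical relay log path; a real prefetcher would track the SQL
        # thread position and run these SELECTs from parallel worker threads.
        for stmt in relay_log_statements("/var/lib/mysql/relay-bin.000042"):
            select = to_prefetch_select(stmt)
            if select:
                print(select)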
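
And a rough illustration of the Flickr-style ID generation (item 3): each ticket server is an independent MySQL instance configured with, say, auto_increment_increment=2 and auto_increment_offset=1 or 2, so two servers never hand out the same value. The table name, hostnames and the pymysql client below are my own assumptions, not Flickr’s actual setup.

    import itertools

    import pymysql  # assumed client library; any MySQL DB-API driver works the same way

    # Hypothetical ticket servers: two standalone MySQL instances, one configured
    # with auto_increment_offset=1, the other with auto_increment_offset=2 (and
    # both with auto_increment_increment=2), so their IDs never collide.
    TICKET_SERVERS = [
        {"host": "ids1.example.org", "user": "app", "password": "secret", "database": "tickets"},
        {"host": "ids2.example.org", "user": "app", "password": "secret", "database": "tickets"},
    ]
    _round_robin = itertools.cycle(TICKET_SERVERS)

    def next_id():
        """Fetch a globally unique ID by bumping a single-row ticket table."""
        conn = pymysql.connect(**next(_round_robin))
        try:
            with conn.cursor() as cur:
                # REPLACE keeps the table at one row while still advancing the
                # AUTO_INCREMENT counter ('ticket' and 'stub' are made-up names).
                cur.execute("REPLACE INTO ticket (stub) VALUES ('a')")
                cur.execute("SELECT LAST_INSERT_ID()")
                (new_id,) = cur.fetchone()
            conn.commit()
            return new_id
        finally:
            conn.close()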

One of the core messages I was trying to spread was “Relax. The world is not going to end if you lose a transaction.” I’m not sure how cool it was, but some nice folks out there said it was inspiring. In many cases running a project has to be fun first, and the most motivating targets should be the priority.
There were still ideas that had counter-arguments (of course, every situation may have different needs). One of the discussions I bumped into was about using big services Out There (such as Amazon S3) instead of building your own datacenters – I didn’t end up convinced, but of course it is interesting to investigate whether the costs really can be lower.
Some more notes:

  • At the Flickr presentation someone asked why Flickr is using Yahoo’s search backend instead of rolling out anything Lucene-based. Lucene seemed to be the buzzword always in the air – though I had nice conversations with the Sphinx developer too. Still, if I were a commercial entity owned by Yahoo, I’d really reuse top-notch technology developed in other Y! departments, rather than use an open-source project created by… a Yahoo employee in his free time.
  • Every time I heard someone was using NetApp Filers, I cried. Probably it is very good technology, but the last time I checked prices it was quite expensive… And someone mentioned it becomes slightly painful to ensure all the business continuity.
  • Open Source Systems were demonstrating pretty cool (literally, too) dual-motherboard servers – they promised lower prices and much lower energy consumption. Interesting.
  • The Dolphin interconnect people said 10-gigabit Ethernet does not solve the network latency problem, so they’ll live for a while.
  • HP people were showcasing blade technology, though they avoided filling the enclosure with blades and loading them – it would have been quite a noise generator in the expo hall. We did discuss the unfair list prices from major hardware vendors – the industry sucks in this regard.
  • Heikki joked, people laughed, even a few times. At the ‘clash of DB gurus’ Heikki mentioned he’d like to see Wikipedia still running on InnoDB after 20 years. It made me wonder what kind of data architectures people will use in 20 years.
  • Nobody has solved the speed-of-light problem yet.
  • The .org pavilion showcased the core products out there, especially in web publishing (WordPress, Drupal, Joomla) and workflow (Eventum, OTRS, Bugzilla, dotproject).
  • All the scaling talks generally chose standard solutions and spent engineering effort on them. I didn’t see anything breathtaking, but of course, that may be the still hidden part of the technology.
  • I’d like to come back next year again. Again.

Wikipedia: site internals, etc (the workbook)

There still are details (and even complete subsystems) to be documented more thoroughly, but this is the workbook I presented at the MySQL Conference. I’ve heard comments that the book has ‘hypothetical’ situations, but generally every bit in there represents real practice we’re running.

The talk did drift into some bits of information that may not be in the book (or went much deeper), and the book covers some parts of operations that were not discussed in the tutorial. Anyway, I hope both the session and the tutorial have some value.

Here’s the file: Wikipedia: Site internals, configuration, code examples and management issues (the workbook). For now it is a PDF, but I hope to transform it into a properly evolving public document.

Update: Another good presentation on Wikipedia technology by Mark: Wikimedia Architecture

Writing a book (or preparing for MySQL Conference)

I already announced that I’m coming to the MySQL Conference, but I didn’t realize preparing for it would take that much time. Last year I had just a regular session about Wikipedia’s scaling and felt it was somewhat difficult to squeeze that much information into less than one hour. This year I opted for a 3-hour session (with a short break in the middle), and instead of a few slides with buzzwords on them I worked on workbook-like material to talk about and discuss.
Presentations are always easy; I have to admit I’ve made quite a lot of my slides an hour before the actual talks. Now I’ve realized that writing a workbook ends up being writing a book, and books are not written in a single day… Full disclosure: I looked at last year’s presentation files and blog posts while preparing the talk, but still, things have changed, both in technology and in numbers. We have far more visitors (ha, >30k req/s instead of 12k req/s!), more content, slightly more servers and fewer troubles :-)
Today I delivered the paper for printing (dead-tree handouts for session attendees!), but there are already many ideas about what to append or extend, so this will end up being a perpetual process of improvement. Let’s hope tutorial attendees bring their laptops for updated digital handouts.
Of course, the good part is that the real work will be over after the first day and I’ll be able to enjoy other sessions & social activities. If only I survive the staff party…

Bumping up the version

As Wikipedia is part of the Web 2.0 revolution, there has already been pressure to upgrade to Web 3.0. I personally believe we should take a more drastic approach and go directly for Web 4.0.
This Albert Einstein quote now rings in my head:

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.

Avoid: cookies!

One project out there was trying to mimic some of Wikipedia’s stuff and ended up with a caching layer in front of the application. Now, it wasn’t too efficient, and a quick glance at the setup immediately revealed what was wrong: every anonymous user got a tracking cookie, which of course broke all Vary: Cookie HTTP caching… Big-picture education should be mandatory in such environments :-)
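
A hedged sketch of the fix, as WSGI middleware with an assumed session-cookie name: when responses are cached with Vary: Cookie, handing every anonymous visitor a tracking cookie gives every request a distinct cache key, so the cache in front stores almost nothing useful. Withholding Set-Cookie for visitors who are not logged in lets all anonymous traffic share one cached copy.

    # Sketch only: "my_session" is an assumed session-cookie name, not anything
    # project-specific.
    SESSION_COOKIE = "my_session"

    def no_cookies_for_anonymous(app):
        """WSGI middleware that drops Set-Cookie headers for anonymous visitors."""
        def middleware(environ, start_response):
            logged_in = SESSION_COOKIE in environ.get("HTTP_COOKIE", "")

            def filtered_start_response(status, headers, exc_info=None):
                if not logged_in:
                    # No tracking cookies for anonymous users, so the caching
                    # layer can serve them all from one entry despite Vary: Cookie.
                    headers = [(k, v) for k, v in headers if k.lower() != "set-cookie"]
                return start_response(status, headers, exc_info)

            return app(environ, filtered_start_response)

        return middleware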

.. some thoughts on Citizendium

Open-source communities have quite a lot of antagonism towards their open-source ‘rivals’, instead of seeing them as partners against Greater Evils. I imagine that bootstrapping a project like Citizendium is a huge task, so I followed some of the discussions in their forums:

  • It’s a nightmare. Only Mozilla Thunderbird gets more disrespect from me. – Lead developer Jason describes the software they use, mediawiki.
  • Can we like anonymize some stuff and submit Mediawiki to WorseThanFailure? – Technical liasion [sic] Zachary suggests.

Of course, being forced to run an open-source package from your greatest ‘rival’ is pain, oh pain. The Citizendium team even forked the software and called it CaesarWiki.
This is how improvements to the fork are described:

Well, hypothetically, we can do whatever we want in terms of improving MediaWiki, including working on the difference engine.
However, I think that it’s more likely that any changes in that area will filter down from work done by the MediaWiki team.
They have a lot more developer time (in developer-hours/month) and a lot more expertise with MediaWiki.

Of course, having a paid lead developer who doesn’t understand the core principles of how the software functions (disrespect, remember?) doesn’t help with real improvements. And of course, half a year ago, big work was ahead:

Ideally, I would like to rewrite mediawiki from the ground up in OO style. Since that may not work well, the best way is to wrap it in a bow and let the “present” develop into something pretty over time.

Xoops was given as an example of a package that scales, has some security, and even uses caching, so integrating it with MediaWiki would make it scale. That’s a sure way forward. Of course, one of the biggest mistakes the Wikipedia folks have made is the LAMP choice:

To not box ourself in like Wikipedia has done with Mediawiki, PHP and MySQL, we need to pursue modular, easy to use and easy to maintain and update solutions. No one needs network and system admins spinning dinner plates on sticks all day.

It is quite difficult to understand how people who never talked to us know that much about our operations. Back when this was written, Wikipedia had one full-time employee working on the system; a few others did the work whenever they (we) wished, and that usually was creative (of course, sometimes artistic) work. Anyway, to run away from evil MySQL to PostgreSQL, this set of arguments was used:

Disadvantage: no years of heavy use to test it. Advantage: fewer workarounds, easier to scale overall, incredibly knowledgeable community ready to help out.

Of course, at Wikipedia we failed to scale. Now, what made me slightly envious is the discussion about security and operations personnel – having a pool of developers scattered around the world, with a floating 24/7 schedule, is priceless; we really can’t afford that at Wikipedia – at one moment all of us were in Europe, now just Brion is sitting in Florida (which is not that far away either).

Anyway, though I believe in Wikipedia evolution more than in Citizendium revolution, I wouldn’t reject advice – the project may be quite interesting, and if the content can be reused on other projects, it just adds value to the Web. Probably we’re rookies in software engineering, but it has been a long path to build the Wikipedia platform. Some of us learned the technologies used specifically for the project. I’m not sure we earned the disrespect we’re getting, but I still think that the antagonism is harming Citizendium, not us.

HTTP 2.0

Tim discovered an inefficiency in the communication between the Squids and the backend servers (and eventually, clients) – the problem was the lack of a Content-Length: header for dynamically generated content. Though HTTP/1.1 can use chunked encoding inside keep-alive connections, HTTP/1.0 relies entirely on Content-Length:, so the lack of it simply forces connections to close (and requires expensive reopens afterwards).

Generally, HTTP/1.0 lacked a footer – additional metadata that can only be calculated after the whole request. So add a Content-Length footer, content compression that smooths the content bits, and you’ve got HTTP 2.0 – with headers, footers, and rounded corners.

Ah, and the possible solutions vary – from increasing compression buffers to overhauling the whole output buffering code. It should make the site faster anyway – good job, Tim.
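
For illustration, here is roughly what the buffering approach amounts to (a sketch under my own assumptions, not Tim’s actual patch): collect the whole dynamically generated body, compress it in one go, and only then emit the headers, so Content-Length is always known and an HTTP/1.0 client or Squid can keep the connection alive instead of reading until close.

    import gzip
    from io import BytesIO

    def render_response(body_chunks, accept_gzip=True):
        """Return (headers, body) with Content-Length always present.

        Buffering the full output before sending is what makes the length
        knowable up front; streaming it chunk by chunk is what loses it.
        """
        body = b"".join(body_chunks)
        headers = [("Content-Type", "text/html; charset=UTF-8")]
        if accept_gzip:
            buf = BytesIO()
            with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
                gz.write(body)
            body = buf.getvalue()
            headers.append(("Content-Encoding", "gzip"))
        headers.append(("Content-Length", str(len(body))))
        return headers, body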

Five minutes of MediaWiki performance tuning

MediaWiki is quite a complex package, and even some trivial features are not ones that should be enabled on sites with more load. Though it is quite modular, lots of code still has to be executed, and some of it requires additional steps. Continue reading “Five minutes of MediaWiki performance tuning”

MySQL Conference 2007: Piggyback riding Wikipedia again. \o/

This year I’m coming to the MySQL Conference again. Last year was a marvelous experience, with customers, community and colleagues (CCC!) gathering together, so I didn’t want to miss it this year at any cost :-)

This year, instead of describing Wikipedia internals, I’ll be disclosing them – all the important bits, configuration files, code, ideas, problems, bugs and work being done through the whole stack – starting with distributed caches in front, distributed middleware somewhere in the middle, and distributed data storage in the back end. It will take three hours or so – bring your pillows. :)