Board again (perhaps)

Tomorrow, voting in the Wikimedia Foundation Board of Trustees election starts – and yours truly is a candidate.

You can find most of my views on various issues on our question pages (I was somewhat boiling when answering the "What will you do about the WMF mishandling its funding?" one – it probably took great effort to phrase a question that badly, and it was so easy to answer :), as well as in the Wikipedia Signpost 'interview'.

I was appointed to the Board back in January 2008, after holding various other volunteer (at some point in time, 'officer') positions within the organization since 2004 – and brought the core technology and operational-efficiency skill set there. The appointment was supposed to be somewhat temporary, but the board restructuring turned out to be a much longer process than we expected – both the chapters part and the nominating committee's work. After the restructuring I, as a community member, held a 'community-elected' seat, though I had never participated in any election – so that wasn't too fair to the actual community; need to fix that :)

So, even though I wasn't too visible to the actual community (people notice me mostly when things go wrong, and I'm usually not in the best mood then :-), I feel that the values I've worked on, evangelized, and supported all these years – efficiency and general availability of our projects – can win the mindshare not only of the read-only users I mostly work for, but also of eligible voters.

And I do think that internal technology expertise has to be represented on the board, as the things we've been doing, and the methods we've been using, are quite unique in the technology world. Oh, and as I mentioned somewhere, our technology spending is close to 50% of the budget – that has to be represented too :-)

embarrassment

So, we had a major embarrassment last night. It consisted of multiple factors:

  • We don’t have a parallelism coordinator for our most CPU-intensive task at Wikipedia, so it can end up working on the same job in ten, a hundred, or a thousand threads across the cluster at the same time.
  • Some parts of our parsing process ended up extremely CPU-intensive, and not in our code, but in 'templates', which live in user space. We don’t have profiling for templates, so we can only guess which ones are slow and which are fast, with no aggregate numbers either.
  • Some pages are extremely template-heavy, making page rendering cost a lot (e.g. citations – see this discussion).
  • To avoid content-integrity race conditions, the editing process releases locks and invalidates objects early, separately from the 'virgin parse' that populates the caches.
  • It takes quite some time to refill the cache, as rendering is CPU-bound for quite a while in certain cases.
  • During that window when the caches are empty, a stampede of users on a single article causes lots of redundant work across the cluster/grid/cloud.
  • The Michael Jackson article on the English Wikipedia alone had a million views in one hour.

So, in summary: we had havoc in our cluster because a stampede of heavy requests, arriving between cache purge and cache population, was consuming all available CPU resources, mostly rendering the references section of the Michael Jackson article.

Oh well, the quick operations hack looked like this:

Index: ParserCache.php
--- ParserCache.php	(revision 52088)
+++ ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
  if ( is_object( $value ) ) {
    wfDebug( "Found.\n" );
    # Delete if article has changed since the cache was made
    // temp hack!
+   if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) {
    $canCache = $article->checkTouched();
    $cacheTime = $value->getCacheTime();
    $touched = $article->mTouched;

It is embarrassing because the actual pageview count was way below our usual capacity – whenever we have problems, it is because of some narrow, expensive problem, not because of an overall, unavoidable resource shortage. We can afford many more edits, many more pageviews. We could have handled this load way better if our users didn't create complex logic in articles. We could have handled it way better if we had more aggressive redundant-job elimination.

That's the real story of operations: though headlines like "High-profile event brought down Wikipedia" may sound nice, the real story is "shit happens".

on tools and operating systems

Sometimes people ask why I use Mac OS X as my main work platform (isn't that something to do with beliefs?). My answer is "a good foundation with a great user interface". Though that can be read as "he must like the Unix kernel and the look & feel!", it is not exactly that.

What I like is that I can have a good, stable graphical environment with some mandatory tools (yes, I use the OS-supplied browser, mail, etc.), while maintaining a bleeding-edge open-source space beside it (provided by MacPorts).

I also like the OS-supplied development and performance tools. The included DTrace is awesome, yes, but Apple put some special touches on it too. This is the visualization environment for DTrace probes and other profiling/debugging tools:

Even the web browser (well, I upgraded to Safari 4.0 ;-) provides some impressive debugging and profiling capabilities:

Of course, I end up running a plethora of virtual machines (switching from Parallels to VirtualBox lately), and I even have a KDE/Aqua build (mostly for kcachegrind). I don't really need Windows apps, I can run 'Linux' ones natively on Mac OS X, and of course Mac OS X ones on Mac OS X.

There's a full web stack for my MediaWiki work, there are dozens of MySQL builds around, there are photo albums, DTrace tools, World of Warcraft, a bunch of toy projects, a few different office suites, Skype, NetBeans, Eclipse, Xcode, integrated address books and calendars, all major scripting languages, and revision control systems – git, svn, mercurial, bzr, BitKeeper, cvs, etc.

All that on a single machine, running for three years, without too much clutter and with nearly zero effort to make it all work. That's what I want from a desktop operating system – extreme productivity without too much tinkering.

And if anyone blames me for using non-open-source software, my reply is very simple – my work output is open-sourced.

I loved Encarta

That happened long before Wikipedia: I loved Encarta. Well, before Encarta, I used to read this thing a lot:

But then Encarta arrived, and I loved it. It fit on a single CD and didn't take too much space on disk. I could look up all those articles in it fast, without having to use expensive dialup. I remember my school buddies coming over to watch the tiny movies in it. I could rip it off for my schoolwork and look incredibly smart (now people rip off Wikipedia and don't get much credit for it :).

It is dead.

People on the interwebs suggest that employees at Wikipedia and Encyclopaedia Britannica will be throwing parties tonight. Oh well, Wikipedia is already up to date on this. Every encyclopedia out there was an inspiration for Wikipedia, more so than any technology or "web-two-oh" hype. There's not much joy in seeing good things die.

Ten years ago I imagined that once I had my own home, I'd have a place for a full set of dead-tree Britannica, like my parents had the "Lithuanian Soviet Encyclopaedia". Wikipedia changed my plans (now there are two flat panels staring at the wiki, inside and outside), but it seems it is already changing the world around it far more. RIP, Encarta. You were inspiring, and really too young to die. If it was us, we didn't mean it, really. By the way, that content of yours – I'd be glad to see it free. *wink*

Rasmus vs me

Rasmus (of PHP fame) and I exchanged these nice words on Freenode's #php (while discussing some PHP execution-efficiency issues):

<Rasmus_> if that is your bottleneck, you are the world's best
          PHP developer
<Rasmus_> domas: then you are writing some very odd code.
          I think you will find you spend most of your time in
          syscalls and db stuff

<domas> Rasmus_: I can tell you're the best database developer, if
        you spend most of your time in db stuff :)

You can immediately see the different application-engineering perspectives :)

Tim is now vocal

Tim at the datacenter
Tim is one of the most humble and intelligent developers I've ever met – and we're extremely happy to have him at Wikimedia. Now he has a blog, where the first entry is already epic by any standard. I mentioned the IE bug before, and Tim has done a thorough analysis of this one and similar problems.

I hope he continues to disclose the complexity of real web applications – it will always be a worthy read.

Crashes, complicated edition

Usually our 4.0.40 (aka 'four-oh-forever') build doesn't crash, and when it does, it is always a hardware problem, a kernel/filesystem bug, or whatever else. So we had a very calm life, until the crashes started to happen…

As we used to run RAID0, a disk failure usually meant a system wipe and reinstall once fixed – so our machines all run relatively new kernels and OS releases (except some boxes that just refuse to die ;-), and we're usually well ahead of the whole bunch of conservative RHEL users.

We had one machine that was reporting CPU/northbridge/RAM problems, and every MySQL crash was accompanied by MCEs, so after replacing the RAM, the CPU, and the motherboard itself, we just sent the machine back for service and asked them to do whatever it took to fix it.

So this machine, with the proud name of 'db1', comes back, and after entering service starts crashing every day. I reduced the InnoDB log file size to make recovery faster and ran it under gdb. The stack trace on crash pointed to a bunch of checksumming (aka folding) functions, so the initial assumption was 'here we get memory errors again'. For a while I thought 'db1' needed some more hardware work and just left it as is, as we were waiting for a new batch of database hardware to deploy and there was a bit more work around.

We started deploying the new database hardware, and it started crashing every few hours instead of every few days. Here again, a reduced InnoDB transaction log size and an attached gdb let me trap the segfault, and it pointed once more at the very same adaptive hash key calculation (folding!).

Unfortunately, it was a non-trivial chain of inlined functions (InnoDB is full of these), so I made a '-g -fno-inline' build and was keenly waiting for a crash to happen, so I could investigate what gets corrupted, and where. It did not crash. Then I looked at our zoo, just to find out we have lots of different builds. On one hand it was a bit messy; on the other hand, it allowed a few conclusions:

  • Only Opterons crashed (though there's about a three-year gap between their revisions)
  • Only Ubuntu 8.04 crashed
  • Only gcc-4.2 builds crashed

After observing that:

  • We have Opterons that don't crash (older gcc builds).
  • Xeons didn't crash.
  • We have Ubuntu 8.04 machines that don't crash (they are either Xeons or run older gcc-4.1 builds).
  • We have gcc-4.2 builds that run nicely (all on Xeons, all on Ubuntu 8.04).

the next test was taking gcc-4.1 builds and running them on our new machines. No crash for the next two days.
One new machine did have a gcc-4.2 build and didn't crash during a few days of replication-only load, but once it got some parallel load, it crashed within a few hours.

I tried to chat about it on Freenode's #gcc, and all I got was:

noshadow>	domas: almost everything that fails when
		optimized (as inlining opens many new
		optimisation possibilities)
noshadow>	i.e: const misuse, relying on undefined
		behaviour, breaking aliasing rules, ...
domas>		interesting though, I hit it just with
		gcc 4.2.3 and opterons only
noshadow>	domas: that makes it more likely that
		it is caused by optimisation unveiling
		programming bugs

In the end, I know that there's a programming bug in ancient code using inlined functions that causes memory corruption under multithreaded load if compiled with gcc-4.2 and run on Opterons. Since this code is now our fork, pretty much everyone will point at each other and nobody will try to fix it :)

And me? I can always do:

env CC=gcc-4.1 CXX=g++-4.1 ./configure ... 

I'm too lazy to learn how to disassemble and check the compiled-code differences, especially when every test takes a few hours. I already destroyed my weekend with this :-) Now I'm just waiting for people to hit this with stock MySQL – it would be one of those things we love debugging ;-)