Board again (perhaps)

Tomorrow, voting in the Wikimedia Foundation Board of Trustees election starts – and yours truly is a candidate.

You can find most of my views on various issues on our question pages (I was somewhat boiling when answering the "What will you do about the WMF mishandling its funding?" one – it probably took great effort to phrase a question that badly, and it was so easy to answer :), as well as in the Wikipedia Signpost 'interview'.

I was appointed to the Board back in January 2008, after holding various other volunteer (at some point in time, 'officer') positions within the organization since 2004 – and brought the core technology and operational-efficiency skill set there. The appointment was supposed to be somewhat temporary, but the board restructuring turned out to be a much longer process than we expected – both the chapters part and the nominating committee's work. After the restructuring I, as a community member, held a 'community-elected' seat, though I had never participated in any election – so that wasn't too fair to the actual community; need to fix that :)

So, even though I wasn't too visible to the actual community (people notice me mostly when things go wrong, and I'm usually not in the best mood then :-), I feel that the values I've worked on, evangelized, and supported all these years – efficiency and general availability of our projects – can win the mindshare not only of the read-only users I mostly work for, but also of eligible voters.

And I do think that internal technology expertise has to be represented on the board, as the things we've been doing, and the methods we've been using, are quite unique in the technology world. Oh, and as I mentioned somewhere, our technology spending is close to 50% of the budget – that has to be represented too :-)

embarrassment

So, we had a major embarrassment last night. It consisted of multiple factors:

  • We don’t have a parallelism coordinator for our most CPU-intensive task at Wikipedia, so it can end up working on the same job in ten, a hundred, or a thousand threads across the cluster at the same time.
  • Some parts of our parsing process ended up extremely CPU-intensive, and not in our code, but in 'templates', which live in user space. We don’t have profiling for templates, so we can only guess which ones are slow and which are fast, with no aggregate numbers either.
  • Some pages are extremely template-heavy, making page rendering cost a lot (e.g. citations – see this discussion).
  • To avoid content-integrity race conditions, the editing process releases locks and invalidates objects early, separately from the 'virgin parse' that populates the caches.
  • It takes quite some time to refill the cache, as rendering is CPU-bound for quite a while in certain cases.
  • During that window when the caches are empty, a stampede of users on a single article causes lots of redundant work across the cluster/grid/cloud.
  • The Michael Jackson article on the English Wikipedia alone had a million views in one hour.

So, in summary: we had havoc in our cluster because a stampede of heavy requests, arriving between cache purge and cache population, was consuming all available CPU resources, mostly rendering the references section of the Michael Jackson article.

Oh well, the quick operations hack looked like this:

Index: ParserCache.php
--- ParserCache.php	(revision 52088)
+++ ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
  if ( is_object( $value ) ) {
    wfDebug( "Found.\n" );
    # Delete if article has changed since the cache was made
    // temp hack!
+   if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) {
    $canCache = $article->checkTouched();
    $cacheTime = $value->getCacheTime();
    $touched = $article->mTouched;

It is embarrassing because the actual pageview count was way below our usual capacity – whenever we have problems, it is because of some narrow, expensive problem, not because of an overall, unavoidable resource shortage. We can afford many more edits, many more pageviews. We could have handled this load way better if our users didn't create complex logic in articles. We could have handled it way better if we had more aggressive redundant-job elimination.

That's the real story of operations: though headlines like "High-profile event brought down Wikipedia" may sound nice, the real story is "shit happens".

on tools and operating systems

Sometimes people ask why I use Mac OS X as my main work platform (isn't that something to do with beliefs?). My answer is "a good foundation with a great user interface". Though that can be read as "he must like the Unix kernel and the look & feel!", it is not exactly that.

What I like is that I can have a good, stable graphical environment with some mandatory tools (yes, I use the OS-supplied browser, mail, etc.), while maintaining a bleeding-edge open-source space beside it (provided by MacPorts).

I also like the OS-supplied development and performance tools. The included DTrace is awesome, yes, but Apple put some special touches on it too. This is the visualization environment for DTrace probes and other profiling/debugging tools:

Even the web browser (well, I upgraded to Safari 4.0 ;-) provides some impressive debugging and profiling capabilities:

Of course, I end up running a plethora of virtual machines (switching from Parallels to VirtualBox lately), and I even have a KDE/Aqua build (mostly for kcachegrind). I don't really need Windows apps, I can run 'Linux' ones natively on Mac OS X, and of course Mac OS X ones on Mac OS X.

There's a full web stack for my MediaWiki work, there are dozens of MySQL builds around, there are photo albums, DTrace tools, World of Warcraft, a bunch of toy projects, a few different office suites, Skype, NetBeans, Eclipse, Xcode, integrated address books and calendars, all major scripting languages, and revision control systems – git, svn, mercurial, bzr, BitKeeper, cvs, etc.

All that on a single machine, running for three years, without too much clutter and with nearly zero effort to make it all work. That's what I want from a desktop operating system – extreme productivity without too much tinkering.

And if anyone blames me for using non-open-source software, my reply is very simple – my work output is open-sourced.

I loved Encarta

That happened long before Wikipedia: I loved Encarta. Well, before Encarta, I used to read this thing a lot:

But then Encarta arrived, and I loved it. It fit on a single CD and didn't take too much space on disk. I could look up all those articles in it fast, without having to use expensive dialup. I remember my school buddies coming over to watch the tiny movies in it. I could rip it off for my schoolwork and look incredibly smart (now people rip off Wikipedia and don't get much credit for it :).

It is dead.

People on the interwebs suggest that employees at Wikipedia and Encyclopaedia Britannica will be throwing parties tonight. Oh well, Wikipedia is already up to date on this. Every encyclopedia out there was an inspiration for Wikipedia, more so than any technology or "web-two-oh" hype. There's not much joy in seeing good things die.

Ten years ago I imagined that once I had my own home, I'd have a place for a full set of dead-tree Britannica, like my parents had the "Lithuanian Soviet Encyclopaedia". Wikipedia changed my plans (now there are two flat panels staring at the wiki, inside and outside), but it seems it is already changing the world around it far more. RIP, Encarta. You were inspiring, and really too young to die. If it was us, we didn't mean it, really. By the way, that content of yours – I'd be glad to see it free. *wink*

Rasmus vs me

Rasmus (of PHP fame) and I exchanged these nice words on Freenode's #php (while discussing some PHP execution-efficiency issues):

<Rasmus_> if that is your bottleneck, you are the world's best
          PHP developer
<Rasmus_> domas: then you are writing some very odd code.
          I think you will find you spend most of your time in
          syscalls and db stuff

<domas> Rasmus_: I can tell you're the best database developer, if
        you spend most of your time in db stuff :)

You can immediately see the different application-engineering perspectives :)

Tim is now vocal

Tim at the datacenter
Tim is one of the most humble and intelligent developers I've ever met – and we're extremely happy to have him at Wikimedia. Now he has a blog, where the first entry is already epic by any standard. I mentioned the IE bug before, and Tim has done a thorough analysis of this one and similar problems.

I hope he continues to disclose the complexity of real web applications – it will always be a worthy read.

Crashes, complicated edition

Usually our 4.0.40 (aka 'four-oh-forever') build doesn't crash, and when it does, it is always a hardware problem, a kernel/filesystem bug, or whatever else. So we had a very calm life, until the crashes started to happen…

As we used to run RAID0, a disk failure usually meant a system wipe and reinstall once fixed – so our machines all run relatively new kernels and OS releases (except some boxes that just refuse to die ;-), and we're usually well ahead of the whole bunch of conservative RHEL users.

We had one machine that was reporting CPU/northbridge/RAM problems, and every MySQL crash was accompanied by MCEs, so after replacing the RAM, the CPU, and the motherboard itself, we just sent the machine back for service and asked them to do whatever it took to fix it.

So this machine, with the proud name of 'db1', comes back, and after entering service starts crashing every day. I reduced the InnoDB log file size to make recovery faster and ran it under gdb. The stack trace on crash pointed to a bunch of checksumming (aka folding) functions, so the initial assumption was 'here we get memory errors again'. For a while I thought 'db1' needed some more hardware work and just left it as is, as we were waiting for a new batch of database hardware to deploy and there was a bit more work around.

We started deploying the new database hardware, and it started crashing every few hours instead of every few days. Here again, a reduced InnoDB transaction log size and an attached gdb let me trap the segfault, and it pointed once more at the very same adaptive hash key calculation (folding!).

Unfortunately, it was a non-trivial chain of inlined functions (InnoDB is full of these), so I made a '-g -fno-inline' build and was keenly waiting for a crash to happen, so I could investigate what gets corrupted, and where. It did not crash. Then I looked at our zoo, just to find out we have lots of different builds. On one hand it was a bit messy; on the other hand, it allowed a few conclusions:

  • Only Opterons crashed (though there's about a three-year gap between their revisions)
  • Only Ubuntu 8.04 crashed
  • Only gcc-4.2 builds crashed

After observing that:

  • We have Opterons that don't crash (older gcc builds).
  • Xeons didn't crash.
  • We have Ubuntu 8.04 machines that don't crash (they are either Xeons or run older gcc-4.1 builds).
  • We have gcc-4.2 builds that run nicely (all on Xeons, all on Ubuntu 8.04).

the next test was taking gcc-4.1 builds and running them on our new machines. No crash for the next two days.
One new machine did have a gcc-4.2 build and didn't crash during a few days of replication-only load, but once it got some parallel load, it crashed within a few hours.

I tried to chat about it on Freenode's #gcc, and all I got was:

noshadow>	domas: almost everything that fails when
		optimized (as inlining opens many new
		optimisation possibilities)
noshadow>	i.e: const misuse, relying on undefined
		behaviour, breaking aliasing rules, ...
domas>		interesting though, I hit it just with
		gcc 4.2.3 and opterons only
noshadow>	domas: that makes it more likely that
		it is caused by optimisation unveiling
		programming bugs

In the end, I know that there's a programming bug in ancient code using inlined functions that causes memory corruption under multithreaded load if compiled with gcc-4.2 and run on Opterons. Since this code is now our fork, pretty much everyone will point at each other and nobody will try to fix it :)

And me? I can always do:

env CC=gcc-4.1 CXX=g++-4.1 ./configure ... 

I'm too lazy to learn how to disassemble and check the compiled-code differences, especially when every test takes a few hours. I already destroyed my weekend with this :-) Now I'm just waiting for people to hit this with stock MySQL – it would be one of those things we love debugging ;-)