5.0 journal: various issues, replication prefetching, our branch

First of all, I have to apologize for some of my previous remarks on 5.0 performance. I had passed ‘-g’ in CFLAGS to my build, and that replaced the default ‘-O2’. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.

A few migration notes – once I loaded the schema with the character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, which I expected, but more annoying was CHAR being converted to BINARY – which pads data with 0x00 bytes. The solution was converting everything to VARBINARY – it doesn’t have much overhead anyway. TRIM('\0' FROM field) eventually helped too.
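Roughly what the fix looked like (table and column names here are just illustrative, not our actual schema):

    -- redefine the padded CHAR-turned-BINARY column as VARBINARY
    ALTER TABLE user MODIFY user_token VARBINARY(32) NOT NULL DEFAULT '';
    -- then strip the 0x00 padding that BINARY had already written into the data
    UPDATE user SET user_token = TRIM(TRAILING '\0' FROM user_token);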

The other problem I hit was a paramy operation issue. One table definition failed, so paramy exited immediately – though it still had a few queries remaining in the queue – so the most recent data from some table was not inserted. The cheap workaround was adding the -f option, which just ignores errors. Had to reload all the data though…

I had real fun experimenting with auto-inc locking. As it was a major problem for the initial paramy tests, I hacked InnoDB not to acquire the auto-inc table-level lock (that was just commenting out a few lines in ha_innodb.cc). After that change CPU use went to >300% instead of ~100% – so I felt like I’d nearly done the right thing. Interesting, though – the profile showed that quite a lot of CPU time was spent on synchronization – mutexes and such – so I hit SMP contention at just 4 cores. Still, the import was faster (or at least it felt faster), and I already have a few cheap tricks in mind to make it faster still (like disabling the memory pool). The easiest way to make this manageable is simply to provide a global variable for the auto-inc behavior, though a more elegant solution would attach to ‘ALTER TABLE … ENABLE KEYS’ or something similar.
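Just to show what I mean by that knob – a hypothetical interface; the variable name below is made up and does not exist in stock 5.0:

    -- made-up variable, only to illustrate the intended switch
    SET GLOBAL innodb_autoinc_table_lock = 0;  -- skip the auto-inc table-level lock
    -- ... run the parallel import ...
    SET GLOBAL innodb_autoinc_table_lock = 1;  -- back to the default behavior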

Once loaded, catching up on replication was another task worth a few experiments. As the data image was already quite a few days old, I had at least a few hours to try to speed up replication. Apparently, Jay Janssen’s prefetcher has disappeared from the internets, so the only one left was maatkit’s mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them on just a single thread, so the prefetcher was only a few seconds ahead of the SQL thread – and the speedup was less than 50%. I made a quick hack that parallelized the task, and it managed to double replication speed.
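Roughly, this is the kind of rewrite mk-slave-prefetch does ahead of the SQL thread (the statement below is a made-up example, not something from our relay logs):

    -- replicated write waiting in the relay log, a few events ahead of the SQL thread:
    --   UPDATE user SET user_touched = '20071215000000' WHERE user_id = 12345;
    -- the prefetcher issues a cheap read instead, so the row and index pages
    -- are already in the buffer pool by the time the SQL thread gets there:
    SELECT user_touched FROM user WHERE user_id = 12345;

My hack simply fans such rewritten SELECTs out over several connections instead of one.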

Still, there are a few problems with the concept – it preheats just the one index used for the lookup, and doesn’t work on secondary indexes. Actually analyzing the query, identifying what changes and where, and sending a SELECT with UNIONs that preheats every index affected by the write query could be more efficient. Additionally, it would make the adaptive hash index or insert buffer useless – as all the required buffer pool pages would already be in memory – thus leading to fewer spots of mutex contention.
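A hypothetical sketch of that idea (table, column and index names are invented): for a write like UPDATE user SET user_name = 'NewName' WHERE user_id = 42, the prefetcher would send one statement touching every affected index:

    -- warm the primary key pages the UPDATE is about to modify
    SELECT user_id FROM user FORCE INDEX (PRIMARY) WHERE user_id = 42
    UNION ALL
    -- warm the secondary index pages where the new value will land
    -- (the pages holding the old value can only be found after reading the row)
    SELECT user_id FROM user FORCE INDEX (name_idx) WHERE user_name = 'NewName';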

We also managed to hit a few optimizer bugs, related to the casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor decisions with them (like filesorting instead of using an existing ref lookup index, etc). I will have to review why this happens, whether it makes sense, and if not – file a bug. For now we have some workarounds, and don’t seem to be bitten too much by the behavior.
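An illustration of the pattern (names are illustrative, and the exact plan depends on column types – assume an index on (rev_page, rev_timestamp)):

    -- constant passed as a string, the way our application code does it –
    -- on 5.0 this is the shape of query that sometimes ended up filesorting:
    EXPLAIN SELECT * FROM revision WHERE rev_page = '12345'
    ORDER BY rev_timestamp DESC LIMIT 10;
    -- the workaround: pass the constant with the type matching the column,
    -- and the existing ref lookup index gets used again:
    EXPLAIN SELECT * FROM revision WHERE rev_page = 12345
    ORDER BY rev_timestamp DESC LIMIT 10;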

Anyway, in the end I directed half of this site’s core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing is that though 5.0 eats ~30% more CPU, it shows up in profiling as the faster-responding box. I guess we’re just doing something wrong.

I’ve published our MySQL branch at Launchpad. Do note, the release process is somewhat ad hoc (or non-existent), and the engineer doing it is a clueless newbie. :)

I had plans to do some more scalability tests today, but apparently the only server available is just a two-core machine, so there’s not much I can do on it. I guess another option is grabbing some 8-core application server and playing with it. :)

Google does encyclopedia: Knol

It is all still closed, but the announcement by the VP of Engineering tells us Google is launching its idea of an encyclopedia – looking for people who can write authoritative articles. No word on licensing apart from “we want to disseminate it as widely as possible”, though the author-centric view is more what Citizendium wants to do than Wikipedia.

Ad revenue sharing poses many interesting questions, especially in a collaborative effort. As Wikipedia now provides page view statistics, Google (or knollers) may just work on top of the cream of the pages (by knowing search trends), and end up with very distorted overall content. For now it is closed, invite-only, so we can’t tell anything more. Time will show. It is good to know that more organizations believe in aggregating and disseminating knowledge – it is Wikipedia’s mission, and it is nice to have partners. :-) Though of course, there might be some tensions with the Search Quality team…

Weird wit by Google translation technology

I was translating a document from German to English that had my surname in it.
It got translated to ‘Beesley’, and I immediately thought of Angela Beesley, chair of the Wikimedia Advisory Board. I started playing more, and found that:

  • French ‘Domas Mituzas’ to English translates as ‘Anthere fall’
  • ‘Mituzas’ in German is ‘Schindler’ (Matthias?:)
  • Spanish ‘Domas Mituzas’ to English translates as ‘Anthere Anthere’ (every wikipedian has a bit of Florence inside :)
  • English to Portuguese renders me as “Domas Lessig” (I have creative commons t-shirt :)
  • English to Chinese is “florence 100,000”…

That’s what Web 3.0 is all about. Tampering with my personality. Who am I? :)

Citizendium revisited

Just spotted an amazing article on how Citizendium built better infrastructure than Wikipedia’s. There are lots of fascinating details there, like…

They went with PostgreSQL for a number of reasons, including better scalability. PostgreSQL is an MVCC database. Unlike Wikipedia, Citizendium never has to lock the database for reads and writes. MySQL can do a lot of things quick and replicate them to slave servers, but PostgreSQL excels at complex functions and full features like JOINs and can do complicated categories and full text searches faster than Wikipedia.

If PG can function without locks, it must definitely be more scalable. InnoDB uses mutexes, spinlocks, etc. – and that internal locking can be a bottleneck in many cases. Additionally, if a row is updated, a lock on the record is acquired. It is still a question how PG maintains ACID without any locks; I’ve got to research that more.
I’m aware that MySQL isn’t the best at full-text search out there – but Wikipedia uses Lucene for full-text search, so it is somewhat strange to hear that the Citizendium platform is faster in that regard. And… I’m not sure where JOIN performance is really faster there – especially when we do lots of covering-index-based joins. Probably the key word there is ‘complex’, though I’m not sure what that means :-)
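For context, a simplified example of the covering-index joins I mean (schema simplified; assume an index on (cl_to, cl_from)):

    -- the categorylinks side of this join is resolved entirely from the
    -- (cl_to, cl_from) index pages – no table rows are read there
    SELECT page_title
    FROM categorylinks
    JOIN page ON cl_from = page_id
    WHERE cl_to = 'Some_category';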
The first reason not to use MySQL was:

First, to be different from Wikipedia.

Indeed, I always support critical thinking! Though this one:

Finally, we felt from reading various mailing lists over mediawiki development that mediawiki was hitting the ceiling of the features MySQL can provide as a backend.

IIRC that came from a single post on a single mailing list, from someone who is not running the Wikipedia backend. Mhm.
Of course, their monthly traffic is equal to a single minute of ours, so some views might differ…