INSERT speed, paramy, auto-inc

One of the first things I noticed when I tried to run some 5.0 tests was that INSERT speed was much slower (no benchmarking at that time, just a purely subjective feeling that loading the dump was somewhat too slow).

After an evening of discussion and some coding in the morning, I built paramy – a simple C++ utility that reads an .sql stream and spreads execution of the statements over multiple worker threads.
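
Just to illustrate the idea, here is a minimal sketch (this is not paramy's actual code; the connection parameters are placeholders and the one-statement-per-line splitting is a naive simplification) of spreading statements from an .sql stream over worker threads with the MySQL C API:

#include <mysql/mysql.h>

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex input_mutex;

// Each worker holds its own connection and pulls statements off the shared
// input stream; ordering across workers is not preserved.
void worker() {
    MYSQL *conn = mysql_init(nullptr);
    // Placeholder connection parameters.
    if (!mysql_real_connect(conn, "localhost", "user", "pass", "test",
                            0, nullptr, 0))
        return;
    for (;;) {
        std::string stmt;
        {
            std::lock_guard<std::mutex> lock(input_mutex);
            // Simplification: one statement per line; a real dump needs a
            // splitter that understands quoting and delimiters.
            if (!std::getline(std::cin, stmt))
                break;
        }
        if (!stmt.empty() && mysql_query(conn, stmt.c_str()))
            std::cerr << mysql_error(conn) << "\n";
    }
    mysql_close(conn);
}

int main() {
    mysql_library_init(0, nullptr, nullptr);  // needs the thread-safe client lib
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)
        workers.emplace_back(worker);
    for (auto &t : workers)
        t.join();
    mysql_library_end();
}

Fed a dump on stdin, this keeps eight statements in flight at once – which is exactly where the auto-inc lock then becomes interesting.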

For benchmarking I took a simple table: 13m rows, 1.6G of data, 1.7G of index. Loading it with the standard mysql client took 30min (7200 r/s). Paramy unfortunately hit the auto-inc lock and didn't provide that much of a performance gain – the data load took 25min (8700 r/s). The win here came mostly from everything up to the InnoDB layer running in parallel: instead of 100% of a core, mysqld went up to ~130% CPU use, so it obviously managed to get additional cycles out of SMP.

Dropping AUTO_INCREMENT from the field definition changed everything. The standard mysql load wasn't much faster, but under the paramy load mysqld managed to use ~320% of a core, and did the load in 10min (~21500 r/s).

The interesting part: none of the INSERTs ask for AUTO_INCREMENT values to be generated, so in theory one could simply disable the locking during the load and re-enable it afterwards, the same way unique and foreign key constraint checks are re-enabled after a data import. That way, loading data into InnoDB would be much, much faster, especially with many more cores – though then the contention just moves to the transaction log mutex.
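
In 5.0 there is no switch for the auto-inc lock itself (the innodb_autoinc_lock_mode setting that later appeared in 5.1 addressed this); the pattern it would mirror is what dump imports already do with constraint checks – a minimal sketch over the C API, with the actual statement replay elided:

#include <mysql/mysql.h>

// Wrap a bulk load with the session flags dump imports already toggle;
// a hypothetical equivalent for the auto-inc lock would slot in alongside.
void bulk_load(MYSQL *conn) {
    mysql_query(conn, "SET unique_checks=0");
    mysql_query(conn, "SET foreign_key_checks=0");
    /* ... replay the .sql stream here ... */
    mysql_query(conn, "SET foreign_key_checks=1");
    mysql_query(conn, "SET unique_checks=1");
}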

An additional way to reduce contention inside InnoDB may be to skip checksums on both page reads and writes – they account for quite a big share of the CPU load. Disabling them, at least during bulk imports, can be quite useful.
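
In 5.0 the knob for this is the innodb_checksums startup option – if I recall correctly it is not a session variable, so it takes a server restart. In my.cnf terms:

[mysqld]
# disable InnoDB page checksum calculation and verification
skip-innodb-checksums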

Oh, I mentioned the subjective feeling that 5.0 was slower. I finally benchmarked it – 4.0 did the data load in five minutes, at over 40000 rows/s. I hope this is the last test where 4.0 is twice as fast as 5.0.

Shameless ad

“The Sun Fire X4240, powered by the AMD Opteron 2200 and 2300 processor series, is a two-socket, 8-core, 2RU system with up to twice the memory and storage capacity of any system in its class. It’s the first and only two-socket AMD Opteron system with sixteen hard drive slots in a 2RU form factor.”

Well, now that I work for Sun, this ends up being a shameless ad and boasting :) But back when I first saw information about this product, I wasn’t at Sun yet, and my first thought was “wow, that’s the best machine for scaling up scaled-out environments!”.

In the web database world people agree that the number of spindles (disks!) matters – remember YouTube’s “think disks, not servers” mantra from the scaling panel at the MySQL Conference. Until now, getting that number of spindles would’ve required external arrays, taking up space and sucking power (TCO! ;-)

And for us… it probably means we can finally start doing RAID10 instead of RAID0. :-)

By the way, that box even has a quad-core service processor. Way to go! :)

Trainwreck: external MySQL replication agent

I wanted to work more on the actual project before writing about it, but I’m lazy, and the dear community may not be.

At Wikimedia we have one database server which replicates from multiple (like 15!) masters. It even splits the replication streams by database and applies changes in parallel.
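
Trainwreck’s source is the reference here; purely to illustrate the split-by-database idea, here is a minimal sketch (database names and identifiers are made up) that routes each statement to a per-database applier queue, so ordering is kept within a database while databases proceed in parallel:

#include <condition_variable>
#include <iostream>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// One queue per database: different databases apply in parallel, while
// ordering within a single database is preserved.
struct Applier {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::string> q;
    bool done = false;
};

void apply_loop(const std::string &db, Applier &a) {
    for (;;) {
        std::unique_lock<std::mutex> lock(a.m);
        a.cv.wait(lock, [&] { return !a.q.empty() || a.done; });
        if (a.q.empty())
            return;
        std::string stmt = a.q.front();
        a.q.pop();
        lock.unlock();
        // A real applier would execute stmt on its slave connection here.
        std::cout << db << ": " << stmt << "\n";
    }
}

int main() {
    // Assume the set of databases is known up front, so the map itself is
    // read-only once the binlog reader threads start routing into it.
    std::map<std::string, Applier> appliers;
    appliers["enwiki"];
    appliers["dewiki"];

    std::vector<std::thread> workers;
    for (auto &kv : appliers)
        workers.emplace_back(apply_loop, kv.first, std::ref(kv.second));

    // Each binlog reader thread would call something like this per statement:
    auto route = [&](const std::string &db, const std::string &stmt) {
        Applier &a = appliers.at(db);
        std::lock_guard<std::mutex> lock(a.m);
        a.q.push(stmt);
        a.cv.notify_one();
    };
    route("enwiki", "UPDATE page SET page_touched = NOW()");
    route("dewiki", "INSERT INTO recentchanges VALUES (/* ... */)");

    for (auto &kv : appliers) {  // signal shutdown
        std::lock_guard<std::mutex> lock(kv.second.m);
        kv.second.done = true;
        kv.second.cv.notify_one();
    }
    for (auto &t : workers)
        t.join();
}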

All this is done by an external replication agent, Trainwreck. It is public-domain software written by River; it doesn’t have much documentation and works only on Solaris (River likes Solaris), unless you comment out all the process management blocks, which use doors and other Solaris-specific APIs.

It lives in the Wikimedia SVN and can be checked out using:

svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/trainwreck/

It sits there, maintained just for the needs of that specific single server (OK, there might be two or three), so if anyone wants to make it available to a broader audience, feel free to fork the project to some community-oriented place and add all the nice features you need. :)

On valgrind and tcmalloc

I already wrote about tcmalloc and how it helped with memory fragmentation. This time I had some experience with its extended features – the memory profiler and the leak checker. With tcmalloc it is possible to get an overview, as well as detailed reports, of what areas of memory are allocated for what uses (a small usage sketch follows the list below). Even more, it can detect and report any memory that leaked. Of course, valgrind does that too. With one slight difference:

  • valgrind slows down applications 20-40 times (my tests ran 4s instead of 14ms)
  • tcmalloc does not. Same 14ms.
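
For reference, the tcmalloc (gperftools) profiler hooks are plain function calls, so profiling can be switched on around exactly the code under test – a minimal sketch (the dump prefix is arbitrary), built with -ltcmalloc:

#include <gperftools/heap-profiler.h>
#include <stdlib.h>

int main(void) {
    HeapProfilerStart("/tmp/mytest");  // dumps land in /tmp/mytest.*.heap
    void *p = malloc(1 << 20);         // allocations from here on are tracked
    HeapProfilerDump("after-alloc");   // force an intermediate dump
    free(p);
    HeapProfilerStop();
    return 0;
}

The leak checker doesn’t even need code changes – running the binary with HEAPCHECK=normal in the environment is enough, and pprof turns the dumps into reports.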

I wrote some MySQL UDFs to manage the profiling and heap checking, so I can extract per-thread, single-test data. I will try to clean them up and release them. It would be a shame not to.
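
They’re not cleaned up yet, so here is only a hypothetical sketch of the shape (the function name and argument handling are invented for illustration): an integer-returning UDF that starts the tcmalloc heap profiler inside a running mysqld, compiled into a shared object and loaded with CREATE FUNCTION … RETURNS INTEGER SONAME:

#include <gperftools/heap-profiler.h>
#include <mysql.h>
#include <string.h>

extern "C" my_bool heap_profiler_start_init(UDF_INIT *initid, UDF_ARGS *args,
                                            char *message) {
    if (args->arg_count != 1 || args->arg_type[0] != STRING_RESULT) {
        strcpy(message, "usage: heap_profiler_start('/path/prefix')");
        return 1;
    }
    return 0;
}

extern "C" long long heap_profiler_start(UDF_INIT *initid, UDF_ARGS *args,
                                         char *is_null, char *error) {
    char prefix[256];
    unsigned long len = args->lengths[0] < sizeof(prefix) - 1
                            ? args->lengths[0] : sizeof(prefix) - 1;
    memcpy(prefix, args->args[0], len);  // UDF strings need not be 0-terminated
    prefix[len] = '\0';
    HeapProfilerStart(prefix);           // SELECT heap_profiler_start('/tmp/x');
    return 1;
}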

First days at Sun, Honeycomb opensourced

So now that MySQL is part of Sun, I can be a real Sun technologies fanboy :)
So besides all the ‘it is done’ news, I noticed on Sun’s site that they’ve (oh, we’ve) open-sourced Honeycomb. That’s the technology I was drooling over quite a few years ago, and always wanted to see open. Now it is.
Honeycomb (aka StorageTek 5800) is a distributed content-addressable storage (CAS) system that lets you put in objects along with their metadata, has it all indexed, and presents virtual directories based on that metadata. It is accessible via WebDAV, an API, a CLI, etc… Previous solutions used to be expensive; now, put this on top of Thumpers, and it becomes an easy-to-manage storage archive. All I need now is to find out whether it really is suitable for our needs, and start playing with it more.
Oh, and a MySQL storage engine with access to it would be quite cool too – it would mean Brian’s archiving engine is history :)

Update: damn, apparently only the emulators and clients have been released so far, and according to Sun folks the HADB bits might be complicated to release – though the rest of the stack might appear public and open soonish.