Blowing up in memory

MySQL isn’t too concerned about table handler memory usage – it will allocate row size buffer thrice per each table invocation. There’s a few year old bug discussing UNION memory usage – for each mention in an union one can allocate nearly 200k of unaccounted memory – so a megabyte sized query can consume 7GB of RAM already.

Partitioning though adds even more pain here – it will allocate those three buffers per each partition, so opening a table with 1000 partitions looks like this on memory profile:

Click to enlarge, and you will see 191MB sent to execute a simple single-row fetching query from a table (I filed a bug on this).

There’re multiple real life situations when this is painful (e.g. any kind of server stall may lead to multiple concurrent threads reading from same table, consuming additional gigabytes or tens of gigabytes of memory). It gets even more painful when combined with UNION bug – a megabyte query on an empty table can now consume 7TB of memory and I doubt anyone has that much on their MySQL servers :-)

P.S. Also, check out how much memory can be wasted for malloc overhead, once discussed here.
P.P.S. And here you can see why innodb_max_dirty_pages_pct=0 doesn’t do what you’d expect.

Memcached for small objects

Memcached quite often ends up as a store for very small objects (small key and some integer value), though it isn’t really designed to do this kind of work by default. Current memory management is based on slabs (200 of them), where objects are grouped by similar size – though actual sizes are pre-defined at startup based on few configuration parameters.

By default memcached would have slabs based on assumption, that smallest object size will have 48 bytes of data (thats without item header), and will increase the slab sizes in +25% steps:

slab class   1: chunk size    104 perslab 10082
slab class   2: chunk size    136 perslab  7710
slab class   3: chunk size    176 perslab  5957
slab class   4: chunk size    224 perslab  4681
...

So, in this case, it allocates at least 104 bytes per object, and next steps are way behind. Fortunately, there’re some quick steps to have better efficiency: Continue reading “Memcached for small objects”

Wasting InnoDB memory

I usually get strange looks when I complain about memory handling inside InnoDB. It seems as if terabytes of RAM are so common and cheap, that nobody should really care about memory efficiency. Unfortunately for me, I do.

Examples:

  • The infamous Bug#15815 – buffer pool mutex contention. The patch for the bug added lots of small mutexes, and by ‘lots’ I mean really really lots – two mutexes (and rwlock structure) for each buffer pool page. That makes two million mutexes for 16GB buffer pool, um, four million mutexes for 32GB buffer pool, and I guess more for larger buffer pools. Result – 16GB buffer pool gets 625MB locking tax to solve a 8-core locking problem. Solution? Between giant lock and armies of page mutexes there lives a land of mutex pools, where locks are shared happily by multiple entities. I even made a patch, unfortunately it gets some ibuf assertion after server restart though at first everything works great :)
  • InnoDB data dictionary always grows, never shrinks. It is not considered a bug, as it isn’t memory leak – all memory is accounted by (hidden) dict_sys->size, and valgrind doesn’t print errors. 1-column table takes 2k of memory in InnoDB data dictionary, a table with few more columns and indexes takes already 10k. 100000 tables, and 1GB of memory is wasted. Who needs 100000 tables? People running application farms do. Actually, there even is a code for cleaning up data dictionary, just wasn’t finished, and is commented out at the moment. Even worse, the fix for #20877 was a joke – reducing the in-memory structure size, still not caring about structure count. And of course, do note that every InnoDB partition of a table takes space there too…

So generally if you’re running bigger InnoDB deployment, you may be hitting various hidden memory taxes – in hundreds of megabytes, or gigabytes – that don’t provide too much value anyway. Well, memory is cheap, our next database boxes will be 32GB-class instead of those ‘amnesia’ 16GB types, and I can probably stop ranting :)

LAMPS on steroids

I’m not sure if I’m the first coining in ‘LAMPS’ – scaled out LAMP environment with Squid in front, but it sounds cool. Squid is major component in content distribution systems, reducing the load from all the backend systems dramatically (especially with proper caching rules). We had various issues in past, where we used code nobody else seemed to be using – cache coordination, purges and of course, load.

Quite a few problems resulted in memory leaks, but one was particularly nasty: Squid processes under high load started leaking CPU cycles somewhere. After deploying profiling for squid we actually ended up seeing that the problem is inside libc. Once we started profiling libc, one of initial assumptions appeared to be true – our heap was awfully fragmented, slowing down malloc().

Here comes our steroids part: Google has developed a drop-in malloc replacement, tcmalloc, that is really efficient. Space efficient, cpu efficient, lock efficient. This is probably mostly used (and sophisticated) libc function, that was suffering performance issues not that many people wanted to actually tackle. The description sounded really nice, so we ended up using it for our suffering Squids.

The results were what we expected – awesome :) Now the nice part is that the library is optimized for multi-threaded applications, doing lots of allocations for small objects without too much of lock contention, and uses spinlocks for large allocations. MySQL exactly fits the definition, so just by using simple drop-in replacement you may achieve increased performance over standard libc implementations.

For any developers working on high-performance applications, Google performance tools provide easy ways to access information that was PITA to work on before. Another interesting toy they have is embedded http server providing run-time profiling info. I’m already wondering if we’d should combine that with our profiling framework. Yummy. Steroids.