Solaris has this beautiful tool ‘plockstat’ that can report application lock contention and hold events (hold events turn into contention events at parallelism.. ;-) It is just a frontend to a set of dtrace rules, that monitor mutexes and rwlocks.
Today I was testing an edge case (what happens, when multiple threads are scanning lots of same data) – and plockstat/dtrace indicated that there were zero (0!!!) lock waits. I tried using ‘prstat’ with microstate accounting, and it indeed pointed out that there’s lots of LCK% activity going on (half of CPU usage…). The dtrace profiling oneliner (dtrace -n 'profile-997 {@a[ustack()]=count()}'
) immediately revealed the culprit:
libc.so.1`clear_lockbyte+0x10 libc.so.1`mutex_unlock+0x16a mysqld`mutex_exit+0x1d mysqld`buf_page_optimistic_get_func+0xa0
So, plenty of CPU time was spent when trying to unlock mutex (what seemed strange), but didn’t seem that strange once I noticed the code:
do { old = *lockword64; new = old & ~LOCKMASK64; } while (atomic_cas_64(lockword64, old, new) != old);
So, there’s unaccounted busy loop (it is just part of hold event in dtrace). What is odd, is that nobody expects this place to loop too much, what happens here – it gives away mutex to other thread, which wants it. So, instead of having the new owner spin-lock (where it accounts properly), it has old owner spin-locking. I’m not convinced this kind of behavior is one that should scale on large machines, but I’m not much of a locking expert.
Without proper instrumentation plockstat failed to provide information about locks that were consuming half of CPU time. I hope that really was just an edge case – more real testing will follow soon, will see if plockstat will fail as much. Oh well, will find the information I need anyway :) Lesson learned – treat pretty much everything with grain of salt, especially when OS tells you mysql has no lock contention, haha.
Please file a bug on this at bugs.opensolaris.org, thanks.
Ugh… sorry about that. As the author of plockstat, I’m embarrassed by that oversight. I’ll try to get that resolved.
Adam, 1:1 for the embarrassment I had in CEC :)