Lately we have been especially enjoying the opportunities that Poor Man’s Profiler provides us – but also the technology has improved a lot too – there have been few really useful mutations.
One mutation (hyper-pmp) was Ryan Mack’s approach of having somewhat more efficient sampling – instead of firing gdb each time, he instructed gdb to get backtraces every time monitored process gets a signal (SIGUSR2 for example). This allows to maintain a persistent debugger attachment – and then signal periodically to get stacks analyzed.
Other mutation was auto-pmp – high frequency polling of process state (e.g. how many threads are running), and when a certain threshold is exceeded – obtaining stacks for further analysis (this combines really well with the hpmp approach – one process is the stacks reader, and other is signaling on thresholds). My major problem in such approach was that the polling methods we chose would be biased to show me end of overload events (because it wouldn’t return process state due to internal process locking).
At one point in time I had an epiphany, that was quickly melted by the reality – in theory we could use gdb watchpoints to replace my external process polling. Watchpoints allow to break a process when a change to a variable inside a program happens (and conditions can be applied), so essentially we would be able to instrument gdb to get stacks exactly at the moment when there’re stalls and spikes. Unfortunately, even though that worked fine in single-threaded or lightly loaded environments, monitored process crashed horribly in more realistic workloads – we have yet to figure out if it is a fundamental issue of the approach or actually a bug that may have been fixed in later versions.
Of course, there’s a workaround, that we’re considering for high performance system analysis – simply instrumenting a process to fire a signal or do a conditional jump whenever there’s an overload condition – so essentially that would be implementing in-process watchpoint-to-breakpoint translation giving us just-in-time analytics – so we’d see pretty much every situation where running threads pile up (unless there’s a bottleneck that simply doesn’t allow the workload to arrive :)
PMP on-demand allowed us to uncover various issues inside MySQL that have been overlooked in most of benchmarking as non-significant, but they are critical for us in providing better quality of service for each query, not just 99th percentile (I wrote about that recently). We keep thinking how to provide instrumentation for some of views we get inside MySQL (e.g. an ability to export pthread lock graph without using external tools), as well as better visibility of I/O blocking…
But for now we have what we have, poor man’s profiler :-)