On binaries and -fomit-frame-pointer

Over last few years 64-bit x86 platform has became ubiquitous, thus making stupid memory limitations a thing of some forgotten past. 64-bit processors made internal memory structures bigger, but compensated that with twice the amount and twice larger registers.

But there’s one thing that definitely got worse – gcc, the compiler, has a change in default compilation options – it omits frame pointers from binaries in x86_64 architecture. People have advocated against that back in 1997 because of very simple reasons, that are still very much existing today too – frame pointers are needed for efficient stack trace calculations, and stack traces are very very useful, sometimes.

So, this change means that oprofile is not able to give hierarchical profiles, it also means that MySQL crash information will not include most useful bit – the failing stack trace. This decision has been just because of performance reasons – frame pointer takes whole register (though on x86_32 that meant 1 out of 8, on x86_64 it is 1 out of 16), which could be used to optimize the application.

I tested two MySQL builds, one built with ‘-O3 -g -fno-omit-frame-pointer’ and other with -fomit-frame-pointer instead – and performance difference was negligible. It was around 1% in sysbench tests, and slightly over 3% at tight-loop select benchmark(100000000,(select asin(5+5)+sin(5+5))); on a 2-cpu Opteron box.

The summary suggestion for this flag would be very simple. If you don’t care about fixing or making your product faster, you can probably use 1% speed-up. But if getting actual real-time performance data can lead to much better performance fixes, and fast introspection means qualified engineers can diagnose problems much faster, 1% or even 3% is not that much. So, add ‘-fno-omit-frame-pointer’ to CFLAGS and CXXFLAGS, and enjoy things getting back to normal :)

By the way, while I was at it, I did some empiric tests of other options, but one that irks me most is not using ‘-g’ on production binaries. See, debugging information, symbol tables, etc – they all cause around 0% performance difference. The only difference is that e.g. mysqld binary will weight 30M, instead of 6M (though that fat will not be loaded into RAM, and will only cost diskspace).

Why does debugging information matter? It doesn’t, if you don’t attempt to be power-user. It does, if you enjoy crazy debugger tricks (like one here or here). Oh, and of course, bonus GDB trick, how to run KILL without connecting to server:

(gdb) thread apply all bt
...
(gdb) thread 2
[Switching to thread 2 (Thread 0x44a76950 (LWP 23955))]#0  ...
(gdb) bt
#0  0x00007f7821e68e1d in pthread_cond_timedwait...
#1  0x000000000052dc0e in Item_func_sleep::val_int (this=0x12317f0)...
#2  0x0000000000501484 in Item::send (this=0x12317f0, ...
...
#15 0x00000000005caf29 in do_command (thd=0x11da290) ...
...
(gdb) frame 15
#15 0x00000000005caf29 in do_command (thd=0x11da290) at sql_parse.cc:854
854	  return_value= dispatch_command(command, thd, packet+1, ...
(gdb) set thd->killed = THD::KILL_QUERY
(gdb) continue

And the client gets ‘ERROR 1317 (70100): Query execution was interrupted’ :-)