I’ve experimented a bit with wikipedia’s existing profiling framework lately, and extracted some of bits from that. The way we do it now is having our internal profiler written in PHP, and wfProfileIn() / wfProfileOut() function calls around the code. As there’re quite a lot of profiling entry places, overhead isn’t tolerable at high loads, so it’s turned on only for every 50th page view.
One of interesting entry points is in Database::query(), we have a tiny set of regexps in Database::generalizeSQL, that convert literal arguments into ‘X’ for string and ‘N’ for integer. This way we end up with all similar queries grouped together in profiling report, showing us which queries need some love or bring down our cluster.
Profiler may have different personalities, as result can be written to database table, printed out to a user, or sent over a network packed into UDP packets. I made a special daemon for handling those messages, which is just in-memory hashtable, that is updated on every incoming UDP message and any incoming TCP connection gets full XML export. Later this can be served in human output by reporting script. It is very easy to scale such profiling system, as state preservation is not needed, and XMLs can be aggregated. I guess it was possible to do that with HEAP tables in MySQL, but writing tiny daemons is sooo fun :-)
One of plans now is to rewrite profiler class into C (that would provide ability to run large-scale all-request profiling again), and merge that with our setproctitle extension, which currently allows to see what part of PHP code is being executed:
httpd: wfMsgGetKey [enwiki] httpd: main-cleanup [dewiki] httpd: requestfinish httpd: query: SELECT page_id FROM `page` WHERE page_namespace = 'X' LIMIT N [enwiki] httpd: query: SELECT blob_text FROM `blobs` WHERE blob_id = 'X' LIMIT N [enwiki]
Anyway, do not think of in-code profiling as of a replacement for in-core profilers such as APD or xdebug, as these can tell much more accurately where you’re fast and where you’re slow (sometimes these results may surprise!). But what you can win with code-controlled profiler – a general view for deployed application in distributed environments, that relies not on call-tree, but rather your own defined profiling blocks.
And now we know that :
- 20-25% of our MediaWiki execution real time is spent waiting for MySQL (3.5ms per query)
- 16% for memcached (or actually, Tugela, my memcached+BerkeleyDB hack :)
- 20% of requests to backend are searches, and we spend 70ms average waiting for Lucene and Mono based mwdaemon to respond
- Saving of an article takes around 0.8s, page view is around 0.1s
- Average times (and deviations, if someone hacks reporting :) for every query…
In most cases developers without tools do miss real performance hotspots. This is what tools are for! :-)