MySQL metrics for read workloads

There are multiple metrics that are really useful for read workload analysis, that should all be tracked and looked at in performance-critical environments.

The most commonly used is of course Questions (or ‘Queries’, ‘COM_Select’) – this is probably primary finger-pointing metric that can be used in communication with different departments (“why did your qps go up by 30%?”) – it doesn’t always reveal actual cost, it can be increase of actual request rates, it can be new feature, it can be fat fingers error somewhere in the code or improperly handled cache failure.

Another important to note is Connections – MySQL’s costly bottleneck. Though most of users won’t be approaching ~10k/s area – at that point connection pooling starts actually making sense – it is worth to check for other reasons, such as “maybe we connect when we shouldn’t”, or needlessly reconnect, or actually should start looking more at thread cache performance or pooling options. There’re some neighboring metrics like ‘Bytes_sent’ – make sure you don’t hit 120MB/s on a gigabit network :-)

Other metrics usually are way more about what actually gets done. Major query efficiency signal for me for a long time used to be Innodb_rows_read. It is immediately pointing out when there are queries which don’t use indexes properly or are reading too much data. Gets a bit confusing if logical backup is running, but backup windows aside, this metric is probably one that is easy enough to track and understand. It has been extremely helpful to detect query plans gone wrong too – quite a few interesting edge cases could be resolved with FORCE INDEX (thats a topic for another post already :-)

For I/O heavy environments there’re few metrics that show mostly the same – Innodb_buffer_pool_reads, Innodb_data_reads, Innodb_pages_read – they all show how much your requests hit underlying storage – and higher increases ask for better data locality, more in-memory efficiency (smaller object sizes!) or simply more RAM/IO capacity.

For a long time lots of my metrics-oriented performance optimization could be summed up in this very simple ruleset:

  • Number of rows shown to user in the UI has to be as close as possible to rows read from the index/table
  • Number of physical I/Os done to serve rows has to be as close to 0 as possible :-)

Something I like to look at is the I/O queue size (both via iostat and from InnoDB’s point of view) – Innodb_data_pending_reads can tell how loaded your underlying storage is – on rotating media you can allow multiples of your disk count, on flash it can already mean something is odd. Do note, innodb_thread_concurrency can be a limiting factor here.

Overloads can be also detected from Threads_running – which is easy enough to track and extremely important quality of service data.

An interesting metric, that lately became more and more important for me is Innodb_buffer_pool_read_requests. Though it is often to use buffer pool efficiency in the ratio with ‘buffer pool reads’, it is actually much more interesting if compared against ‘Innodb_rows_read’. While Innodb_rows_read and Handler* metrics essentially show what has been delivered by InnoDB to upper SQL layer, there are certain expensive operations that are not accounted for, like index estimations.

Though tracking this activity helps I/O quite a bit (right FORCE INDEX reduces the amount of data that has to be cached in memory), there can be also various edge cases that will heavily hit CPU itself. A rough example could be:

SELECT * FROM table WHERE parent_id=X and type IN (1,2,4,6,8,…,20) LIMIT 10;

If there was an index on (parent_id,type) this query would look efficient, but would actually do range estimations for each type in the query, even if they would not be fetched anymore. It gets worse if there’s separate (type) index – each time query would be executed, records-in-rage estimation would be done for each type in IN() list – and usually discarded, as going after id/type lookup is much more efficient.

By looking at Innodb_buffer_pool_read_requests we could identify optimizer inefficiency cases like this – and FORCE INDEX made certain queries 30x faster, even if we forced exactly same indexes. Unfortunately, there is no per-session or per-query metric that would do same – it could be extremely useful in sample based profiling analysis.

Innodb_buffer_pool_read_requests:Innodb_rows_read ratio can vary due to multiple reasons – adaptive hash efficiency, deeper B-Trees because of wide keys (each tree node access will count in), etc – so there’s no constant baseline everyone should adjust to.

I deliberately left out query cache (here’s the reason), or adaptive hash (I don’t fully understand performance implications there :). In mysql@facebook builds we have some additional extremely useful instrumentation – wall clock seconds per various server operation types – execution, I/O, parsing, optimization, etc.

Of course, some people may point out that I’m writing here from a stone age, and that nowadays performance schema should be used. Maybe there will be more accurate ways to dissect workload costs, but nowadays one can spend few minutes looking at metrics mentioned above and have a decent understanding what the system is or should be doing.

About these ads
This entry was posted in facebook, mysql and tagged , , , , . Bookmark the permalink.