A case for FORCE INDEX

I remember various discussions in different mediums where people were building cases against use of FORCE INDEX in SQL queries. I’ll hereby suggest it using way more often, but at first I’ll start with small explanation.

For ages, the concept of index statistics affecting query plans has been clogging minds of DBAs, supported by long explanations of MyISAM and InnoDB manuals. Actually, statistics are used just for determining which index to use for a joined table, as predicate is not known at the time of ‘optimization’.

What happens if you do a simple query like:

SELECT * FROM table WHERE a=5 AND b=6

? If there’s an index that enforces uniqueness on (a,b), it will be used – this is short-path for PRIMARY KEY lookups. Otherwise, it will go to any index, composite or not, that can satisfy either a or b (or both), and evaluate how many rows it will fetch from it using the provided criteria.

Now, contrary to what people usually think, the row count evaluation has nothing really much to do with cardinality statistics – instead it builds the range that the known predicate can check on existing index, and does two full B-Tree dives to the index – one at the start of the range, and one at the end of it. For each possible index.
This simply means that even if you are not using the index to execute query, two leaf pages (and all the tree branches to reach them) will end up being fetched from disk into the cache – wasting both I/O cycles and memory.

There’s also quite interesting paradox at this – in some cases, more similar other indexes are, more waste they create because of rows-in-range checks. If a table has indexes on (a,b,c) and (a,b,d), query for (a,b,d) will be best satisfied by (a,b,d) index, but will evaluate range sizes for (a,b). If the first index were (a,c,b), it would be only able to check head and tail of (a) – so way less B-Tree positions would be cached in memory for the check. This makes better indexing sometimes fare worse than what they’re worth in benchmarks (assuming that people do I/O-heavy benchmarking :)

The easy way out is using FORCE INDEX. It will not do the index evaluation – and no B-Tree dives on unneeded index.

In my edge case testing with real data and skewed access pattern hitting a second index during ‘statistics’ phase has increased execution time by 70%, number of I/Os done by 75%, number of entrances into buffer pool by 31% and bloated buffer pool with data I didn’t need for read workload.

For some queries like “newest 10 entries” this will actually waste some space preheating blocks from the other end of the range that will never be shown – there will definitely be a B-Tree leaf page in buffer pool with edits from few years ago because of RIR. Unfortunately, the only MySQL-side solution for this is HANDLER interface (or probably HandlerSocket) – but it doesn’t make using FORCE INDEX not worth it – it just pushes towards making FORCE INDEX be much more forceful.

So, use the FORCE, Luke :)