MySQL does not need SQL

The “DBMS” part of MySQL is fine – storage engines (especially with new kids on the block), replication, etc, but it really sucks at executing SQL.

I won’t even get into complex SQL that has complex data dependencies, will start with very basic tasks.

SELECT * FROM table WHERE indexed = "A" LIMIT 1

If multiple indexes can satisfy the query, MySQL will hit each of them at least twice – looking up first “A” and last “A” records. It will do that while planning the SQL execution. Once it comes up with a plan, it will go and hit the index it picked again, this time once. So, if you have two candidate indexes, you will have 5 index accesses at 4 positions.

How would a direct cursor access work? Single index hit. Want to simulate that in SQL? You can add a ‘FORCE INDEX’ hint, then only one index will be considered, but still at two positions and you will have three index accesses. I wrote about this before.

This query:

SELECT * FROM table WHERE indexed IN (101, 201, 301) LIMIT 1

Would do six index dives during preparation as well (two for each IN position), plus another one during execution – seven dives when one would suffice.

What we learn from this is that whenever there’s an opportunity for lazy evaluation, MySQL will gladly miss it and do most work possible to run your query.

If you could express it yourself, you’d be lazy and you’d be efficient. It gets even worse if we start expecting something more advanced than a basic cursor positioning.

Whenever you’re doing a range scan on (a, b) indexed table:

SELECT * FROM table WHERE a BETWEEN 1 and 100 AND b=1234567 LIMIT 1

There’s an easy optimization given low-cardinality ‘a’ – you jump to each ‘a’ position and then you can do a dive for the ‘b’ value. In a way you can emulate this behavior with:

SELECT * FROM table
JOIN ( SELECT DISTINCT a FROM table ) x
USING (a) WHERE b=1234567 LIMIT 1

As I mentioned, MySQL will skip any opportunity to be lazy and in this case it will fully materialize all distinct ‘a’ values. If it were able to lazy evaluate, there’s a chance we can terminate the scan early.

We’re adding ability to skip-scan records in our builds (you can follow the work at https://reviews.facebook.net/D59877).

It is quite easy to describe whatever is needed in basic loop though – storage engines already provide with necessary hooks for these types of access methods.

Another basic single-table SELECT that fails with SQL is a standard “feed” query:

SELECT * FROM table WHERE category IN (5,6)
ORDER BY time DESC LIMIT 5;

MySQL will have to pick between two alternatives, one would be just scanning ‘time’ index and searching for specified categories among found rows, another is read all entries for each category, sort them all and return top 5.

Efficient way to execute such query would involve having per-category cursors and merging their reads, something like:

( SELECT * FROM table WHERE category = 5 ORDER BY time DESC LIMIT 5 )
UNION ALL
( SELECT * FROM table WHERE category = 6 ORDER BY time DESC LIMIT 5 )
ORDER BY time DESC LIMIT 5

MySQL is unable to merge these two cursors without writing all the data into temporary file and sorting it – although we already are reading much smaller datasets than with alternative naive (and readable) query. Besides, each subselect will open a table handler with all the associated buffers and if you want to merge hundred cursors you’re looking at hundred open tables.

You can do this operation efficiently in bash (with ‘sort -m’), it isn’t that complicated algorithm in any scripting language, but having such access method in MySQL’s SQL doesn’t seem likely.

Even where MySQL is already doing efficient loose scan (like in indexed-based GROUP BY), there were basic bugs open for years where efficiency would go down the drain simply because a LIMIT is added.

There’re quite a few other limitations in access methods that are quite annoying to work around. For example a query like this one:

SELECT * FROM table ORDER BY priority DESC, time ASC

Would be unable to use a regular index on (priority, time) – it can only walk everything in one direction and cannot have mixed order scans that are trivial to implement in basic procedural algorithm (move cursor to lowest time of lower priority, read all records for that priority in ascending order before repositioning the cursor to even lower priority).

Of course, one can change the direction of an index, or even have multiple indexes on same columns just to get an ordered read efficient. But none of that is needed if there’s a proper access method that can be used by SQL execution.

Directional index hints may be needed just to specify common order (e.g. prefix compression in RocksDB makes one direction cheaper than the other, although both still much cheaper than full blown filesort).

Even basic single order traversal breaks if you’re joining two tables:

SELECT * FROM t1 JOIN t2 USING (b) ORDER BY t1.a, t1.b, t2.c

In some cases I (*shudder*) had to use ‘FORCE INDEX’ without an ORDER BY to pick a direction I needed. One could expect basic functionality like ordered JOIN traversal to be part of SQL package.

There’re various other issues related to JOINs – again, in the best tradition of MySQL, lazy evaluation of joins is not possible – it will gladly do more work than you’d ever need, reading from tables you did not ask for.

These things slowly slowly change with each version, and to get basic things right we have to yell a lot. Hurray, 5.7 may finally have fixes for issues we were working around back when 5.x series did not exist) – but that is just way too slow for an SQL layer to evolve.

MySQL employs many bright engineers many of whom are trying to make better cost decisions on same existing limited set of access methods they use, by spending more cycles and adding more and more overhead.

In order to execute various queries one has to either resort to network roundtrips after each row or over-fetch data all the time. You may suggest stored procedures, but their data access is limited to running SQL queries and returning them as multiple result sets (or buffer everything in temporary table). Amount of deferred work and lazy evaluation you can do in a stored procedure is quite limited and overhead of running many SQL statements within a procedure is high.

Ability to access data via lower level interfaces by high performance server-side language (Lua? JS?) would allow to circumvent many many many limitations and inefficiencies of existing SQL optimizer and execution layer while still allowing occasional SQL access, high performance storage engines, replication and all the tooling that has been built to support MySQL at scale.

What do we do today? We run lots of that conditional logic in our application and absorb lots of the cost with a cache layer in the middle. There’s a chance that we would be able to spend less CPU, less I/O, less memory and less network resources if we could ask smarter queries expressed as procedures instead of wrangling all the relational algebra on top of dumb executor.

In some cases we have different database systems.

P.S. Many of the hypothetical scenarios map to workloads where amounts of data and query volume warrants all the optimizations I discussed here.

P.P.S. Other databases may have other set of issues.

A case for FORCE INDEX

I remember various discussions in different mediums where people were building cases against use of FORCE INDEX in SQL queries. I’ll hereby suggest it using way more often, but at first I’ll start with small explanation.

For ages, the concept of index statistics affecting query plans has been clogging minds of DBAs, supported by long explanations of MyISAM and InnoDB manuals. Actually, statistics are used just for determining which index to use for a joined table, as predicate is not known at the time of ‘optimization’.

What happens if you do a simple query like:

SELECT * FROM table WHERE a=5 AND b=6

? If there’s an index that enforces uniqueness on (a,b), it will be used – this is short-path for PRIMARY KEY lookups. Otherwise, it will go to any index, composite or not, that can satisfy either a or b (or both), and evaluate how many rows it will fetch from it using the provided criteria.

Now, contrary to what people usually think, the row count evaluation has nothing really much to do with cardinality statistics – instead it builds the range that the known predicate can check on existing index, and does two full B-Tree dives to the index – one at the start of the range, and one at the end of it. For each possible index.
This simply means that even if you are not using the index to execute query, two leaf pages (and all the tree branches to reach them) will end up being fetched from disk into the cache – wasting both I/O cycles and memory.

There’s also quite interesting paradox at this – in some cases, more similar other indexes are, more waste they create because of rows-in-range checks. If a table has indexes on (a,b,c) and (a,b,d), query for (a,b,d) will be best satisfied by (a,b,d) index, but will evaluate range sizes for (a,b). If the first index were (a,c,b), it would be only able to check head and tail of (a) – so way less B-Tree positions would be cached in memory for the check. This makes better indexing sometimes fare worse than what they’re worth in benchmarks (assuming that people do I/O-heavy benchmarking :)

The easy way out is using FORCE INDEX. It will not do the index evaluation – and no B-Tree dives on unneeded index.

In my edge case testing with real data and skewed access pattern hitting a second index during ‘statistics’ phase has increased execution time by 70%, number of I/Os done by 75%, number of entrances into buffer pool by 31% and bloated buffer pool with data I didn’t need for read workload.

For some queries like “newest 10 entries” this will actually waste some space preheating blocks from the other end of the range that will never be shown – there will definitely be a B-Tree leaf page in buffer pool with edits from few years ago because of RIR. Unfortunately, the only MySQL-side solution for this is HANDLER interface (or probably HandlerSocket) – but it doesn’t make using FORCE INDEX not worth it – it just pushes towards making FORCE INDEX be much more forceful.

So, use the FORCE, Luke :)