From time to time I will observe servers wasting lots of CPU when doing batch row operations. In
perf top it will look like this:
8.24% mysqld [.] Arg_comparator::compare_int_unsigned() 7.17% mysqld [.] Item_cond_and::val_int() 4.37% mysqld [.] Item_field::val_int() 4.37% mysqld [.] Item_cond_or::val_int() 2.90% mysqld [.] MYSQLparse(void*) 2.64% mysqld [.] Item::val_bool()
Essentially if you construct queries like
(a=1 AND b=2) OR (a=3 AND b=4) ..., at large enough batch size evaluating the WHERE will become far more expensive than anything else (yes, more expensive than decompressing rows or doing all the InnoDB magic and what not).
MySQL has awesome syntax that makes certain batch lookups much faster: WHERE a IN (1,2,3). It constructs a tree that then each row can be compared against and one does not have to iterate through lists of predicates to check whether the row returned by batch index lookups needs to be filtered away. One would think that the composite syntax that server has
(WHERE (a,b) IN ((1,2),(3,4))) may help here.
Unfortunately, if you run a query using this syntax, a full table scan will be done, even if there’s a great index for that – even the syntax exists, none of sane ways to execute apply. Of course, there have to be bugs about that:
Compound (cola,colb) IN ((x,y)) statements do not use indexes. (filed by yours truly).
Well, it was closed by optimization lead (now at Maria), who decided, that this bug is a duplicate of another problem, (a,b,c)=(A,B,C). I probably should’ve contested there, but there might have been other people who’d do that? Sure there were, in 2007 Mark filed another bug:
If one would look more closely, there’s this small exchange in 2010:
[7 Jan 2010 21:35] Mark Callaghan This is the same as http://bugs.mysql.com/bug.php?id=16247, but the author of this feature request is much more convincing. [7 Jan 2010 21:36] Domas Mituzas hehe.
He wrote a Facebook note back then too.
Apparently Mark’s bug became the master bug for this problem, even if it arrived bit later, but, I can’t argue, he is far more convincing :-)
There’s a bit of odd banter there, Harrison (ohi!) points out that it is documented (albeit incorrectly, Sergey Petrunia – ohi – notices) in the manual. No, it isn’t this exact issue that is noted in the manual.
Duplicates are pouring in:
Multi column IN does not use index, says Peter Zaitsev (ohi!) in 2008
MySQL not using an index when using tuples in IN clause, says Shlomo (ohi!) in 2010
There are few more.
Sinisa (ohi!) pronounces:
In my humble opinion, this is still not a bug, but feature request. This feature request, however, should be very high on TODO list, as it would make a feature that is sorely missing, particularly since tuple equality expressions have been already optimized.
Fast forward into 2013, even in 5.6 we still have to use expressions with exponential cost,
(a,b) IN (( is causing full table scans and people are switching to NoSQLs, NewSQLs and OtherSQLs, because there’s no way to efficiently express batch predicates in MySQL (besides other reasons). Back in the day MySQL was ahead of everyone else with ease of use for simple expressions, but there have been nearly no changes since back in the day to make developer lives easier. Did you know that it may actually be faster to create a temporary table with all the values, then join against it, than actually to use any of existing SQL methods?
Where is MySQL innovation nowadays? In this exact case I guess you’re supposed to be memcached API (forget authorization, common interfaces, transactions and what not), solving this in SQL is too complicated.