MySQL 5.0 among lots of visible features, introduced several neat optimizer improvements, that may give surprising performance in some queries. Loose index scan, or rather index jumping, allows fast aggregate max() or min() operations, as well as distinct row queries.
Let’s have such table:
+------+------+ | a | b | +------+------+ | 1 | 5 | | 1 | 3 | | 2 | 4 | | 2 | 2 | | 1 | 1 | +------+------+
SELECT MIN(b),MAX(b) FROM ournicelittletable GROUP BY a might be executed in different ways. Usually aggregate functions need to build a temporary table in order to sort results, or use index for sorting.
In case of temporary table, aggregation is done by reading all rows from dataset and updating in-memory grouped results. With our tiny sample dataset there would be table created with primary key ‘a’, as well as min() and max() columns. Then a table scan would happen, which would update aggregated values on every in-memory row with bigger for max() or smaller for min() b value found. After full scan is done, this table could be returned directly to user.
If there is an index on fields (a,b), then tight index scan may happen – it would read values in such order: (1,1),(1,3),(1,5),(2,2),(2,4). In this kind of execution engine does not need to build any kind of temporary tables, once it reads all rows with same ‘a’ value, it can return a minimums or maximums to the user, then continue working on other ‘a’ values.
If MySQL 5.0 would find an index on (a,b), it would simply go to each a values, read one row for min(), one row for max(), as that information is already in index tree, and immediately return that to user.
I built a bigger table, with >200k rows, with unique field a, and added 233 groups, specified in b, with 1000 values in c each. One group didn’t have 1000 values, so I tried to find which one:
mysql> select b,max(c),min(c) from tttable group by b having max(c)<1000; +------+--------+--------+ | b | max(c) | min(c) | +------+--------+--------+ | 233 | 83 | 1 | +------+--------+--------+ 1 row in set (0.01 sec) (no query cache :)
Then I tried same on different engine, which didn’t support loose scans, and query took 0.2s. I repeated executions on both engines, so that data would be cached by OS, but speed difference was quite noticable. Of course, we could check what was done in there. The 0.01s query was executed this way:
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: tttable type: range possible_keys: NULL key: b_2 key_len: 5 ref: NULL rows: 471 Extra: Using index for group-by
Please note the ‘Extra’ line, and how many rows were read. As intended, two rows from each group, one for min() and one for max().
Now the 0.2s query did tell me that it was reading 230k rows. It did actually decide that in-memory table would be faster than reading index and did scan whole table. Another operation I tried was
SELECT DISTINCT b FROM tttable. MySQL 5.0 executed the query in 0.01s, though the other engine was again more than 15 times slower, just because it had to deal with more rows.
Conclusion: loose index scans help. Take a look at the manual.
3 thoughts on “MySQL 5.0 optimizer: loose scans”
I guess this is almost the same as Oracle IOT (Index Organized Tables) which are avialable
since 8i (and lots of updates in 9i).
See http://otn.oracle.com/products/oracle9i/datasheets/iots/iot_ds.html for more details.
Not really, InnoDB storage engine for MySQL does keep row data together with PK in B+-Tree as well since 3.23 versions. The loose scan isn’t about just storing data, it is about accessing it efficiently.
Just to add to Domas’s point – loose index scan is a way
to access ordered data efficiently, so that the minimal
number of keys is accessed. It is implemented in the
execution engine, and is completely independent of
the storage manager. It can work with any ordered
index that allows prefix lookup.
Comments are closed.