5.0 journal: various issues, replication prefetching, our branch

First of all, I have to apologize about some of my previous remark on 5.0 performance. I passed ‘-g’ CFLAGS to my build, and that replaced default ‘-O2’. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.

Few migration notes – once I loaded the schema with character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, what I expected, but more annoying was CHAR converted to BINARY – which pads data with bytes. Solution was converting everything into VARBINARY – as actually it doesn’t have much overhead. TRIM('' FROM field) eventually helped too.

The other problem I hit was paramy operation issue. One table definition failed, so paramy exited immediately – though it had few more queries remaining in the queue – so most recent data from some table was not inserted. The cheap workaround was adding -f option, which just ignores errors. Had to reload all data though…

I had real fun experimenting with auto-inc locking. As it was major problem for initial paramy tests, I hacked InnoDB not to acquire auto-inc table-level lock (that was just commenting out few lines in ha_innodb.cc). After that change CPU use went to >300% instead of ~100% – so I felt nearly like I’ve done the good thing. Interesting though – profile showed that quite a lot of CPU time was spent in synchronization – mutexes and such – so I hit SMP contention at just 4 cores. Still, the import was faster (or at least the perception), and I already have in mind few cheap tricks to make it faster (like disabling mempool). The easiest way to make it manageable is simply provide a global variable for auto-inc behavior, though elegant solutions would attach to ‘ALTER TABLE … ENABLE KEYS’ or something similar.

Once loaded, catching up on replication was another task worth few experiments. As the data image was already quite a few days old, I had at least few hours to try to speed up replication. Apparently, Jay Janssen’s prefetcher has disappeared from the internets, so the only one left was maatkit’s mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them just on single thread, so the prefetcher was just few seconds ahead of SQL thread – and speedup was less than 50%. I made a quick hack that parallelized the task, and it managed to double replication speed.

Still, there’re few problems with the concept – it preheats just one index, used for lookup, and doesn’t work on secondary indexes. Actually analyzing the query, identifying what and where changes, and sending a select with UNIONs, preheating every index affected by write query could be more efficient. Additionally it would make adaptive hash or insert buffers useless – as all buffer pool pages required would be already in memory – thus leading to less spots of mutex contention.

We also managed to hit few optimizer bugs too, related to casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor solutions then (like filesorting, instead of using existing ref lookup index, etc). I will have to review why this happens, does it make sense, and if not – file a bug. For now, we have some workarounds, and don’t seem to be bitten too much by the behavior.

Anyway, in the end I directed half of this site’s core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing yet is that though 5.0 eats ~30% more CPU, it shows up on profiling as faster-responding box. I guess we’re just doing something wrong.

I’ve published our MySQL branch at launchpad. Do note, release process is somewhat ad-hoc (or non-existing), and engineer doing it is clueless newbie. :)

I had plans to do some more scalability tests today, but apparently the server available is just two-core machine, so there’s nothing much I can do on it. I guess another option is grabbing some 8-core application server and play with it. :)

MySQL 5.0 optimizer: loose scans

MySQL 5.0 among lots of visible features, introduced several neat optimizer improvements, that may give surprising performance in some queries. Loose index scan, or rather index jumping, allows fast aggregate max() or min() operations, as well as distinct row queries.

Let’s have such table:

+------+------+
| a    | b    |
+------+------+
|    1 |    5 |
|    1 |    3 |
|    2 |    4 |
|    2 |    2 |
|    1 |    1 |
+------+------+

A SELECT MIN(b),MAX(b) FROM ournicelittletable GROUP BY a might be executed in different ways. Usually aggregate functions need to build a temporary table in order to sort results, or use index for sorting.

In case of temporary table, aggregation is done by reading all rows from dataset and updating in-memory grouped results. With our tiny sample dataset there would be table created with primary key ‘a’, as well as min() and max() columns. Then a table scan would happen, which would update aggregated values on every in-memory row with bigger for max() or smaller for min() b value found. After full scan is done, this table could be returned directly to user.

If there is an index on fields (a,b), then tight index scan may happen – it would read values in such order: (1,1),(1,3),(1,5),(2,2),(2,4). In this kind of execution engine does not need to build any kind of temporary tables, once it reads all rows with same ‘a’ value, it can return a minimums or maximums to the user, then continue working on other ‘a’ values.

If MySQL 5.0 would find an index on (a,b), it would simply go to each a values, read one row for min(), one row for max(), as that information is already in index tree, and immediately return that to user.

I built a bigger table, with >200k rows, with unique field a, and added 233 groups, specified in b, with 1000 values in c each. One group didn’t have 1000 values, so I tried to find which one:

mysql> select b,max(c),min(c) from tttable group by b  having max(c)<1000;
+------+--------+--------+
| b    | max(c) | min(c) |
+------+--------+--------+
|  233 |     83 |      1 |
+------+--------+--------+
1 row in set (0.01 sec) (no query cache :)

Then I tried same on different engine, which didn’t support loose scans, and query took 0.2s. I repeated executions on both engines, so that data would be cached by OS, but speed difference was quite noticable. Of course, we could check what was done in there. The 0.01s query was executed this way:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tttable
         type: range
possible_keys: NULL
          key: b_2
      key_len: 5
          ref: NULL
         rows: 471
        Extra: Using index for group-by

Please note the ‘Extra’ line, and how many rows were read. As intended, two rows from each group, one for min() and one for max().
Now the 0.2s query did tell me that it was reading 230k rows. It did actually decide that in-memory table would be faster than reading index and did scan whole table. Another operation I tried was SELECT DISTINCT b FROM tttable. MySQL 5.0 executed the query in 0.01s, though the other engine was again more than 15 times slower, just because it had to deal with more rows.

Conclusion: loose index scans help. Take a look at the manual.