MySQL 4.0 Google Edition

At the Conference I realized how much Wikipedia’s database operation had in common with Google – many rules and ideas of operation, problems faced, solutions imagined. Ah, there is one huge difference – they have brilliant engineers resolving quite a few of the issues, whereas we slack and live with as-is software. One of very nice coincidences is that our base MySQL version used is 4.0.26 – surprise, surprise, same one as the one Google have released patches for :) So, here the story of running MySQL fork begins…

Though MySQL 4.0.26 is no longer available from mysql.com downloads, it was still hosted at googlecode, though patches were hidden in SVN repo. Patching and building was piece of cake (just needed to hack ./configure to fix the LinuxThreads detection, as well as install some required packages, like gcc and make, on rather empty DB box).

Running was far more interesting (I’m doing most of this stuff for the first time, and it is of course one of live site DB servers), and it started with ‘mysqld –help’, where eyes immediately picked entries which did not exist there before (though I’d vote for marking newly added options with colored ‘Google’ tag, for those who don’t stare at MySQL configuration files 24/7 :) My initial pick was:

--rpl_always_enter_innodb --rpl_transaction_enabled --temp-pool
--innodb_read_io_threads=8 --innodb_write_io_threads=2

What I asked here was never to queue replication thread, use InnoDB information for replication slave position instead of relay log, avoid creating too many temporary files (use a pool of them), as well as increased number of IO threads (which, as I understand here, are asynchronous too). Restarted the server with new options – and it actually came up. I immediately looked at error log, where I could immediately see more verbose replication information. It did contain some information about initializations, so I restarted the server, just to see what is the proper operation. Here’s what I got:

InnoDB: ios_per_array 256 read threads 8 write threads 2
070623 11:38:08 InnoDB: In a MySQL replication slave the last relay-log file
                InnoDB: relay-log - filename ./db10-relay-bin.079, position (3707420)
                InnoDB: master-log - filename db4-bin.115, position (427431480)
070623 11:38:08  InnoDB: Started
070623 11:38:08 Read master - info: log_file_name: db4-bin.115, pos: 434884699
070623 11:38:08 Replication status inside InnoDB: relay-log['./db10-relay-bin.079'(3707420)], master-log['db4-bin.115'(427431480)]
070623 11:38:08 Slave I/O thread: adjust master reading header:
        Old: file:'db4-bin.115', position:434884699
        New: file:'db4-bin.115', position:434884656
070623 11:38:08 Replication status inside relay-log.info: relay-log['./db10-relay-bin.079'(3707420)], master-log['db4-bin.115'(427431480)]
/usr/local/mysql-wikimedia-google//libexec/mysqld: ready for connections.
Version: '4.0.26-wikimedia-google-log'  socket: '/tmp/mysql.sock'  port: 3306  Wikimedia Build
070623 11:38:08 Slave SQL thread initialized, starting replication in log 'db4-bin.115' at position 427431480, relay log './db10-relay-bin.079' position: 3707420
070623 11:38:08 Slave I/O thread: connected to master 'repl@db4:3306', asynchronous replication started in log 'db4-bin.115' at position 434884656

As it was pretty clean shutdown, values taken from InnoDB did not override relay-log information too much, and the server successfully started replicating (oh, asynchronously! – informs the patch).
The first more evil test was giving it proper load share, instead of loading up and warming up gradually. Of course, overload happened, but replication thread was served with priority, so the slave wasn’t too unhappy. As well, our load balancer decided not too load the server too much (it observes Threads_running). I guess async I/O helped there too, though can’t tell that much without benchmarks.

Once server was running and started getting load, I ended up looking at the goodies that were already there and didn’t need configuration. Apparently lots of features added are enabled by default (and all of them are in single patch), so it is not that easy to get rid of some of them (not that I need that :). SHOW STATUS added some new fields, though some (mallinfo) apparently were hit by 32 bit counter sizes, and most of them are for semi-sync replication.
I would have really liked this set of new variables:

mysql> show status like 'repl%';
+---------------------------------+------------+
| Variable_name                   | Value      |
+---------------------------------+------------+
| Replication_fail_io_connections | 0          |
| Replication_last_event_buffered | 1182615032 |
| Replication_last_event_diff     | 0          |
| Replication_last_event_done     | 1182615032 |
| Replication_max_delay           | 3          |
| Replication_slowest_event       | 1          |
| Replication_slowest_event       | 1          |
+---------------------------------+------------+

Though the maximum numbers seemed to go up and down, so probably it is for particular time window, *shrug*. Quite nice addition was extracting OS information about MySQL process size itself, this does seem a bit Linux-centric though:

+---------------+-------------+
| Variable_name | Value       |
+---------------+-------------+
| Vm_size       | 13999898624 |
| Vm_data       | 13977845760 |
| Vm_rss        | 13521936384 |
+---------------+-------------+

InnoDB status report has been also extended (not only size limit from 64k to 128k) – now it reports more information about I/O, especially background threads or fsyncs (numbers are quite low as Saturdays observe lowest load):


*************************** 1. row ***************************
Status:
=====================================
070623 16:17:09 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 7 seconds
----------
BACKGROUND THREAD
----------
srv_master_thread loops: 16243 1_second, 16242 sleeps, 1624 10_second, 0 background, 0 flush
srv_master_thread log flush: 16283 sync, 1664 async
fsync callers: 9770 buffer pool, 0 other, 11 checkpoint, 1535 log aio, 16325 log sync, 0 archive
----------
SEMAPHORES
----------
Lock wait timeouts 0
OS WAIT ARRAY INFO: reservation count 34135, signal count 31043
Mutex spin waits 794941, rounds 3373874, OS waits 21689
RW-shared spins 17136, OS waits 7918; RW-excl spins 8803, OS waits 2994
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 0 writes 0 requests 0 io secs 0.000000 io msecs/request 0.000000 max_io_wait 0.000000
I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 1528 requests 1528 io secs 0.010483 io msecs/request 0.006861 max_io_wait 0.027000
I/O thread 2 state: waiting for i/o request (read thread) reads 30857 writes 0 requests 14002 io secs 126.370735 io msecs/request 9.025192 max_io_wait 161.586000
I/O thread 3 state: waiting for i/o request (read thread) reads 16999 writes 0 requests 7979 io secs 72.610741 io msecs/request 9.100231 max_io_wait 156.309000
I/O thread 4 state: waiting for i/o request (read thread) reads 14180 writes 0 requests 6658 io secs 61.282585 io msecs/request 9.204353 max_io_wait 149.164000
I/O thread 5 state: waiting for i/o request (read thread) reads 12690 writes 0 requests 6103 io secs 55.881265 io msecs/request 9.156360 max_io_wait 161.298000
I/O thread 6 state: waiting for i/o request (read thread) reads 11871 writes 0 requests 5685 io secs 53.666005 io msecs/request 9.439931 max_io_wait 160.487000
I/O thread 7 state: waiting for i/o request (read thread) reads 11563 writes 0 requests 5489 io secs 51.701108 io msecs/request 9.419040 max_io_wait 161.135000
I/O thread 8 state: waiting for i/o request (read thread) reads 11039 writes 0 requests 5284 io secs 49.634792 io msecs/request 9.393413 max_io_wait 149.591000
I/O thread 9 state: waiting for i/o request (read thread) reads 10980 writes 0 requests 5304 io secs 50.317586 io msecs/request 9.486724 max_io_wait 161.720000
I/O thread 10 state: waiting for i/o request (write thread) reads 0 writes 185439 requests 162898 io secs 4.820036 io msecs/request 0.029589 max_io_wait 9.558000
I/O thread 11 state: waiting for i/o request (write thread) reads 0 writes 90595 requests 80475 io secs 3.201714 io msecs/request 0.039785 max_io_wait 9.739000
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
1516536 OS file reads, 268607 OS file writes, 27641 OS fsyncs
57.85 reads/s, 17273 avg bytes/read, 34.71 writes/s, 2.29 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 9909, free list len 36366, seg size 46276,
403495 inserts, 391709 merged recs, 94372 merges
Hash table size 24902177, used cells 12151667, node heap has 24584 buffer(s)
1455.65 hash searches/s, 2672.33 non-hash searches/s
---
LOG
---
Log sequence number 126 167307111
Log flushed up to   126 167307111
Last checkpoint at  125 4175748013
0 pending log writes, 0 pending chkp writes
19521 log i/o's done, 1.29 log i/o's/second, 16324 syncs, 1626 checkpoints
log sync callers: 276034 buffer pool, background 16283 sync and 1664 async, 0 internal, checkpoint 1626 sync and 0 async, 0 archive, commit 0 sync and 0 async
log sync syncers: 80 buffer pool, background 16243 sync and 0 async, 0 internal, checkpoint 1 sync and 0 async, 0 archive, commit 0 sync and 0 async
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 13711864112; in additional pool allocated 5884416
Buffer pool size   768000
Free buffers       60
Database pages     743356
Modified db pages  177204
Pending reads 2
Pending writes: LRU 0, flush list 0, single page 0
Pages read 1580077, created 7462, written 276034
61.42 reads/s, 0.00 creates/s, 19.00 writes/s
Buffer pool hit rate 995 / 1000
--------------
ROW OPERATIONS
--------------
3 queries inside InnoDB, 0 queries in queue
Main thread process no. 30147, id 1199630672, state: sleeping
Number of rows inserted 430539, updated 251931, deleted 257631, read 83306576
15.28 inserts/s, 10.43 updates/s, 0.57 deletes/s, 3051.42 reads/s
------------
TRANSACTIONS
------------
Trx id counter 0 4106483684
Purge done for trx's n:o < 0 4106482391 undo n:o = 0.682164995612

                        ORDER BY page_random LIMIT 1
Trx read view will not see trx with id >= 0 4106483684, sees = 0 4106483683, sees = 0 4106483682, sees = 0 4106483680, sees = 0 4106483676, sees = 0 4106483675, sees = 0 4106483672, sees = 0 4106483671, sees = 0 4106483662, sees = 0 4106483660, sees = 0 4106483630, sees = 0 4106483624, sees = 0 4106483601, sees = 0 4106483598, sees = 0 4106483585, sees = 0 4106483552, sees = 0 4106483445, sees = 0 4106483269, sees = 0 4106483074, sees = 0 4106482518, sees = 0 4106482327, sees < 0 4106481806
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.12 sec)

Now one of most delicious goodie is per-user/table/index profiling introduced (and I’ve been craving for this after seeing statistics collected by PG):

mysql> show user_statistics \G
*************************** 1. row ***************************
                  User: wikiuser
     Total_connections: 2341272
Concurrent_connections: 22
        Connected_time: 293398
             Busy_time: 24374
          Rows_fetched: 44126660
          Rows_updated: 0
       Select_commands: 18764733
       Update_commands: 0
        Other_commands: 9116763
   Commit_transactions: 2902096
 Rollback_transactions: 0
*************************** 2. row ***************************
                  User: #mysql_system#
     Total_connections: 1
Concurrent_connections: 2
        Connected_time: 33777
             Busy_time: 2449
          Rows_fetched: 0
          Rows_updated: 984119
       Select_commands: 28
       Update_commands: 522506
        Other_commands: 270265
   Commit_transactions: 209620
 Rollback_transactions: 0
mysql> show table_statistics like '%.r%';
+----------------------+-----------+--------------+-------------------------+
| Table                | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+----------------------+-----------+--------------+-------------------------+
| enwiki.revision      | 6617667   | 40676        | 244056                  |
| enwiki.redirect      | 8335      | 763          | 1526                    |
| enwiki.recentchanges | 1278791   | 89483        | 715864                  |
+----------------------+-----------+--------------+-------------------------+
3 rows in set (0.00 sec)
mysql> show index_statistics like 'enwiki.p%';
+---------------------------------------+-----------+
| Index                                 | Rows_read |
+---------------------------------------+-----------+
| enwiki.page_restrictions.pr_cascade   | 34530     |
| enwiki.page.PRIMARY                   | 3932170   |
| enwiki.page_restrictions.PRIMARY      | 169764    |
| enwiki.page_restrictions.pr_typelevel | 12429     |
| enwiki.page_restrictions.pr_id        | 1         |
| enwiki.page.page_random               | 1752062   |
| enwiki.pagelinks.pl_from              | 77848     |
| enwiki.page.name_title                | 54569748  |
| enwiki.pagelinks.pl_namespace         | 237991    |
| enwiki.page_restrictions.pr_page      | 1218      |
+---------------------------------------+-----------+
10 rows in set (0.00 sec)

Such kind of profiling immediately brings many new ideas how to improve the database (or at least – what to clean up :).

Many goodies provided (like SUPER-user audit log, or reserved connections, or backported START SLAVE UNTIL, or fast master promotion, or ..) may still be useful in future – we needed that quite a few times before. I’m not sure if we’ll start using semi-sync replication – it is very useful feature, but simply we don’t need it. Now binlog mirroring is really really elegant way to handle replication, and we’ll have to think of ways to make that run properly. Now we have to do some stability testing on our zoo and roll this live to all core database slaves.

Summarizing up, if anyone is still running MySQL/InnoDB 4.0, Google Edition of it makes it much more fun (and manageable)! So, thanks to Mark & Chip & others who made this stuff, and I hope their wishes to see it in main MySQL branch will come true. Now we have few more reasons to stick to 4.0 :)

One thought on “MySQL 4.0 Google Edition”

  1. I liked that more reasons to stick to 4.0 part. :) It just gets the job done and Google has added things that really matter to high load places.

Comments are closed.

%d bloggers like this: