Last weekend I ended up working on small pet project – and today I’m kind of releasing it.

So, I had that idea that there’s no good tool to do logical dump of MySQL data for large sites – mysqldump doesn’t provide too much of I/O pressure, mk-parallel-dump is closer, but it doesn’t do consistent snapshots, uses same mysqldump, as well as is written in Perl (haha!), and… I just wanted something new to hack, as proof of concept. For a while to use it one had to edit constants in code, but my colleague Mark contributed options support and it doesn’t need recompiles anymore to run it :)

So, let me introduce mydumper. It doesn’t dump table definitions, all it does is extracting data and writing it to files, fast.

I took ~20GB-sized French Wikipedia core database (SHOW TABLE STATUS), and tried dumping it with three different methods – mysqldump, mk-parallel-dump and mydumper (used 32 thread, chunked backup setting for last two).

Dump times, smaller is better:

mysqldump: 75m18s
maatkit:    8m13s
mydumper:   6m44s \o/ WINNER \o/

There’s no cache skew – I restarted mysqld before every test, and it is using O_DIRECT.

At certain moments it seemed like gigabit network wasn’t enough for the test… It seems, it was using underlying I/O properly too:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.58    0.00    3.28   48.14    0.00   36.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
sda              19.00    26.00 3077.00    6.00 151720.00   306.00

   avgrq-sz avgqu-sz   await  svctm  %util
      49.31    32.09   10.27   0.32 100.00

Though, once I tried from warm caches, and saw 2m rows read a second, I had a warm fuzzy feeling :)

Apparently the trick of having successful fast mysql dump was applying lots of pressure to underlying storage as well as using multi-processor capabilities. So we do, so can you!

Oh, and easiest way to start is:

bzr co lp:mydumper/0.1
cd mydumper
./mydumper --help

Alternatively, one can use ‘lp:mydumper’ to get trunk – though various things (like startup options) can change. Feel free to file bugs, ask questions, and contribute with anything you think is worth contributing (thats why it ended up on Launchpad).

FAQs page can have answers to questions that might arise too :)

Update: Added also downloadable archive for bzr impaired at downloads page.

16 thoughts on “mydumper”

    1. mysqlhotcopy would not work with InnoDB datastores (nor MyISAM, it locks stuff too much), innodb hot backup is proprietary, and neither perform logical backups.

      1. Try xtrabackup from percona. It is essentially a reimplementation of mysqlhotcopy that works with their version of innodb in the percona mysql server and with the standard upstream mysql + innodb.

  1. This is an awesome tool for me, actually. I have been backing up MyISAM/InnoDB databases nightly for a client and once a week lately it seems the job gets stuck or takes so long it bottlenecks some cronjobs which never finish as well, and then run the next instance over and over until the box has to be forcibly rebooted.

    This might help cut down on the time needed to do the dump!

    I should probably also put a lockfile in, now that I think about it, so cronjobs exit without doing anything during the nightly backup. I don’t have extra resources for some sort of replication based backup, LVM backups would be proprietary and not plain SQL … I was looking for an alternative but to get a plain SQL dump it seems like this might wind up being the winner!

  2. What options did you use for mk-parallel-dump? If you give it –tab, it does NOT use mysqldump, it uses SELECT INTO OUTFILE. This might end up being faster than mydumper. Of course it only works if you run it on the same machine you’re dumping. Oh, and by default it’s gzipping the result of the dump, so if you disable that, it should be faster too.

    Not that I disagree with you: Perl ain’t the solution here…

  3. Baron,

    I know about mk-parallel-dump options and about the –tab :). Actually in my tests gzip doesn’t matter (client has enough of CPU to facilitate that ;-) – and I don’t want to use tab formats (the whole idea was having the logical sql dump).

    This project showcases that there are ways to do things better ;-)

  4. [root@localhost mydumper-0.1.5]# yum install glib2-devel mysql-devel zlib-devel pcre-devel
    Setting up Install Process
    Parsing package install arguments
    Package glib2-devel – 2.12.3-4.el5_3.1.i386 is already installed.
    Package mysql-devel – 5.0.45-7.el5.i386 is already installed.
    Package zlib-devel – 1.2.3-3.i386 is already installed.
    Package pcre-devel – 6.6-2.el5_1.7.i386 is already installed.
    Nothing to do
    [root@localhost mydumper-0.1.5]# make
    cc `mysql_config –cflags` `pkg-config –cflags glib-2.0 gthread-2.0` `pcre-config –cflags` -O3 -g -c -o mydumper.o mydumper.c
    mydumper.c: In function ‘check_regex’:
    mydumper.c:112: error: ‘PCRE_NEWLINE_ANY’ undeclared (first use in this function)
    mydumper.c:112: error: (Each undeclared identifier is reported only once
    mydumper.c:112: error: for each function it appears in.)
    make: *** [mydumper.o] Error 1
    [root@localhost mydumper-0.1.5]#

    Thank you for “nice” software.

  5. my fault – it has been fixed in trunk though, I was depending on too new PCRE version.

    just removing PCRE_NEWLINE_ANY from the file helps ;-)

    I’ll release 0.1.6 with that fix, I guess, was a bit slacking, and expecting more changes to accumulate.

  6. I removed PCRE_NEWLINE_ANY and it worked perfectly.
    Really fast and easy to use!
    I need to run more tests but it looks very promising.
    Thank you for such a nice tool.

  7. Hi Domas,

    I download mydumper-0.2.3.tar.gz , and untar it , found mydumper-0.2.3.

    [root@adampur mydumper-0.2.3]# make
    make: *** No targets specified and no makefile found. Stop.

    Although, I have installed all the required packages though yum.

    Kindly suggest !!


Comments are closed.