how MySQL engineering broke the backups

MySQL has an exceptional track record of introducing minor fixes that cause major breakage. Usually I could blame the naiveté of engineers who never really had to deal with production implications, but lately I'm starting to sense various business considerations working against the open-source offering.

As the original author of mydumper I really cannot get it out of my mind that the 5.5 and 5.6 metadata locking changes are there to screw with anyone who is building a backup solution on top of stable snapshot views of MySQL (for example, mysqldump --single-transaction, the gold standard of backing things up in the MySQL world).
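
For reference, a typical consistent logical dump looks something along these lines (dbname is a placeholder; assumes InnoDB tables):

# one long-running transaction gives a consistent view without locking InnoDB tables
mysqldump --single-transaction --quick dbname | gzip > dbname.sql.gz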

As seen in bug #71017 (a palindrome!) filed by my esteemed colleague Eric, the newly introduced behavior grabs every lock it possibly can, even when that makes absolutely no sense for backup/ETL/migration/etc. scenarios.
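
A rough illustration of the pileup (db.t1 and the ALTER are hypothetical; this is not the exact test case from the bug report):

# session 1: a dump-style transaction reads t1 and then stays open (simulated here with SLEEP)
mysql -e "START TRANSACTION WITH CONSISTENT SNAPSHOT; SELECT COUNT(*) FROM db.t1; SELECT SLEEP(600);" &
# session 2: any DDL against t1 now waits on that transaction's metadata lock...
mysql -e "ALTER TABLE db.t1 ADD COLUMN flag TINYINT" &
# ...and every new query touching t1 queues up behind the pending DDL:
mysql -e "SHOW PROCESSLIST"   # look for 'Waiting for table metadata lock'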

The only supported way out is MySQL Enterprise Backup, which is proprietary software and does not produce logical backups that would allow selective data restores, ETL, or anything else. You get complete vendor lock-in, with no way to get your data out of the system in a consistent manner – unless, of course, you restrict yourself to a “no metadata changes allowed in production” mode.

after the conference, mydumper, parallelism, etc

Though the slides for my MySQL Conference talks were on the O’Reilly website, I have placed them on my talks page too, for both the DTrace and security presentations.

I also gave a lightning talk about mydumper. Since my original announcement mydumper has changed a bit: it supports writing compressed files, detecting and killing slow queries that could block table flushes, and regular expressions for table names, and trunk is slowly moving towards understanding that storage engines differ :)
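
For illustration, something along these lines (flag names as in current mydumper builds – they may differ slightly in trunk, see mydumper --help; the paths and regex are made up):

# compress output files, dump only tables matching the regex, and kill any query
# running longer than 60 seconds that would otherwise block FLUSH TABLES
mydumper --compress --regex '^mydb\.(users|events)$' \
         --long-query-guard 60 --kill-long-queries \
         --threads 8 --outputdir /backups/mydb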

I’ve been using mydumper quite a lot in my deployments (and observing 10x faster dumps). Now, the sad part is how to do faster recovery. It is quite easy to parallelize loading the data (apparently, xargs supports running processes in parallel):

ls *.sql.gz | xargs -P 16 -I % sh -c 'zcat % | mysql dbname'

Still, that doesn’t scale much – it only doubles the load speed compared to a single-threaded load, even on quite a powerful machine. The problem lives in the log_sys mutex – it is acquired for every InnoDB row operation to grab log sequence numbers (LSNs), so neither batching nor differentiation strategies really help, and LOAD DATA hits the same problem too. In certain cases I saw quite a lot of spinning on other mutexes, and it seems that InnoDB currently doesn’t scale that well with lots of small row operations. Maybe someone will pick this up and fix it some day – that’s why we go to conferences and share our findings :)
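
If you want to poke at this yourself, one rough way to see which mutexes the load is spinning on (output naming varies across InnoDB versions) is:

# os_waits counters on the hottest mutexes keep growing while the parallel load runs
mysql -e "SHOW ENGINE INNODB MUTEX" | grep -i log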

mydumper

Last weekend I ended up working on a small pet project – and today I’m kind of releasing it.

So, I had this idea that there’s no good tool to do logical dumps of MySQL data for large sites – mysqldump doesn’t generate much I/O pressure, mk-parallel-dump is closer, but it doesn’t do consistent snapshots, uses the same mysqldump, and is written in Perl (haha!), and… I just wanted something new to hack on, as a proof of concept. For a while, using it meant editing constants in the code, but my colleague Mark contributed options support, so it no longer needs a recompile to run :)

So, let me introduce mydumper. It doesn’t dump table definitions; all it does is extract data and write it to files, fast.
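
For the curious: the consistent-snapshot part is plain InnoDB functionality that such a tool can build on. A minimal single-connection sketch (dbname and t1 are placeholders; the tool’s real job is coordinating many such connections so they all see the same point in time):

# all reads inside this transaction come from one point-in-time snapshot (InnoDB, REPEATABLE READ)
mysql dbname -e "START TRANSACTION WITH CONSISTENT SNAPSHOT; SELECT * FROM t1; COMMIT;" > t1.txt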