after the conference, mydumper, parallelism, etc

Though slides for my MySQL Conference talks were on the O’Reilly website, I placed them in my talks page too, for both dtrace and security presentations.

I also gave a lightning talk about mydumper. Since my original announcement mydumper has changed a bit. It supports writing compressed files, detecting and killing slow queries that could block table flushes, supports regular expressions for table names, and trunk is slowly moving towards understanding that storage engines differ :)

I’ve been using mydumper quite a lot in my deployments (and observing 10x faster dumps). Now, the sad part is how to do faster recovery. It is quite easy to parallelize load of data (apparently, xargs supports running parallel processes):

echo *.sql.gz | xargs -n1 -P 16 -I % sh -c 'zcat % | mysql dbname'

Still, that doesn’t scale much – only doubles the load speed, compared to single threaded load, even on quite powerful machine. The problem lives in log_sys mutex – it is acquired for every InnoDB row operation, to grab LogicalSequenceNumbers (LSNs), so neither batching nor differentiation strategies really help, and same problem is hit by LOAD DATA too. In certain cases I saw quite some spinning on other mutexes, and it seems that InnoDB currently doesn’t scale that well with lots of small row operations. Maybe someone some day will pick this up and fix, thats why we go to conferences and share our findings :)

6 thoughts on “after the conference, mydumper, parallelism, etc”

With Ben’s latest changes in the v3 Google patch, log_sys->mutex is probably #3 on the list of hot InnoDB mutexes (and that ignores the non-Innodb hot mutexes). kernel_mutex and commit_prepare_mutex are bigger problems for Innodb.

Mark, thats for mixed and ro loads, but pure ‘data insert’ load still is pure log_sys

Publish your benchmark results.

Loading 8 million rows (1G data, 2G indexing):

1 thread  - 8min40s
2 threads - 4min50s
4 threads - 3min30s
8 threads - 3min06s

Which code did you use? Unpatched InnoDB? Percona? v3 Google? MySQL 5.4?

Also, can you provide mutex contention data for this result?

Load into MyISAM and ALTER TABLE an idea? Lot of copying then, but the tradeoff might work in the right setting. Load into MyISAM without indexes, etc. It’ll require a bit of futzing the process but I think that’s opportunities there.

Comments are closed.

	markcallaghan (@mark… on MySQL does not need SQL
	markcallaghan (@mark… on MySQL does not need SQL
	Domas Mituzas on MySQL does not need SQL
	Marc on MySQL does not need SQL
	Nils Meyer on linux memory management for…

Related

6 thoughts on “after the conference, mydumper, parallelism, etc”