There are a few things one is supposed to know about MySQL replication in production, as the manual doesn’t always discuss them openly.
This is a small set of rules and advice I compiled (some apply to statement-based replication only, but row-based replication may benefit from one or two as well):
- Don’t use MyISAM. Or in fact, don’t use any non-transactional engine if you care about your data. Whether on the master, the slave, or both, non-transactional engines will cause data drift (a statement partially executed on the master may be fully executed on the slave) or simply stop replication. On top of that, every crash has the capacity to desync the tables from each other, and there are absolutely no guarantees.
This “don’t” can easily be turned into a “do” – just use InnoDB. Bonus point – one doesn’t need to take the server down to clone a slave from a master :)
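A minimal my.cnf sketch of the “do” (option names are stock MySQL; the values are illustrative, not a tuning recommendation):

```ini
[mysqld]
# Make InnoDB the default so new tables are transactional
default-storage-engine = InnoDB
# Flush and sync the InnoDB log on every commit for crash safety
innodb_flush_log_at_trx_commit = 1
```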
- Don’t use temporary tables. The MySQL manual is very funny about temporary tables in replication: it says “do not shut down the slave while it has temporary tables open.” That of course means you’re not supposed to crash either – and the more slaves there are, the more of them will crash for various reasons (e.g. solar flares).
The operational overhead temporary tables add is huge – even though it may not show up in benchmarks.
- Prefer simple, idempotent statements. If one can replay the same statements multiple times without the database drifting, it doesn’t matter much if the replication position is somewhat outdated. Updating rows by primary key to fixed values and avoiding multi-table updates/deletes allows much faster recovery after a crash.
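To illustrate (table names and values are hypothetical): a statement that sets a row to a fixed value by primary key replays safely, while a relative update does not:

```sql
-- Idempotent: replaying this any number of times leaves the same row
UPDATE users SET status = 'active' WHERE user_id = 42;

-- Not idempotent: every replay changes the result
UPDATE counters SET hits = hits + 1 WHERE counter_id = 7;
```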
- Set sync_binlog=1. This will introduce the biggest bottleneck for transactions, but losing 30 seconds of data may be worse (in most cases it forces a full slave resync). On really busy servers one can go for higher values (e.g. sync every 20 transactions), but 0 is asking for disaster.
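In my.cnf terms (the variable is stock MySQL; 20 is just an example value):

```ini
[mysqld]
# Sync the binary log to disk on every commit – safest, slowest
sync_binlog = 1
# On a really busy server, a compromise: sync once per 20 events
# sync_binlog = 20
# sync_binlog = 0 leaves flushing to the OS – asking for disaster
```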
- Avoid long-running updates. Though all a long statement causes on the master is a slightly longer locking window and some performance pressure, once it is replicated to the slave, every subsequent update has to wait for the giant one to finish, in many cases rendering the slave useless.
If something big has to be replicated, either split it into smaller chunks or run it directly against the slaves (with binary logging disabled for it on the master).
Splitting into smaller chunks also allows wait-for-slave logic to be implemented, so the job has no major impact on production environments.
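A sketch of both approaches (table name, date, and batch size are hypothetical):

```sql
-- Chunked: repeat until zero rows are affected, checking slave lag
-- between runs; ORDER BY the primary key keeps the statement
-- deterministic under statement-based replication
DELETE FROM old_events WHERE created < '2010-01-01'
  ORDER BY event_id LIMIT 1000;

-- Or run the big statement on the master without writing it to the
-- binary log, then repeat it on each slave separately
SET SESSION sql_log_bin = 0;
DELETE FROM old_events WHERE created < '2010-01-01';
SET SESSION sql_log_bin = 1;
```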
- Don’t use replicate-do-db. Or replicate-ignore-db. They both rely on the default database context, so a statement like ‘INSERT INTO database.table’ issued while a different database is selected will not be filtered the way you expect.
If you need filtering, use replicate-wild-do-table=db.% – but even then, be careful with cross-database statements that involve tables from multiple databases, as they may be filtered out…
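In the slave’s my.cnf (the database name is illustrative):

```ini
# Filter by table pattern – does not depend on the default database
replicate-wild-do-table = db.%
# Avoid this one: it filters on the default (USE) database context
# replicate-do-db = db
```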
- Mind the multiversioning. Some statements may become replication performance hogs because of long-running transactions (backups? reporting? ETL?) on the slaves: a statement may not need to rescan old row versions on the master, but those versions are still there on the slave. Such statements may need to be rewritten to avoid scanning index gaps with too many invisible rows, or the long transactions have to be split.
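A sketch of splitting a slave-side report (hypothetical schema): committing between ranges releases the read view, letting InnoDB purge old row versions instead of accumulating them:

```sql
-- Instead of one multi-hour transaction holding a single read view,
-- commit between ranges so purge can clean up behind each chunk
START TRANSACTION;
SELECT COUNT(*) FROM events WHERE event_id BETWEEN 1 AND 1000000;
COMMIT;
START TRANSACTION;
SELECT COUNT(*) FROM events WHERE event_id BETWEEN 1000001 AND 2000000;
COMMIT;
```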
Though probably the best advice I can give right now is “call your MySQL vendor and ask for transactional replication“. Server, rack, and datacenter crashes will no longer cause excessive work on fixing replication – it will always be consistent. One can even disable log syncing to disk then \o/
9 thoughts on “On MySQL replication, again…”
What is your load? I ask because setting sync_binlog=1 will kill my performance; I did it once and we went from fast to standing still. And on the other hand, syncing every 20 seconds seems long – I would think (though I’m not sure) that the OS would sync the disk before then.
20 means not ‘seconds’ but ‘events’. We have thousands of transactions per second on some systems – that’s high enough load :-)
There are things one must have to run high-performance systems though – XFS, the deadline I/O scheduler, a battery-backed write cache.
I agree 100%, and have provided exactly the same points to customers and others in the past. MyISAM and temp tables are particularly nasty. And, by the way, even a non-crash “clean shutdown” of MySQL+MyISAM on a slave has the potential to de-sync replication. Running statements are aborted at shutdown, and if any of them are multi-row statements, aborting them will logically (although not physically) corrupt MyISAM tables.
sync_binlog=1 is a performance killer with ext3. Dave, maybe that’s your case. But of course we should all be using XFS by now :)
With ext3, if the HD subsystem supports a battery-backed write cache, is XFS necessary?
Yes, ext3 is much slower even with a battery-backed write cache – the write-behind cache isn’t exactly instant, so doing fewer I/O operations, or doing them more efficiently, still matters. XFS forever!
Domas – I have a sysadmin that won’t come near XFS with a 10′ pole. He has had data loss and complete system crashes with XFS. I’m not an expert, but the Linux ChangeLog-2.6.34 has many fixes to the XFS layer, a few concerning sync’ing data.
Changing to a new FS is scary. Do you have any supporting data or websites that show XFS to be stable?
we don’t observe consistency issues – but people are supposed to lose data if they don’t fsync() ;-D
“idempotent” … that’s a favourite word of harried sysadmins. Along with “redundant” and “verified.”
(why yes, we just had an Oracle upgrade failure at work that went very like the cartoon. Ridiculous fragility that no-one would put up with from open source, but is standard in the proprietary world. Expensive consultant doing eighteen hours in one shift to get our stuff to behave.)