<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>domas mituzas &#187; replication</title>
	<atom:link href="http://dom.as/tag/replication/feed/" rel="self" type="application/rss+xml" />
	<link>http://dom.as</link>
	<description></description>
	<lastBuildDate>Thu, 02 Feb 2012 21:29:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='dom.as' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/6e344c6e0cd7462eb056f8b98eb2cbcd?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>domas mituzas &#187; replication</title>
		<link>http://dom.as</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://dom.as/osd.xml" title="domas mituzas" />
	<atom:link rel='hub' href='http://dom.as/?pushpress=hub'/>
		<item>
		<title>on MySQL replication prefetching</title>
		<link>http://dom.as/2011/12/03/replication-prefetching/</link>
		<comments>http://dom.as/2011/12/03/replication-prefetching/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 21:46:46 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[prefetch]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://dom.as/?p=1536</guid>
		<description><![CDATA[For the impatient ones, or ones that prefer code to narrative, go here. This is long overdue anyway, and Yoshinori already beat me, hehe&#8230; Our database environment is quite busy &#8211; there&#8217;re millions of row changes a second, millions of &#8230; <a href="http://dom.as/2011/12/03/replication-prefetching/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>For the impatient ones, or ones that prefer code to narrative, <a href="http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/tools/files/head:/prefetch/">go here</a>. This is long overdue anyway, and Yoshinori already <a href="http://yoshinorimatsunobu.blogspot.com/2011/10/making-slave-pre-fetching-work-better.html">beat me</a>, hehe&#8230;</em></p>
<p>Our database environment is quite busy &#8211; there are millions of row changes a second and millions of I/O operations a second, and the impact of that can be felt at each shard. Especially as we also have to replicate to other datacenters, single-threaded replication in MySQL becomes a real bottleneck.</p>
<p>We use multiple methods to understand and analyze replication lag composition &#8211; a simple replication thread state sampling via MySQL processlist helps to understand logical workload components (and work in that field yields great results), and pstack/GDB based replication thread sampling shows server internal behavior quite well too (a similar technique was used for <a title="On connections" href="http://dom.as/2011/08/28/mysql-connection-accept-speed/">accept thread visualisation</a>).</p>
<p>The biggest problem with a single replication thread is that it has to read data to execute queries (rather than applying physical page deltas, like PG, or just appending to files, like HBase, it does logical edits to page data) &#8211; we can observe 95% of process time in that state. As there&#8217;s generally just one outstanding data read per replication thread, other workload hitting the machine will also make replication reads slower.</p>
<p>Generally, the obvious way to deal with slow I/O is to issue more outstanding parallel requests, and the only way to do that, apart from parallel replication, is to predict what will be needed in the future and try to fetch that.</p>
<p>Many many moons ago Paul Tuckfield discussed the YouTube replication prefetcher &#8211; it would take write statements yet to be executed in relay logs, convert them to SELECTs and run them before the replication thread needs that data. He still says that was one of the most satisfying quick hacks :-)</p>
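<p>To make the idea concrete, here is a minimal Python sketch of the statement rewriting step. It is not the YouTube tool or mk-slave-prefetch; the regular expressions and the <code>write_to_select</code> name are assumptions made for illustration, and they only cover simple single-table UPDATE and DELETE statements.</p>

```python
import re

# Hypothetical patterns: simple single-table statements only. A string
# literal containing " WHERE " inside the SET clause would confuse them.
UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+?);?\s*$", re.I | re.S)
DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+?);?\s*$", re.I | re.S)


def write_to_select(statement):
    """Return a SELECT that touches the same rows, or None if unsupported."""
    for pattern in (UPDATE_RE, DELETE_RE):
        match = pattern.match(statement)
        if match:
            table, where = match.groups()
            return "SELECT * FROM {} WHERE {}".format(table, where)
    return None
```

<p>A prefetcher built this way would run the returned SELECT on a separate connection ahead of the replication thread; anything it cannot rewrite (INSERTs, multi-table statements) is simply skipped.</p>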
<p>Maatkit (now Percona Toolkit) introduced mk-slave-prefetch (I <a href="http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/">played with it</a> back in 2008, didn&#8217;t put it into operation at that time though), and eventually that looked like a reasonable option for prefetching statements on our database cluster.</p>
<p>5000 lines of Perl is not the easiest code to work with (or to debug), so the journey was quite bumpy. We got it working in some shape, eventually, but Baron, the original author, <a href="http://twitter.com/#!/xaprb/statuses/128876485472829440">has something to say</a> about it:</p>
<p style="padding-left:30px;"><em>Please don&#8217;t use mk-slave-prefetch on MySQL unless you are Facebook. Or at least don&#8217;t tell your friends, so they won&#8217;t use it.</em></p>
<p>Anyway, our update rate would saturate mksp.pl if we used anything fancier on it, so it was a constant balancing act, in which looking at the code was something nobody wanted to do ;-) Still, it was (and is) helping us, so getting rid of it wasn&#8217;t possible either.</p>
<p>At some point we decided to run an experiment &#8211; what if we executed statements, then rolled them back? So I did a quick implementation of that method from scratch in Python &#8211; the resulting piece of code was relatively small and fun to experiment with.</p>
<p>There were multiple problems with such an approach &#8211; one complication was that queries were grabbing locks for the duration of the statement, and some of those locks would collide with what the actual replication thread was doing. Fixing that would require an immediate lock wait timeout or a transaction kill for the prefetcher thread &#8211; so, a relatively deep dive into InnoDB. Another problem was internal InnoDB lock contention on rollbacks &#8211; rollback was an expensive operation, and the benefits of pages read in were negated by rollback segment lock contention. Fixing that is even more extensive InnoDB work (though probably some people would like their rollbacks to be efficient ;-)</p>
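<p>The execute-then-roll-back idea can be sketched in a few lines of Python against any DB-API connection. This is not the code from our repo: the function name is made up for illustration, and <code>sqlite3</code> stands in for a MySQL driver only so that the snippet runs on its own. The locking and rollback costs described above apply just the same.</p>

```python
import sqlite3  # stand-in driver; a real prefetcher would connect to MySQL


def prefetch_by_rollback(conn, statement):
    """Run a write statement to pull its pages into cache, then undo it."""
    cursor = conn.cursor()
    try:
        cursor.execute(statement)
        return True
    except Exception:
        # Prefetching is best effort: a failed statement just means
        # that nothing was warmed up for it.
        return False
    finally:
        # Never let the change become visible or reach a commit.
        conn.rollback()
        cursor.close()
```

<p>The connection must not be in autocommit mode, otherwise the rollback comes too late and the prefetcher would really change data.</p>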
<p>At that moment we came up with the idea that the InnoDB codebase could be instrumented to not do any real work on updates &#8211; just page data in and return to the caller, and if any change accidentally slips in, commits can fail. That looked like a feasible project for the future.</p>
<p>At some point we were rolling out a new database tier for one product, which was supposed to have a really high volume of changes, all coming in a uniform format. It took less than an hour (as most of the work had been done to create the rollback-based one) to come up with a prototype that would efficiently extract literals from uniform statements, then use them for prefetching.</p>
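<p>A rough Python sketch of the literal extraction approach follows. The statement shape, the <code>objects</code> table and the <code>owner_idx</code> index are invented for illustration (the real formats are not shown here); the point is that one anchored pattern per uniform statement is enough to pull out the literals and fill in prefetch SELECTs.</p>

```python
import re

# Hypothetical uniform statement shape, one pattern per known format.
UNIFORM_UPDATE = re.compile(
    r"^UPDATE objects SET data = '.*' "
    r"WHERE owner_id = (\d+) AND object_id = (\d+)$",
    re.S,
)


def prefetch_queries(statement):
    """Return SELECTs that warm up the rows and index entries to be changed."""
    match = UNIFORM_UPDATE.match(statement)
    if match is None:
        return []
    owner_id, object_id = match.groups()
    return [
        # Touch the primary key page for the row.
        "SELECT 1 FROM objects WHERE object_id = {}".format(object_id),
        # Touch the (assumed) secondary index as well.
        "SELECT 1 FROM objects FORCE INDEX (owner_idx) "
        "WHERE owner_id = {} AND object_id = {}".format(owner_id, object_id),
    ]
```

<p>Because nothing is parsed beyond a single regular expression match, this costs very little per statement, and the resulting SELECTs can be spread over several connections.</p>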
<p>This method worked fine &#8211; at a tiny fraction of the resources used by mk-slave-prefetch we were preloading secondary indexes and could have relatively extensive parallelism.</p>
<p>Meanwhile, our main database cluster was having a more and more uniform query workload, thanks to various libraries, abstractions and middleware &#8211; so a day of work on the lowest-hanging fruit provided relatively good coverage of the workload.</p>
<p>We didn&#8217;t stop mksp.pl &#8211; it still provided some coverage for various odd cases, which were time-consuming to work on manually.</p>
<p>There were a few other problems with the new method &#8211; apparently we were targeting our SELECTs too accurately &#8211; UPDATEs were spending plenty of time in <a href="http://dom.as/2011/01/27/a-case-for-force-index/">records_in_range</a>. Additionally, the optimistic update path was reading in pages that SELECTs wouldn&#8217;t (due to an <a href="http://bugs.mysql.com/bug.php?id=61736">inefficiency</a> in B-Tree locking code). There were some odd reads done for INSERTs.</p>
<p>Also, SELECTs use indexes less efficiently &#8211; InnoDB can pinpoint entries in secondary indexes by using PK values, yet that ability is <a href="http://bugs.mysql.com/bug.php?id=62025">not exposed</a> to the SQL layer, so prefetching on indexes that don&#8217;t have all fields explicitly defined within them is not that easy.</p>
<p>In theory, all these issues are supposed to be &#8216;fixed&#8217; by the fake changes concept. Percona recently <a href="http://www.percona.com/doc/percona-server/5.5/management/innodb_fake_changes.html">implemented it in their releases</a>, and we started experimenting with those changes. It is still not that mature a concept, so we will be revisiting how things are or should be done, but for now test results are quite positive (we made some changes to reduce locking and avoid a deadlock in REPLACE INTO, among other things).</p>
<p>I still observe I/Os done by the main replication thread, so we&#8217;re not in perfect shape yet, but the method seems to be working relatively well (at least it definitely speeds up replication). We still have to do lots of testing to qualify this for large-scale production, but this may allow way more write workload on our machines until we get parallel replication all around.</p>
<p>Our code for the custom query, fake changes or rollback prefetcher can be checked out from a public repo together with other tools (oops, Bazaar doesn&#8217;t give easy access to subdirectories):</p>
<pre>bzr co lp:mysqlatfacebook/tools; cd tools/prefetch</pre>
<p>Or <a href="http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/tools/files/head:/prefetch/">browse it online</a>.</p>
<p>P.S. There&#8217;s also <a href="http://www.continuent.com/solutions/tungsten-replicator">Tungsten Replicator</a> for those who don&#8217;t want to wait for 5.6 parallel replication.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/12/03/replication-prefetching/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>On database write workload profiling</title>
		<link>http://dom.as/2011/05/10/write-workload-profiling/</link>
		<comments>http://dom.as/2011/05/10/write-workload-profiling/#comments</comments>
		<pubDate>Tue, 10 May 2011 12:18:27 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[oltp]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://dom.as/?p=849</guid>
		<description><![CDATA[I always have difficulties with complex analysis schemes, so I fall back to something that is somewhat easier. Or much easier. Here I will explain the super-powerful method of database write workload analysis. Doing any analysis on master servers is already &#8230; <a href="http://dom.as/2011/05/10/write-workload-profiling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I always have difficulties with complex analysis schemes, so I fall back to something that is somewhat easier. Or much easier. Here I will explain the super-powerful method of database write workload analysis.</p>
<p>Doing any analysis on master servers is already too complicated, as instead of analyzing write costs one can become too obsessed with locking, and there&#8217;s sometimes an uncontrollable amount of workload hitting the server besides writes. Fortunately, slaves are much better targets, not only because writes there are single-threaded, thus exposing every costly I/O as a time component, but also because one can drain traffic from slaves, or send more in order to cause a more natural workload.</p>
<p>Also, there can be multiple states of slave load:</p>
<ul>
<li>Healthy, always at 0-1s lag, write statements are always immediate</li>
<li>Spiky, usually at 0s lag, but has jumps due to occasional slow statements</li>
<li>Lagging, because of read load stealing I/O capacity</li>
<li>Lagging (or not catching up fast enough), because it can&#8217;t keep up with writes anymore, even with no read load</li>
</ul>
<p>Each of these states is interesting by itself, and may have slightly different properties, but pretty much all of them are quite easy to look at using replication profiling.</p>
<p>The code for it is somewhat straightforward:</p>
<pre>
# Sample the statement the replication SQL thread is running, ten times a second.
# 100000 samples at 0.1s intervals cover about 2.8 hours, plus query time.
(while true; do
  echo 'SELECT info FROM information_schema.processlist
        WHERE db IS NOT NULL AND user="system user"; '
  sleep 0.1; done) | mysql -BN | head -n 100000 &gt; replication-sample
</pre>
<p>There are multiple ways to analyze it, e.g. finding the slowest statements is as easy as:</p>
<pre>
# uniq -c counts runs of consecutive identical samples, so long-running statements score highest
uniq -c replication-sample | sort -nr | head
</pre>
<p>More advanced methods may group statements by statement type, table, user ID or any other metadata embedded in query comments &#8211; and a lot of value can be obtained from ad-hoc analysis using simply &#8216;grep -c keyword replication-sample&#8217; &#8211; to understand what share of your workload a certain feature has.</p>
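<p>As an example of such grouping, here is a small Python sketch that turns the sample file into estimated seconds of replication time per statement type and table. The regular expression is an assumption that covers only the common write statements, and the 0.1 second interval matches the sampler; everything else, including idle NULL samples, lands in an OTHER bucket.</p>

```python
import re
from collections import Counter

SAMPLE_INTERVAL = 0.1  # seconds between samples, as in the sampler loop

# Assumed pattern: leading verb followed by an optionally qualified table name.
TABLE_RE = re.compile(
    r"^\s*(INSERT\s+INTO|REPLACE\s+INTO|UPDATE|DELETE\s+FROM)\s+([\w.]+)",
    re.I,
)


def profile(lines):
    """Estimate seconds spent per (statement type, table) from sample lines."""
    counts = Counter()
    for line in lines:
        match = TABLE_RE.match(line.replace("`", ""))
        if match:
            verb = match.group(1).split()[0].upper()
            counts[(verb, match.group(2))] += 1
        else:
            counts[("OTHER", "")] += 1
    return {key: n * SAMPLE_INTERVAL for key, n in counts.items()}
```

<p>Feeding it <code>open("replication-sample")</code> and sorting the result by value gives a quick view of which tables eat the replication thread&#8217;s time.</p>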
<p>I already mentioned that there are different shapes of slave performance, and it is easy to test in each of them. One method is actually stopping a slave for a day, then running the sampler while it is trying to catch up. It will probably have much more buffer pool space usable for write operations, so keep that in mind &#8211; certain operations that depend on larger buffer pools will be much faster.</p>
<p>This is a really simple, although remarkably powerful, method that allows quite deep workload analysis without spending too much time on statistics features. As there&#8217;s no EXPLAIN for UPDATE or DELETE statements, longer, coarser samples allow detecting deviations from good query plans too.</p>
<p>Systematic use of it has revealed quite a few important issues that had to be fixed &#8211; which were not that obvious from the general statistics view. I like.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/05/10/write-workload-profiling/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>On MySQL replication, again&#8230;</title>
		<link>http://dom.as/2010/06/30/replication/</link>
		<comments>http://dom.as/2010/06/30/replication/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 20:59:13 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[must]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=745</guid>
		<description><![CDATA[There are a few things one is supposed to know about MySQL replication in production, as the manual doesn&#8217;t always discuss things openly. This is a small set of rules and advice I compiled (some apply to statement based replication only, but row &#8230; <a href="http://dom.as/2010/06/30/replication/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There are a few things one is supposed to know about MySQL replication in production, as the manual doesn&#8217;t always discuss things openly.</p>
<p>This is a small set of rules and advice I compiled (some apply to statement based replication only, but row based replication may benefit from one or two too):</p>
<ul>
<li><strong>Don&#8217;t use MyISAM</strong>. Or in fact, don&#8217;t use any non-transactional engine, if you care about your data. On either side, master or slave, or both &#8211; using non-transactional engines will cause data drift, as statements partially executed on the master would be fully executed on the slave, or would simply stop replication. Of course, every crash has the capacity to get your tables desynced from each other and there are absolutely no guarantees.<br />
This &#8220;don&#8217;t&#8221; can be easily transformed into a &#8220;do&#8221; &#8211; just use InnoDB. Bonus point &#8211; one doesn&#8217;t need to take down the server to clone a slave from a master :)</li>
<li><strong>Don&#8217;t use temporary tables</strong>. The MySQL manual is very funny about temporary tables in replication, it says<em> &#8220;do not shut down the slave while it has temporary tables open.&#8221;</em> That of course means that you&#8217;re not supposed to crash either &#8211; and the more slaves there are, the more of them will crash for various reasons (e.g. solar flares).<br />
The operational overhead temporary tables add is huge &#8211; even though it may not show up in the benchmark.</li>
<li><strong>Prefer simple, idempotent statements. </strong>If one can replay the same statements multiple times without causing database drift, it doesn&#8217;t matter much if the replication position is somewhat outdated. Updating rows by PK to fixed values and avoiding multiple-table updates/deletes allows much faster recovery after a crash.</li>
<li><strong>Set sync_binlog=1. </strong>This will introduce the biggest bottleneck for transactions, but losing 30s of data may be worse (as this will force a full slave resync in most cases). On really busy servers one can go for higher values (e.g. sync every 20 transactions), but 0 is asking for disaster.</li>
<li><strong>Avoid long running updates.</strong> Though all a long statement would cause on a master is a slightly longer locking window and some performance pressure, once it gets replicated to the slave, all other updates will have to wait for the giant one to finish, in many cases rendering the slave useless.<br />
If something big has to be replicated, either split it into smaller chunks or run it directly against slaves (with binary logging on the master disabled for it).<br />
Splitting into smaller chunks allows wait-for-slave logic to be implemented, thus avoiding any major impact on production environments.</li>
<li><strong>Don&#8217;t use replicate-do-db. </strong>Or replicate-ignore-db. They both rely on database context, and statements like &#8216;INSERT INTO database.table&#8217; will fail.<br />
If you need it, use replicate-wild-do-table=db.% &#8211; but even then, be careful with cross-database statements that involve tables from multiple databases &#8211; as they may be filtered out&#8230;</li>
<li><strong>Note the multiversioning.</strong> Some statements may become replication performance hogs because of long-running transactions (backups? reporting? ETL?) running on slaves &#8211; a statement may not need to rescan all the row versions on the master, but they&#8217;d still be there on a slave. Such statements may need to be rewritten to avoid scanning gaps with too many invisible rows, or long transactions have to be split.</li>
</ul>
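<p>The chunking with wait-for-slave logic from the list above can be sketched in Python roughly like this. The function names, the statement template and the thresholds are all assumptions; <code>execute</code> and <code>slave_lag</code> are callables the caller supplies, for example one that runs a statement on the master and one that reads Seconds_Behind_Master from SHOW SLAVE STATUS on the slowest slave.</p>

```python
import time


def chunk_ranges(first_id, last_id, chunk_size):
    """Yield inclusive (low, high) id ranges covering first_id..last_id."""
    low = first_id
    while last_id >= low:
        high = min(low + chunk_size - 1, last_id)
        yield low, high
        low = high + 1


def run_in_chunks(execute, slave_lag, first_id, last_id, template,
                  chunk_size=1000, max_lag=1.0, pause=0.5, sleep=time.sleep):
    """Run template once per id range, waiting while slaves are behind."""
    for low, high in chunk_ranges(first_id, last_id, chunk_size):
        # Wait-for-slave: do not pile more work onto a lagging slave.
        while slave_lag() > max_lag:
            sleep(pause)
        execute(template.format(low, high))
```

<p>A call such as <code>run_in_chunks(run_on_master, worst_lag, 1, 2500, "DELETE FROM old_events WHERE id BETWEEN {} AND {}")</code> (with hypothetical helpers and table) would issue three small deletes instead of one giant one, pausing whenever replication falls behind.</p>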
<p>Though probably the best advice I can give now is &#8220;<strong>call your mysql vendor and ask for transactional replication</strong>&#8221;. Server, rack, datacenter crashes will not cause excessive work on fixing replication &#8211; it will always be consistent. One can even disable log syncing to disk then \o/</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2010/06/30/replication/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>on replication compatibility</title>
		<link>http://dom.as/2009/12/05/on-replication-compatibility/</link>
		<comments>http://dom.as/2009/12/05/on-replication-compatibility/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 23:45:44 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=652</guid>
		<description><![CDATA[Dear MySQL, I will do this to the rest of your code, if you continue breaking replication for me. &#8211; Domas]]></description>
			<content:encoded><![CDATA[<p>Dear MySQL,</p>
<p>I will do <a href="http://bazaar.launchpad.net/%7Ewikimedia/sakila-server/mysql-5.1-wm/revision/3191">this</a> to the rest of your code, if you continue <a href="http://bugs.mysql.com/bug.php?id=49474">breaking replication</a> for me.</p>
<p>&#8211; Domas</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2009/12/05/on-replication-compatibility/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Evil replication management</title>
		<link>http://dom.as/2009/07/30/evil-replication-management/</link>
		<comments>http://dom.as/2009/07/30/evil-replication-management/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 14:46:02 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[evil]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=567</guid>
		<description><![CDATA[When one wants to script automated replication chain building, certain things are quite annoying, like immutable replication configuration variables. For example, at certain moments log_slave_updates is badly needed, and that&#8217;s what the server says: mysql&#62; show variables like 'log_slave_updates'; &#8230; <a href="http://dom.as/2009/07/30/evil-replication-management/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When one wants to script automated replication chain building, certain things are quite annoying, like immutable replication configuration variables. For example, at certain moments log_slave_updates is badly needed, and that&#8217;s what the server says:</p>
<pre>
mysql&gt; show variables like 'log_slave_updates';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| log_slave_updates | OFF   |
+-------------------+-------+
1 row in set (0.00 sec)

mysql&gt; set global log_slave_updates=1;
ERROR 1238 (HY000): Variable 'log_slave_updates' is a read only variable
</pre>
<p>Of course, there are a few options: roll an in-house fork (heheeeee!), restart your server and keep warming up your tens of gigabytes of cache arenas, or wait for MySQL to ship a feature change in the next major release. Then there are evil tactics:</p>
<pre>
mysql&gt; system gdb -p $(pidof mysqld)
                       -ex "set opt_log_slave_updates=1" -batch
mysql&gt; show variables like 'log_slave_updates';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| log_slave_updates | ON    |
+-------------------+-------+
1 row in set (0.00 sec)
</pre>
<p>I don&#8217;t guarantee the safety of this when the slave is running, but&#8230; stopping and starting slave threads is somewhat cheaper than stopping and starting a big database instance, right?</p>
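<p>For scripting this kind of evil, a tiny Python helper that only assembles the gdb command line may be handy. The helper name is made up; the gdb options are the same <code>-p</code>, <code>-ex</code> and <code>-batch</code> used above, and actually running the result against a live mysqld is left to the caller, with all the safety caveats already mentioned.</p>

```python
def gdb_batch_argv(pid, commands):
    """Build the argv for attaching gdb to pid and running commands in batch mode."""
    argv = ["gdb", "-p", str(pid)]
    for command in commands:
        # Each gdb command gets its own -ex option, in order.
        argv += ["-ex", command]
    argv.append("-batch")
    return argv
```

<p>The list can be passed to <code>subprocess.check_call</code>, e.g. with <code>["set opt_log_slave_updates=1"]</code> as the commands and the pid taken from <code>pidof mysqld</code>.</p>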
<p>What else can we do?</p>
<pre>
mysql&gt; show slave status \G
...
     Replicate_Do_DB: test
...
mysql&gt; system gdb -p $(pidof mysqld)
          -ex 'call rpl_filter-&gt;add_do_db(strdup("hehehe"))' -batch
mysql&gt; show slave status \G
...
      Replicate_Do_DB: test,hehehe
...
</pre>
<p>It is actually possible to add all sorts of filters this way, rpl_filter.h can be a good reference :) So now when you want to throw out some data from your slaves, a restart isn&#8217;t needed. Unfortunately, deleting entries isn&#8217;t possible via rpl_filter methods, but you can always edit base_ilists, can&#8217;t you?</p>
<p>P.S. having this functionality inside the server would definitely be best.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2009/07/30/evil-replication-management/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>5.0 journal: various issues, replication prefetching, our branch</title>
		<link>http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/</link>
		<comments>http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/#comments</comments>
		<pubDate>Sat, 07 Jun 2008 10:07:59 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[wikipedia]]></category>
		<category><![CDATA[wikitech]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[launchpad]]></category>
		<category><![CDATA[maatkit]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=149</guid>
		<description><![CDATA[First of all, I have to apologize for some of my previous remarks on 5.0 performance. I passed &#8216;-g&#8217; in CFLAGS to my build, and that replaced the default &#8216;-O2&#8217;. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower. &#8230; <a href="http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>First of all, I have to apologize for some of my previous remarks on 5.0 performance. I passed &#8216;-g&#8217; in CFLAGS to my build, and that replaced the default &#8216;-O2&#8217;. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.</p>
<p>A few migration notes &#8211; once I loaded the schema with the character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, which I expected, but more annoying was CHAR being converted to BINARY &#8211; which pads data with 0x00 bytes. The solution was converting everything into VARBINARY &#8211; as it actually doesn&#8217;t have much overhead. <code>TRIM('\0' FROM field)</code> eventually helped too.</p>
<p>The other problem I hit was a paramy operational issue. One table definition failed, so paramy exited immediately &#8211; though it had a few more queries remaining in the queue &#8211; so the most recent data for some table was not inserted. The cheap workaround was adding the -f option, which just ignores errors. I had to reload all the data though&#8230;</p>
<p>I had real fun experimenting with auto-inc locking. As it was a major problem for the initial paramy tests, I hacked InnoDB not to acquire the auto-inc table-level lock (that was just commenting out a few lines in ha_innodb.cc). After that change CPU use went to &gt;300% instead of ~100% &#8211; so I nearly felt like I&#8217;d done a good thing. Interestingly though, the profile showed that quite a lot of CPU time was spent in synchronization &#8211; mutexes and such &#8211; so I hit SMP contention at just 4 cores. Still, the import was faster (or at least it was perceived that way), and I already have in mind a few cheap tricks to make it faster (like disabling mempool). The easiest way to make it manageable is to simply provide a global variable for auto-inc behavior, though elegant solutions would attach to &#8216;ALTER TABLE &#8230; ENABLE KEYS&#8217; or something similar.</p>
<p>Once loaded, catching up on replication was another task worth a few experiments. As the data image was already quite a few days old, I had at least a few hours to try to speed up replication. Apparently, Jay Janssen&#8217;s prefetcher has disappeared from the internets, so the only one left was <a href="http://maatkit.org">maatkit&#8217;s</a> mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them on just a single thread, so the prefetcher was just a few seconds ahead of the SQL thread &#8211; and the speedup was less than 50%. I made a <a href="http://p.defau.lt/?uk7P3mE022R3FvZ9pVq8qg">quick hack</a> that parallelized the task, and it managed to double replication speed.</p>
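<p>The rewrite-and-parallelize idea can be sketched roughly like this in Python. It is neither mk-slave-prefetch code nor the quick hack linked above: the regular expression only handles the plain single-table <code>UPDATE ... SET ... WHERE ...</code> form, and <code>execute</code> is a hypothetical callable that would run a statement on the slave.</p>
<pre>
import re
from concurrent.futures import ThreadPoolExecutor

# Naive pattern: UPDATE table SET assignments WHERE condition
UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(\S+)\s+SET\s+.*?\s+WHERE\s+(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def update_to_select(statement):
    """Turn a simple UPDATE into a SELECT touching the same rows, else None."""
    match = UPDATE_RE.match(statement)
    if match is None:
        return None
    table, where = match.group(1), match.group(2)
    return "SELECT * FROM {} WHERE {}".format(table, where.rstrip("; \n"))


def prefetch(statements, execute, workers=4):
    """Rewrite what can be rewritten and run the SELECTs on several threads."""
    selects = [s for s in map(update_to_select, statements) if s is not None]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(execute, selects))  # drain the iterator, surface errors
    return selects
</pre>
<p>For example, <code>update_to_select("UPDATE page SET page_touched=1 WHERE page_id=5")</code> gives <code>SELECT * FROM page WHERE page_id=5</code>, while statements that do not match the pattern are simply skipped.</p>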
<p>Still, there are a few problems with the concept &#8211; it preheats just one index, the one used for lookup, and doesn&#8217;t work on secondary indexes. Actually analyzing the query, identifying what changes and where, and sending a SELECT with UNIONs that preheats every index affected by the write query could be more efficient. Additionally, it would make adaptive hash or insert buffers unnecessary &#8211; as all required buffer pool pages would already be in memory &#8211; thus leading to fewer spots of mutex contention.</p>
<p>We also managed to hit a few optimizer bugs, related to casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor decisions in that case (like filesorting instead of using an existing ref lookup index, etc). I will have to review why this happens and whether it makes sense, and if not &#8211; file a bug. For now, we have some workarounds, and don&#8217;t seem to be bitten too much by the behavior.</p>
<p>Anyway, in the end I directed half of <a href="http://en.wikipedia.org">this site&#8217;s</a> core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing is that though 5.0 eats ~30% more CPU, it shows up in profiling as a faster-responding box. I guess we&#8217;re just doing something wrong.</p>
<p>I&#8217;ve published our MySQL branch at <a href="https://code.launchpad.net/~wikimedia/mysql/mysql-5.0">launchpad</a>. Do note, the release process is somewhat ad-hoc (or non-existent), and the engineer doing it is a clueless newbie. :)</p>
<p>I had plans to do some more scalability tests today, but apparently the server available is just a two-core machine, so there&#8217;s not much I can do on it. I guess another option is grabbing some 8-core application server and playing with it. :)</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Trainwreck: external MySQL replication agent</title>
		<link>http://dom.as/2008/05/14/trainwreck-external-mysql-replication-agent/</link>
		<comments>http://dom.as/2008/05/14/trainwreck-external-mysql-replication-agent/#comments</comments>
		<pubDate>Wed, 14 May 2008 12:20:00 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[trainwreck]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=102</guid>
		<description><![CDATA[I wanted to work more on the actual project before writing about it, but I&#8217;m lazy, and the dear community may not be. At Wikimedia we have one database server which replicates from multiple (like 15!) masters. It even splits replication &#8230; <a href="http://dom.as/2008/05/14/trainwreck-external-mysql-replication-agent/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I wanted to work more on the actual project before writing about it, but I&#8217;m lazy, and the dear community may not be.</p>
<p>At Wikimedia we have one database server which replicates from multiple (like 15!) masters. It even splits replication streams by database and applies changes in parallel.</p>
<p>All this stuff is done by an external replication agent, Trainwreck. It is public-domain software written by River; it doesn&#8217;t have much documentation and works only on Solaris (River likes Solaris), unless you comment out all the process management blocks, which use doors and other Solaris-specific APIs.</p>
<p>It lives in <a href='http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/trainwreck/'>Wikimedia SVN</a>, and can be checked out using:</p>
<pre>svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/trainwreck/</pre>
<p>It sits there, maintained just for the needs of that specific single server (ok, there might be two or three), so if anyone wants to make it available to a broader audience, feel free to fork the project to some community-oriented place and add all the nice features you need. :)</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2008/05/14/trainwreck-external-mysql-replication-agent/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Replication will live!</title>
		<link>http://dom.as/2008/04/12/replication-will-live/</link>
		<comments>http://dom.as/2008/04/12/replication-will-live/#comments</comments>
		<pubDate>Sat, 12 Apr 2008 06:30:46 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://dammit.lt/?p=101</guid>
		<description><![CDATA[Brian exposed some of his internal letters about the death of replication (caused by memcached). Back when he wrote this, I responded a bit too. Now, as quite a few people really want to bury replication, let me point out &#8230; <a href="http://dom.as/2008/04/12/replication-will-live/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Brian <a href='http://krow.livejournal.com/590912.html'>exposed</a> some of his internal letters about the death of replication (caused by memcached). Back when he wrote this, I responded a bit too. Now, as quite a few people really want to bury replication, let me point out some of the reasons why it will live.<br />
<span id="more-101"></span><br />
First of all, both MySQL and memcached are slow (or, depending on how you look at it, both fast) &#8211; in a proper gigabit environment both respond in a millisecond or so (well, MySQL is closer to 1.5ms). The major task becomes getting as much work as possible done in that round trip.</p>
<p>Replication lag? The major problem with it was fixed by Google patches back in 4.0, finally hitting stock MySQL in 5.1. Now the replication thread doesn&#8217;t get queued due to concurrency, and always enters execution. Use binary log position serialization for reading users, and they will never notice replication lag.</p>
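<p>One way to read &#8220;binary log position serialization&#8221; is: remember the master position after a user&#8217;s write, and only send that user&#8217;s reads to a slave that has already executed past it. A minimal Python sketch under that assumption follows; the function names and the fallback to the master are illustrative. The positions would come from the File and Position columns of SHOW MASTER STATUS and from Relay_Master_Log_File and Exec_Master_Log_Pos in SHOW SLAVE STATUS.</p>
<pre>
from operator import ge


def parse_position(log_file, log_pos):
    """Turn ('mysql-bin.000042', 1337) into a sortable (42, 1337) tuple."""
    sequence = int(log_file.rsplit(".", 1)[1])
    return (sequence, int(log_pos))


def pick_server(user_position, slave_positions, fallback="master"):
    """Return a slave that has executed the user's last write, else the fallback."""
    for name, position in sorted(slave_positions.items()):
        # Tuples compare file sequence first, then offset within the file.
        if ge(position, user_position):
            return name
    return fallback
</pre>
<p>The server-side alternative is to have the slave connection wait with <code>MASTER_POS_WAIT()</code> before reading, which trades a blocked connection for simpler application code.</p>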
<p>More servers? Also more performance. Putting everything into memcached? Lots of stuff still has to be written to the database. Once it is in the database anyway, one can query it from the database too. In low hit rate situations using memcached will be up to 3x slower than just fetching data from the database (get/get/set vs get: three round trips instead of one). I&#8217;ve seen lots of code that was enthusiastic about using memcached, but whose authors didn&#8217;t actually try to profile what the hit ratios were.</p>
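<p>The get/get/set arithmetic can be written down as a tiny model. This is a back-of-the-envelope Python sketch that counts only round trips and assumes the database query is a cheap lookup; the function names and default costs are made up for illustration.</p>
<pre>
def cache_cost(hit_ratio, cache_trip=1.0, db_trip=1.0):
    """Expected cost of a read through the cache.

    Every read pays one cache get; a miss additionally pays a database get
    and a cache set.
    """
    miss_ratio = 1.0 - hit_ratio
    return cache_trip + miss_ratio * (db_trip + cache_trip)


def break_even_hit_ratio(cache_trip, db_trip):
    """Hit ratio at which going through the cache costs the same as the database."""
    return 2.0 * cache_trip / (cache_trip + db_trip)
</pre>
<p>With equal round trip costs, <code>cache_cost(0.0)</code> is 3 trips and <code>cache_cost(1.0)</code> is 1 trip, so the cache never wins on round trips alone. With the rough figures above (1ms for memcached, 1.5ms for MySQL), <code>break_even_hit_ratio(1.0, 1.5)</code> comes out at 0.8, so the hit ratio has to exceed 80% before the cache helps at all for cheap queries.</p>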
<p>The major problem with memcached is that it is a hash table. All it supports in data retrieval is asking for a key and getting a value, which works great in situations where one just has a key and wants a value. Now if 50 keys are needed, memcached will need 50 lookups, quite often routed to 50 different servers. That&#8217;s a single database B-Tree read. How does one fetch all keys from 1 to 10000 with memcached? That&#8217;s right &#8211; ask for all of them. Of course, it is easy to resolve some of the inefficiency by having separate memcached clusters for different tasks or appending information to multiple tracking objects, but that&#8217;s where the ease of distribution starts fading, and development and administration needs surface.</p>
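<p>To get a feeling for how widely a batch of keys spreads, here is a small Python sketch. It assumes plain modulo hashing over an MD5 digest; real memcached clients use their own (often consistent) hashing, so the exact numbers will differ, and the key naming is invented for the example.</p>
<pre>
import hashlib


def server_for(key, server_count):
    """Map a key to a server index by hashing (illustrative scheme only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % server_count


def servers_touched(keys, server_count):
    """Count how many distinct servers a batch of keys lands on."""
    return len({server_for(key, server_count) for key in keys})
</pre>
<p>Calling <code>servers_touched(["user:%d" % i for i in range(50)], 80)</code> shows how many of 80 nodes a 50-key fetch has to talk to, whereas a range of 50 adjacent keys in a B-Tree usually sits in a handful of pages on one server.</p>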
<p>memcached APIs are now starting to support replication too &#8211; but flapping hosts can get the environment out of sync quite fast (host disappears, failover host starts getting traffic, host comes back with stale data&#8230;). The solution &#8211; object generation management, reading from multiple hosts, etc. &#8211; here again, solving a simple problem already needs quite some complexity.</p>
<p>Add the ACID properties of databases, which quite often make the whole development much easier &#8211; and which end up quite difficult to achieve in a completely distributed &#8216;get/set&#8217; environment.</p>
<p>And by the way &#8211; memcached can be outgunned. Hot objects can be cached directly in local application server stores, like the APC object cache, the file system, etc. New application servers nowadays have lots of memory.. :) Need global state? Just broadcast it to all.</p>
<p>There&#8217;s much more that replicated databases can provide &#8211; more complex views, all indexed and snappy; a single line change doesn&#8217;t need invalidation of hundreds or thousands of objects around; and it all comes down to interactivity and serving users&#8217; needs better. A single line change is immediately visible to all the users around.</p>
<p>Brian suggests using job queue systems and pushing <i>everything</i> to memcached &#8211; which makes it a dump of stuff instead of a cache. Putting in more information than might be needed ends up with unnecessary evictions, which decrease the efficiency of the system too. Building those objects needs reading from the database (or another persistent store), and eventually they end up in the database too. Surprise &#8211; they can be served from the database as well! :)</p>
<p>Anyway, memcaching ideas are moving forward, and so is database replication. There is still lots of room for replication to evolve &#8211; making it more async, parallel, relaxed. The whole MySQL protocol might be better &#8211; now it is all synchronous and boring.</p>
<p>Though replication has a storage overhead &#8211; more copies are usually saved &#8211; it also allows utilizing those copies in different ways, with different indexing schemes, while still maintaining the same image of data on different nodes. Even better, such application-specific &#8216;roles&#8217; of slaves can migrate from one node to another, heating up different segments of data.</p>
<p>The role of database replication will still remain core for scaling out reads for various workflows. A database allows incremental changes to the infinite datasets required by various applications. Replication just multiplies system capacity for presenting those datasets. That&#8217;s good. If it was up to me, I&#8217;d let it live.</p>
<p><i>P.S. Our current memcached cluster has 80 nodes, each providing 2GB of storage. When used properly, memcached is a great tool too. :)</i></p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2008/04/12/replication-will-live/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
	</channel>
</rss>
