<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>domas mituzas &#187; innodb</title>
	<atom:link href="http://dom.as/tag/innodb/feed/" rel="self" type="application/rss+xml" />
	<link>http://dom.as</link>
	<description></description>
	<lastBuildDate>Thu, 02 Feb 2012 21:29:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='dom.as' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/6e344c6e0cd7462eb056f8b98eb2cbcd?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>domas mituzas &#187; innodb</title>
		<link>http://dom.as</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://dom.as/osd.xml" title="domas mituzas" />
	<atom:link rel='hub' href='http://dom.as/?pushpress=hub'/>
		<item>
		<title>Blowing up in memory</title>
		<link>http://dom.as/2011/09/25/blowing-up-in-memory/</link>
		<comments>http://dom.as/2011/09/25/blowing-up-in-memory/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 16:19:38 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[partitions]]></category>

		<guid isPermaLink="false">http://dom.as/?p=1509</guid>
		<description><![CDATA[MySQL isn&#8217;t too concerned about table handler memory usage &#8211; it will allocate row size buffer thrice per each table invocation. There&#8217;s a few year old bug discussing UNION memory usage &#8211; for each mention in an union one can allocate &#8230; <a href="http://dom.as/2011/09/25/blowing-up-in-memory/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=1509&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>MySQL isn&#8217;t too concerned about table handler memory usage &#8211; it allocates a row-sized buffer three times for each table invocation. There&#8217;s a <a href="http://bugs.mysql.com/bug.php?id=44626">bug that is a few years old</a> discussing UNION memory usage &#8211; each table mention in a UNION can allocate nearly 200k of unaccounted memory &#8211; so a megabyte-sized query can already consume 7GB of RAM.</p>
<p>Partitioning adds even more pain here &#8211; those three buffers are allocated for each partition, so opening a table with 1000 partitions looks like this in a memory profile:</p>
<p><a href="http://domasmituzas.files.wordpress.com/2011/09/partitions-memory-usage1.png"><img class="alignnone size-medium wp-image-1507" title="Partitions memory usage" src="http://domasmituzas.files.wordpress.com/2011/09/partitions-memory-usage1.png?w=300&#038;h=239" alt="" width="300" height="239" /></a></p>
<p>Click to enlarge, and you will see 191MB spent to execute a simple query fetching a single row from a table (I filed <a href="http://bugs.mysql.com/bug.php?id=62536">a bug</a> on this).</p>
<p>There are multiple real-life situations where this is painful (e.g. any kind of server stall may lead to multiple concurrent threads reading from the same table, consuming additional gigabytes or tens of gigabytes of memory). It gets even more painful when combined with the UNION bug &#8211; a megabyte-sized query on an empty table can now consume 7TB of memory, and I doubt anyone has that much on their MySQL servers :-)</p>
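<p>To see where those numbers come from, here is a rough back-of-the-envelope sketch. The 28 bytes of query text per UNION branch is an assumption picked only to show the order of magnitude, and the function names are made up for illustration; the per-table overhead is derived from the 191MB over 1000 partitions seen in the profile.</p>

```python
MB = 1024 * 1024
GB = 1024 * MB

def per_table_overhead(total_bytes, partitions):
    """Bytes of handler buffers per partition, from one profiled table open."""
    return total_bytes / partitions

def union_memory(query_bytes, bytes_per_branch, overhead_bytes, partitions=1):
    """Estimated buffer memory for a UNION query of the given text size."""
    branches = query_bytes // bytes_per_branch  # table mentions in the query
    return branches * overhead_bytes * partitions

# 191MB for a 1000-partition table is roughly 200k per partition.
overhead = per_table_overhead(191 * MB, 1000)

# Hypothetical: each UNION branch takes about 28 bytes of query text.
plain = union_memory(1 * MB, 28, overhead)
partitioned = union_memory(1 * MB, 28, overhead, partitions=1000)

print("unpartitioned: %.1f GB" % (plain / GB))
print("1000 partitions: %.1f GB" % (partitioned / GB))
```

<p>With these assumptions the unpartitioned case lands in the gigabytes and the partitioned one in the terabytes; a different branch size shifts the totals proportionally.</p>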
<p>P.S. Also, <a href="http://bugs.mysql.com/bug.php?id=62535">check out</a> how much memory can be wasted for malloc overhead, once discussed <a title="Wasting InnoDB memory" href="http://dom.as/2008/05/29/wasting-innodb-memory/">here</a>.<br />
P.P.S. And <a href="http://bugs.mysql.com/bug.php?id=62534">here</a> you can see why innodb_max_dirty_pages_pct=0 doesn&#8217;t do what you&#8217;d expect.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/09/25/blowing-up-in-memory/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>

		<media:content url="http://domasmituzas.files.wordpress.com/2011/09/partitions-memory-usage1.png?w=300" medium="image">
			<media:title type="html">Partitions memory usage</media:title>
		</media:content>
	</item>
		<item>
		<title>InnoDB subsystems in color</title>
		<link>http://dom.as/2011/07/10/innodb-subsystems-in-color/</link>
		<comments>http://dom.as/2011/07/10/innodb-subsystems-in-color/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 19:28:28 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[internals]]></category>

		<guid isPermaLink="false">http://dom.as/?p=886</guid>
		<description><![CDATA[I tried to put every subdirectory of InnoDB codebase into a chart that would explain some of relations between subsystems and modules inside the source. This is what I got (click to enlarge): Update: Check Vadim&#8217;s diagram for a more &#8230; <a href="http://dom.as/2011/07/10/innodb-subsystems-in-color/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=886&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I tried to put every subdirectory of the InnoDB codebase into a chart that would explain some of the relations between subsystems and modules inside the source. This is what I got (click to enlarge):</p>
<p><a href="http://domasmituzas.files.wordpress.com/2011/07/innodb-internals.png"><img class="alignnone size-full wp-image-1174" title="InnoDB internals diagram" src="http://domasmituzas.files.wordpress.com/2011/07/innodb-internals.png?w=640&#038;h=619" alt="" width="640" height="619" /></a></p>
<p><strong>Update:</strong> Check <a href="http://www.mysqlperformanceblog.com/2010/04/26/xtradb-innodb-internals-in-drawing/">Vadim&#8217;s diagram</a> for a more operational view of InnoDB<br />
<strong>Another update:</strong> There&#8217;s a <a href="http://domasmituzas.files.wordpress.com/2011/09/innodb-internals.pdf">vector PDF version</a></p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/07/10/innodb-subsystems-in-color/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>

		<media:content url="http://domasmituzas.files.wordpress.com/2011/07/innodb-internals.png" medium="image">
			<media:title type="html">InnoDB internals diagram</media:title>
		</media:content>
	</item>
		<item>
		<title>InnoDB locking makes me sad</title>
		<link>http://dom.as/2011/07/03/innodb-index-lock/</link>
		<comments>http://dom.as/2011/07/03/innodb-index-lock/#comments</comments>
		<pubDate>Sun, 03 Jul 2011 11:05:00 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[mutex]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://dom.as/?p=873</guid>
		<description><![CDATA[Vadim and others have pointed at the index-&#62;lock problems before, but I think they didn&#8217;t do a good enough job of pointing out how bad it can get (the actual problem was hidden somewhere as some odd edge case). What &#8216;index lock&#8217; &#8230; <a href="http://dom.as/2011/07/03/innodb-index-lock/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=873&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Vadim and others have <a href='http://www.mysqlperformanceblog.com/2010/02/25/index-lock-and-adaptive-search-next-two-biggest-innodb-problems/'>pointed</a> at the index-&gt;lock problems before, but I think they didn&#8217;t do a good enough job of pointing out how bad it can get (the actual problem was hidden somewhere as some odd edge case). What &#8216;index lock&#8217; generally means is that InnoDB has table-level locking, which will miserably kill performance on big tables.</p>
<p>InnoDB is a huge pie of layers that have various locking behaviors, are stacked on top of each other, and are structured nicely as subdirectories in your innodb_plugin directory. Low-level storage interfaces are done via os/ routines; on top of that there&#8217;s a file space manager, fsp/, which allocates space for btr/ to live in, where individual page/ entities live, with multiple row/ pieces. There are a few other subsystems around that got quite some attention lately &#8211; e.g. the buf/ pool and the transaction log/ &#8211; and large trx/ transactions are composed of mini-transactions living in mtr/.</p>
<p>If you live in memory, you care about buffer pool and transaction log performance; if you write insane amounts of data to in-memory buffers, you hit mtr/ problems and depend on how fast you can write out log/ or flush out buf/. If you are in I/O-heavy land, most of the stuff you care about happens in btr/.</p>
<p>Generally InnoDB is quite good at read scalability in I/O-bound environments &#8211; nowadays one can saturate really fast I/O devices and there will be plenty of parallel reads done. The major scalability problem in this field was read-ahead, which funneled all read-ahead activity into a small set of threads, but other than that there can be hundreds of parallel reads issued to underlying devices. The situation changes when writes are added to the mix, though again, there are a few different scenarios.</p>
<p>There are two ways for InnoDB to write out updates to pages, &#8220;optimistic&#8221; and &#8220;pessimistic&#8221;. Optimism here means that only an in-page (page/row) operation will be needed, without changing the tree structure. In this case you can expect quite high parallelism &#8211; multiple pages can be read for the operation at a time, multiple of them can be edited at a time, then some serialization will happen while writing out changes to the redo log and undo segments. Expect good performance.</p>
<p>The much worse case is when the B-Tree has to be reorganized and multiple page operations can happen; that&#8217;s pessimism. In this case the whole index gets locked (via a read-write lock obtained from dict/),<br />
then the B-Tree path is latched, then changes are done, then it is all unlocked until the next row operation needs to hit the tree. Unfortunately, both &#8216;path is latched&#8217; and &#8216;changes are done&#8217; are expensive operations, and not only in-core: they do synchronous page read-ins, one at a time, which on busy systems serving lots of read load are bound to be slow. Ironically, as no other operations can happen on the table at that time, you may find out you have spare I/O capacity.. ;-)</p>
<p>What gets quite interesting, though, is the actual operation needed to latch the B-Tree path. The usual wisdom would say that if you want to change a row (read-modify-write), you probably looked up the page already, so there won&#8217;t be I/O. Unfortunately, InnoDB uses a slightly more complicated B-Tree variant, where pages have links to their neighbors, and tree latching does this (a bit simplified for reading clarity):</p>
<p><code><br />
/* x-latch also brothers from left to right */<br />
get_block = btr_block_get(space, zip_size, left_page_no, RW_X_LATCH, mtr);<br />
get_block = btr_block_get(space, zip_size, page_no, RW_X_LATCH, mtr);<br />
get_block = btr_block_get(space, zip_size, right_page_no, RW_X_LATCH, mtr);<br />
</code></p>
<p>So, essentially, just because InnoDB is being pessimistic, it reads neighboring blocks to lock them, even if they may not be touched or accessed in any way &#8211; and bloats the buffer pool with triple reads while doing so. It doesn&#8217;t cost much if the whole tree fits in memory, but it is doing three I/Os here if we&#8217;re pessimistic about InnoDB being pessimistic (and I am). So, this isn&#8217;t just a locking problem &#8211; it is also a resource consumption problem at this stage.</p>
<p>Now, as the dictionary lock is held in write mode, not only do updates to this table stop, but reads do too &#8211; think of a MyISAM kind of stop. Of course, this &#8216;table locking&#8217; happens at an entirely different layer than in MyISAM. In MyISAM it is statement-length locking, whereas in InnoDB this lock is held just for a row operation on a single index, but if a statement is doing multiple row operations it can be acquired multiple times.</p>
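<p>A toy model of that read amplification, just to make the counting concrete. This is not InnoDB code: the class and function names are hypothetical, and the &#8220;buffer pool&#8221; is only a set of cached page numbers.</p>

```python
class ToyBufferPool:
    """Hypothetical buffer pool that only counts reads from storage."""

    def __init__(self, cached=()):
        self.cached = set(cached)
        self.disk_reads = 0

    def get_block(self, page_no):
        if page_no not in self.cached:
            self.disk_reads += 1  # synchronous read-in, one page at a time
            self.cached.add(page_no)
        return page_no


def latch_leaf_with_brothers(pool, left_page_no, page_no, right_page_no):
    """Fetch blocks in the quoted order: left brother, the page, right brother."""
    return [pool.get_block(p) for p in (left_page_no, page_no, right_page_no)]


# The row's own page was already looked up, but its neighbors were not.
warm = ToyBufferPool(cached={11})
latch_leaf_with_brothers(warm, 10, 11, 12)

# Nothing cached at all: the fully pessimistic case.
cold = ToyBufferPool()
latch_leaf_with_brothers(cold, 10, 11, 12)

print("reads with the page cached:", warm.disk_reads)
print("reads with a cold pool:", cold.disk_reads)
```

<p>Even when the page being changed is already in memory, the two brothers can each cost a read, and all three pages end up occupying buffer pool space.</p>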
<p>There are probably decent workarounds if anyone wants to tackle this &#8211; e.g. grabbing read locks on the tree while reading pages into the buffer pool, then escalating the lock to exclusive. A somewhat bigger architectural change would be to allow grabbing locks on neighbors (if they are needed) without bringing page data into memory &#8211; but that needs the InnoDB overlords to look at it. Talk to your closest MySQL vendor and ask for a fix!</p>
<p>How do regular workloads hit this? The larger your records are, the more likely you are to have tree changes, and the lower your performance will be. In my edge case I was inserting 7k-sized rows &#8211; once the dataset fell out of the buffer pool, the machine couldn&#8217;t insert more than 50 rows a second, even though it had multiple disks, many of them idle, and the capacity gods cried. It gets worse with out-of-page blobs &#8211; then every operation is pessimistic.</p>
<p>Of course, there are ways to work around this &#8211; usually by taking the hit of sharding/partitioning (this is where the common wisdom of &#8220;large tables need to be partitioned&#8221; mostly comes from). Then, as with MyISAM, one will have multiple table locks, and that may give some scalability.</p>
<p>TL;DR: The InnoDB index lock is a major architectural performance flaw, and that is why you hear that large tables are slower. There&#8217;s a big chance that there are more scalable engines for on-disk writes out there, and all the large InnoDB write/insert benchmarks were severely hit by this.</p>
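<p>A rough sanity check on that rate. The 7 ms random read latency and the three synchronous reads per insert are assumptions for illustration, not measurements from that machine; the point is only that reads issued one at a time under an exclusive lock put a ceiling on throughput that extra disks cannot raise.</p>

```python
def max_serialized_ops_per_second(sync_reads_per_op, read_latency_ms):
    """Upper bound on ops/s when every op waits for its reads one by one."""
    return 1000.0 / (sync_reads_per_op * read_latency_ms)


# Hypothetical numbers: 3 synchronous reads per pessimistic insert,
# each taking about 7 ms on rotating media.
rate = max_serialized_ops_per_second(sync_reads_per_op=3, read_latency_ms=7.0)
print("ceiling: about %d inserts per second" % rate)
```

<p>Note that the disk count does not appear anywhere in the formula: while the index lock is held exclusively, only one of those reads is in flight at a time.</p>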
<p><b>Update:</b> Filed bugs <a href='http://bugs.mysql.com/bug.php?id=61735'>#61735</a> and <a href='http://bugs.mysql.com/bug.php?id=61736'>#61736</a> with MySQL</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/07/03/innodb-index-lock/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>MySQL metrics for read workloads</title>
		<link>http://dom.as/2011/05/19/mysql-metrics-for-read-workloads/</link>
		<comments>http://dom.as/2011/05/19/mysql-metrics-for-read-workloads/#comments</comments>
		<pubDate>Thu, 19 May 2011 13:45:43 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[easy]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://dom.as/?p=861</guid>
		<description><![CDATA[There are multiple metrics that are really useful for read workload analysis, that should all be tracked and looked at in performance-critical environments. The most commonly used is of course Questions (or &#8216;Queries&#8217;, &#8216;COM_Select&#8217;) &#8211; this is probably primary finger-pointing &#8230; <a href="http://dom.as/2011/05/19/mysql-metrics-for-read-workloads/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=861&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>There are multiple metrics that are really useful for read workload analysis, that should all be tracked and looked at in performance-critical environments.</p>
<p>The most commonly used is of course <strong>Questions</strong> (or &#8216;Queries&#8217;, &#8216;Com_select&#8217;) &#8211; this is probably the primary finger-pointing metric that can be used in communication with different departments (&#8220;why did your qps go up by 30%?&#8221;). It doesn&#8217;t always reveal actual cost: it can be an increase in actual request rates, a new feature, a fat-fingers error somewhere in the code or an improperly handled cache failure.</p>
<p>Another important one to note is <strong>Connections</strong> &#8211; MySQL&#8217;s costly bottleneck. Though most users won&#8217;t be approaching the ~10k/s area &#8211; at that point connection pooling actually starts making sense &#8211; it is worth checking for other reasons, such as &#8220;maybe we connect when we shouldn&#8217;t&#8221;, needless reconnects, or a need to look more at thread cache performance or pooling options. There are some neighboring metrics like &#8216;Bytes_sent&#8217; &#8211; make sure you don&#8217;t hit 120MB/s on a gigabit network :-)</p>
<p>Other metrics are usually much more about what actually gets done. My major query efficiency signal for a long time used to be <b>Innodb_rows_read</b>. It immediately points out queries which don&#8217;t use indexes properly or read too much data. It gets a bit confusing if a logical backup is running, but backup windows aside, this metric is probably one that is easy enough to track and understand. It has been extremely helpful for detecting query plans gone wrong too &#8211; quite a few interesting edge cases could be resolved with FORCE INDEX (that&#8217;s a topic for another post already :-)</p>
<p>For I/O-heavy environments there are a few metrics that show mostly the same thing &#8211; <strong>Innodb_buffer_pool_reads</strong>, <strong>Innodb_data_reads</strong>, <strong>Innodb_pages_read</strong> &#8211; they all show how much your requests hit underlying storage &#8211; and bigger increases call for better data locality, more in-memory efficiency (smaller object sizes!) or simply more RAM/IO capacity.</p>
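<p>All of these are cumulative counters, so what you actually track is their rate of change. A minimal sketch, assuming two snapshots of SHOW GLOBAL STATUS have already been loaded into dicts; the function name and the sample values are made up for illustration.</p>

```python
def per_second_rates(before, after, interval_seconds, names):
    """Turn two snapshots of cumulative counters into per-second rates."""
    return {n: (after[n] - before[n]) / interval_seconds for n in names}


# Hypothetical snapshots taken 60 seconds apart.
before = {"Questions": 120000, "Innodb_rows_read": 1000000, "Innodb_buffer_pool_reads": 5000}
after = {"Questions": 180000, "Innodb_rows_read": 4000000, "Innodb_buffer_pool_reads": 5300}

rates = per_second_rates(before, after, 60, sorted(before))
for name in sorted(rates):
    print("%s: %.1f/s" % (name, rates[name]))
```

<p>Graphing these rates next to each other is usually enough to tell whether a jump in Questions also meant more rows read or more physical reads.</p>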
<p>For a long time lots of my metrics-oriented performance optimization could be summed up in this very simple ruleset:</p>
<ul>
<li>Number of rows shown to user in the UI has to be as close as possible to rows read from the index/table</li>
<li>Number of physical I/Os done to serve rows has to be as close to 0 as possible :-)</li>
</ul>
<p>Something I like to look at is the I/O queue size (both via iostat and from InnoDB&#8217;s point of view) &#8211; <strong>Innodb_data_pending_reads</strong> can tell how loaded your underlying storage is &#8211; on rotating media you can allow multiples of your disk count, on flash it can already mean something is odd. Do note, innodb_thread_concurrency can be a limiting factor here.</p>
<p>Overloads can also be detected from <strong>Threads_running</strong> &#8211; which is easy enough to track and is extremely important quality-of-service data.</p>
<p>An interesting metric that lately became more and more important for me is <strong>Innodb_buffer_pool_read_requests</strong>. Though it is often used to gauge buffer pool efficiency as a ratio against &#8216;buffer pool reads&#8217;, it is actually much more interesting when compared against &#8216;Innodb_rows_read&#8217;. While Innodb_rows_read and Handler* metrics essentially show what has been delivered by InnoDB to the upper SQL layer, there are certain expensive operations that are not accounted for, like <a href='http://dom.as/2011/01/27/a-case-for-force-index/'>index estimations</a>.</p>
<p>Though tracking this activity helps I/O quite a bit (right FORCE INDEX reduces the amount of data that has to be cached in memory), there can be also various edge cases that will heavily hit CPU itself. A rough example could be:</p>
<p>SELECT * FROM table WHERE parent_id=X and type IN (1,2,4,6,8,&#8230;,20) LIMIT 10;</p>
<p>If there was an index on (parent_id,type) this query would <i>look</i> efficient, but would actually do range estimations for each type in the query, even if they would not be fetched anymore. It gets worse if there&#8217;s a separate (type) index &#8211; each time the query is executed, records-in-range estimation would be done for each type in the IN() list &#8211; and usually discarded, as going after the id/type lookup is much more efficient.</p>
<p>By looking at Innodb_buffer_pool_read_requests we could identify optimizer inefficiency cases like this &#8211; and FORCE INDEX made certain queries 30x faster, even though we forced exactly the same indexes. Unfortunately, there is no per-session or per-query metric that would do the same &#8211; it could be extremely useful in sample-based profiling analysis.</p>
<p>Innodb_buffer_pool_read_requests:Innodb_rows_read ratio can vary due to multiple reasons &#8211; adaptive hash efficiency, deeper B-Trees because of wide keys (each tree node access will count in), etc &#8211; so there&#8217;s no constant baseline everyone should adjust to.</p>
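<p>A small sketch of tracking that ratio over an interval, again assuming counter snapshots in dicts. The function name and the sample values are hypothetical; as said above, there is no universal baseline, so only changes against your own history are meaningful.</p>

```python
def read_requests_per_row(before, after):
    """Buffer pool read requests per row delivered to the SQL layer."""
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    rows = after["Innodb_rows_read"] - before["Innodb_rows_read"]
    if rows == 0:
        return None  # nothing was read in this interval, ratio is undefined
    return requests / rows


# Hypothetical snapshots before and after a change in query plans.
before = {"Innodb_buffer_pool_read_requests": 1000000, "Innodb_rows_read": 200000}
after = {"Innodb_buffer_pool_read_requests": 1900000, "Innodb_rows_read": 230000}

ratio = read_requests_per_row(before, after)
print("read requests per row: %.1f" % ratio)
```

<p>A sudden rise in this number while Innodb_rows_read stays flat is the kind of signal that pointed at the range estimation cost described earlier.</p>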
<p>I deliberately left out query cache (<a href='http://dom.as/tech/query-cache-tuner/'>here&#8217;s the reason</a>), or adaptive hash (I don&#8217;t fully understand performance implications there :). In <a href='https://code.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1'>mysql@facebook</a> builds we have some additional extremely useful instrumentation &#8211; wall clock seconds per various server operation types &#8211; execution, I/O, parsing, optimization, etc.</p>
<p>Of course, some people may point out that I&#8217;m writing here from the stone age, and that nowadays performance schema should be used. Maybe there will be more accurate ways to dissect workload costs, but nowadays one can spend a few minutes looking at the metrics mentioned above and have a decent understanding of what the system is or should be doing.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2011/05/19/mysql-metrics-for-read-workloads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Logs memory pressure</title>
		<link>http://dom.as/2010/11/18/logs-memory-pressure/</link>
		<comments>http://dom.as/2010/11/18/logs-memory-pressure/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 14:59:33 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[directio]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[memory]]></category>

		<guid isPermaLink="false">http://dom.as/?p=818</guid>
		<description><![CDATA[Warning, this may be kernel version specific, albeit this kernel is used by many database systems Lately I&#8217;ve been working on getting more memory used by InnoDB buffer pool &#8211; besides obvious things like InnoDB memory tax there were seemingly &#8230; <a href="http://dom.as/2010/11/18/logs-memory-pressure/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=818&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><i>Warning, this may be kernel version specific, albeit this kernel is used by many database systems</i></p>
<p>Lately I&#8217;ve been working on getting more memory used by the InnoDB buffer pool &#8211; besides obvious things like the InnoDB <a href='http://dom.as/2008/05/29/wasting-innodb-memory/'>memory tax</a>, there were seemingly external factors that were pushing MySQL out into swap (even with swappiness=0). We worked a lot on low-hanging fruit like scripts that use too much memory, and those seem to be mostly gone, yet MySQL still gets way too much memory pressure from outside.</p>
<p>I grabbed my <a href='http://dom.as/2009/06/26/uncache/'>uncache</a> utility to assist with the investigation and started uncaching various bits on two systems, both 72G boxes: one that had a larger buffer pool (60G) and was already being sent to swap, and a conservatively allocated (55G) machine. Initial findings were somewhat surprising &#8211; apparently on both machines most of the external-to-mysqld memory was consumed by two sets of items:</p>
<ul>
<li><b>binary logs</b> &#8211; write once, read only tail (sometimes, if MySQL I/O cache cannot satisfy) &#8211; we saw nearly 10G consumed by binlogs on conservatively allocated machines</li>
<li><b>transaction logs</b> &#8211; write many, read never (by MySQL), buffered I/O &#8211; full set of transaction logs was found in memory</li>
</ul>
<p>It was remarkably easy to get rid of binlogs from cache, both by calling out &#8216;uncache&#8217; from scripts, or using this tiny Python class:</p>
<pre>
import ctypes
import io

libc = ctypes.CDLL("libc.so.6")

class cachedfile(io.FileIO):
    FADV_DONTNEED = 4  # POSIX_FADV_DONTNEED on Linux
    def uncache(self):
        # offset 0, length 0: the whole file
        libc.posix_fadvise(self.fileno(), 0, 0, self.FADV_DONTNEED)
</pre>
<p>As binlogs were a major source of memory stress, removing them from cache was a no-brainer &#8211; something that can be serially re-read is taking space away from a buffer pool which avoids random reads. It may even make sense to call posix_fadvise() right after writes to them.</p>
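<p>On Python 3 the same hint is available without ctypes. A minimal sketch, assuming Linux; the helper name <code>append_and_uncache</code> is made up for illustration and is not part of uncache or MySQL. It writes, calls fsync so the pages are clean (dirty pages are not dropped), then asks the kernel to forget them:</p>

```python
import os


def append_and_uncache(path, data):
    """Append data to path, flush it to storage, then drop its cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        # Dirty pages cannot be dropped, so push them to storage first.
        os.fsync(fd)
        # offset=0, length=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    # Return the file size so callers can see how much has been written.
    return os.path.getsize(path)
```

<p>This is only a hint to the kernel, so whether the pages really leave the cache still has to be checked on the box itself.</p>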
<p>Transaction logs, on the other hand, are an entirely different beast. From the MySQL perspective they should be uncached immediately, as nobody ever reads them (crash recovery aside, but re-reading them then is relatively cheap, as no writes or random reads are done during the log read phase). Unfortunately, the problem lies way below MySQL, and thanks to PeterZ for reminding me (we had a small chat about this at Jeremy&#8217;s <a href='http://www.meetup.com/mysql-silicon-valley/'>Silicon Valley MySQL Meetup</a>).</p>
<p>MySQL transaction records are stored in multiple log groups per transaction, then written out as per-log-group writes (each a multiple of 512 bytes), followed by fsync(). This allows the FS to do a transaction log write as a single I/O operation. It also means partial page writes to buffered files &#8211; a write overwrites existing data in only part of a page, so if that page is not cached, it first has to be read from storage.</p>
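<p>To see which writes trigger that read-in, here is a small sketch of the arithmetic. It assumes a 4096-byte page cache page (an assumption, check your kernel) and that the write overwrites existing file data, as happens in circular log files; the function name is made up for illustration.</p>

```python
PAGE_SIZE = 4096   # assumed page cache page size
BLOCK_SIZE = 512   # log write granularity mentioned in the post


def partially_written_pages(offset, length, page_size=PAGE_SIZE):
    """Return page numbers that a write covers only in part.

    If such a page is not in the page cache, a buffered write has to read
    it from storage first, so that the untouched bytes are preserved.
    """
    if length == 0:
        return []
    end = offset + length
    first_page = offset // page_size
    last_page = (end - 1) // page_size
    partial = []
    if offset % page_size != 0:
        partial.append(first_page)
    if end % page_size != 0 and last_page not in partial:
        partial.append(last_page)
    return partial


print(partially_written_pages(0, BLOCK_SIZE))      # [0]
print(partially_written_pages(0, PAGE_SIZE))       # []
print(partially_written_pages(3584, 2 * BLOCK_SIZE))  # [0, 1]
```

<p>Only writes that start and end on page boundaries avoid the read; a 512-byte-aligned write usually does not qualify.</p>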
<p>So, if all transaction log pages are removed from cache, quite a few of them will have to be read back in (depending on sizes of transactions, probably all of them in some cases). Oddly enough, when I tried to hit the edge case, single-thread transactions-per-second remained the same, but I saw consistent read I/O traffic on disks. So, this would probably work on systems that have spare I/O (e.g. flash-based ones).</p>
<p>Of course, as writes are already in multiples of 512 (and it appears that memory got allocated just fine), I could try out direct I/O &#8211; it should avoid the page read-in problem and not cause any memory pressure by itself. In this case switching InnoDB to use O_DIRECT was a bit dirtier &#8211; one needs to edit source code and rebuild the server, restart, etc, or&#8230;<br />
<code><br />
# lsof ib_logfile*<br />
# gdb -p $(pidof mysqld)<br />
(gdb) call os_file_set_nocache(9, "test", "test")<br />
(gdb) call os_file_set_nocache(10, "test", "test")<br />
</code><br />
I did not remove the fsync() call; as it is somewhat of a no-op on O_DIRECT files, I left it there. Removing it would probably change benchmark results, but not by much.</p>
<p>Some observations:</p>
<ul>
<li>O_DIRECT was ~10% faster at best case scenario &#8211; lots of tiny transactions in single thread</li>
<li>If group commit is used (without binlogs), InnoDB can have way more transactions with multiple threads using buffered I/O, as it does multiple writes per fsync</li>
<li>Enabling sync_binlog makes the difference not that big &#8211; even with many parallel writes direct writes are 10-20% slower than buffered ones</li>
<li>Same for innodb_flush_log_at_trx_commit=0 &#8211; multiple writes per fsync are much more efficient with buffered I/O</li>
<li>One would need to do log group merge to have more efficient O_DIRECT for larger transactions</li>
<li>O_DIRECT does not have a theoretical disadvantage; current deficiencies exist just because the implementation is oriented at buffered I/O &#8211; and can be resolved by (in some areas &#8211; extensive) engineering</li>
<li>YMMV. In certain cases it definitely makes sense even right now, in some other &#8211; not so much</li>
</ul>
<p>So, the outcome here depends on many variables &#8211; with flash, read-on-write is not as expensive, especially if read-ahead works. With disks one has to see what is the better use for the memory &#8211; using it for the buffer pool reduces the amount of data reads, but causes log reads. And of course, O_DIRECT wins in the long run :-)</p>
<p>With this data moved away from cache and the InnoDB memory tax reduced, one could switch from using 75% of memory to 90% or even 95% for InnoDB buffer pools. Yay?</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2010/11/18/logs-memory-pressure/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>random poking</title>
		<link>http://dom.as/2010/11/08/random-poking/</link>
		<comments>http://dom.as/2010/11/08/random-poking/#comments</comments>
		<pubDate>Mon, 08 Nov 2010 04:55:26 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[sysbench]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=811</guid>
		<description><![CDATA[These are some of my notes from some sysbench in-memory r/o testing in past day or so: At &#8216;fetch data by primary key&#8217; benchmark with separate read snapshots at each statement, MySQL shines until ~200 concurrent threads, then performance starts &#8230; <a href="http://dom.as/2010/11/08/random-poking/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=811&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>These are some of my notes from some sysbench in-memory r/o testing in past day or so:</p>
<ul>
<li>At &#8216;fetch data by primary key&#8217; benchmark with separate read snapshots at each statement, MySQL <a href='https://spreadsheets.google.com/pub?key=0AtHDNfVx0WNhdG13QkgwUXZITTM2UGtjRHU1VkRVbkE&amp;hl=en&amp;gid=4'>shines</a> until ~200 concurrent threads, then performance starts dropping slightly faster than one would want, I think mostly from <a href='http://bugs.mysql.com/bug.php?id=58037'>table cache LOCK_open contention</a></li>
<li>auto-commit cost (establishing read snapshot per statement) for SELECTs is ~10% for MySQL, but for PG it can be +50% in plain SQL mode and +130% (!!!!!!!) when using prepared statements (this can be seen in a <a href='https://spreadsheets.google.com/pub?key=0AtHDNfVx0WNhdG13QkgwUXZITTM2UGtjRHU1VkRVbkE&amp;hl=en&amp;gid=5'>graph</a> &#8211; obviously the global lock PG has during this operation is held for too long and maybe is too costly to acquire.)</li>
<li>Some benchmarks went up by 10% when using jemalloc</li>
<li>MySQL could accept 10x more connections per second than PG (15000 vs 1500)</li>
<li>Most confusing behavior MySQL exhibited was at 100-record range scans in PK order:
<ul>
<li>At innodb_thread_concurrency=0 it did around 70k range reads, both fetching data and aggregation (SUM())</li>
<li>At innodb_thread_concurrency&gt;0 it did only 10k range reads returning data but still was able to do 70k aggregations/s</li>
<li>PG was doing ~35k ops/s at that test</li>
</ul>
<p>It seems that, at least for systems that do lots of range scans (or joins, I guess), managed concurrency kills performance entirely by giving up tickets too often. I need to review it more (Update: it seems that the offending stack is ha_release_temporary_latches being called way too early in select_send::send_data()).</p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2010/11/08/random-poking/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>on performance stalls</title>
		<link>http://dom.as/2010/09/22/on-performance-stalls/</link>
		<comments>http://dom.as/2010/09/22/on-performance-stalls/#comments</comments>
		<pubDate>Wed, 22 Sep 2010 13:22:43 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[pmp]]></category>
		<category><![CDATA[stalls]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=795</guid>
		<description><![CDATA[We quite often say, that benchmark performance is usually different from real world performance &#8211; so performance engineering usually has to cover both &#8211; benchmarks allow to understand sustained performance bottlenecks, and real world analysis usually concentrates on something what &#8230; <a href="http://dom.as/2010/09/22/on-performance-stalls/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=795&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We quite often say that benchmark performance is usually different from real world performance &#8211; so performance engineering usually has to cover both &#8211; benchmarks allow us to understand sustained performance bottlenecks, and real world analysis usually concentrates on something that would be considered &#8216;exceptional&#8217; and not important in benchmarks &#8211; stalls of various kinds. They are extremely important, as the state when our performance is lowest is the state of performance we provide to our platform users.</p>
<p>On a machine that is doing 5000qps, stalling for 100ms means that 500 queries were not served as fast as they could, or even hit application timeouts or exceptional MySQL conditions (like <a href='http://bugs.mysql.com/bug.php?id=26590'>1023</a> transaction limit). Of course, stalling for a second means 5000 queries were not served in time&#8230;</p>
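<p>The arithmetic behind those numbers, as a tiny sketch. It assumes queries arrive at a steady rate, which real traffic does not quite do; the function name is made up for illustration.</p>

```python
def queries_delayed(qps, stall_ms):
    """Queries that arrive during a stall, assuming a steady arrival rate."""
    return qps * stall_ms // 1000


print(queries_delayed(5000, 100))   # 500
print(queries_delayed(5000, 1000))  # 5000
```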
<p>We have multiple methods to approach this &#8211; one is our &#8216;dogpiled&#8217; framework &#8211; an agent doing status polling every second and reporting information about I/O state, MySQL/InnoDB statuses, processlists, etc &#8211; so we see the scope of stalls in our environment. We try to maintain the threshold between complete information overload and something that reveals problems &#8211; so it is always a balancing act, especially with the great work done by the engineering team :)</p>
<p>The other approach, usually prompted by dogpiled information, is auto-<a href='http://poormansprofiler.org'>PMP</a> &#8211; high-frequency status polling combined with gdb invocations that allow us to jump into the process whenever we notice something weird going on. We have some extensions to how we use PMP &#8211; but that&#8217;s worth another post.</p>
<p>The issues that harm us most in production environments are ones that are quite often dismissed as either &#8220;this never happens&#8221; or &#8220;get better hardware&#8221; or &#8220;your application is wrong&#8221;. Unfortunately, they do happen, we do have thousands of machines that aren&#8217;t free, and our application demands are our application demands :)</p>
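<p>The core of PMP is collapsing the output of <code>gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mysqld)</code> into one line per thread and counting identical stacks. A rough Python sketch of that aggregation step follows; it is not our auto-PMP tooling, the function name is made up, and the sample backtrace is invented for illustration.</p>

```python
import re
from collections import Counter

# "#0  0x00007f... in func (args) ..." or "#1  func (args) at file.c:123"
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?([^\s(]+)")


def collapse_stacks(gdb_output):
    """Count identical stacks in 'thread apply all bt' output."""
    stacks = Counter()
    frames = []
    for line in gdb_output.splitlines():
        if line.startswith("Thread "):
            # A new thread header closes the previous thread's stack.
            if frames:
                stacks[",".join(frames)] += 1
            frames = []
            continue
        match = FRAME.match(line)
        if match:
            frames.append(match.group(1))
    if frames:
        stacks[",".join(frames)] += 1
    return stacks.most_common()


# Invented sample: two threads waiting on a condition, one in select().
SAMPLE = """Thread 3 (Thread 0x7f01 (LWP 103)):
#0  0x00007f0a1c2d in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x00000000008a in os_event_wait_low (event=0x1) at os0sync.c:1
Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f0a1c2d in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x00000000008a in os_event_wait_low (event=0x1) at os0sync.c:1
Thread 1 (Thread 0x7f03 (LWP 101)):
#0  0x00007f0a1d11 in select () from /lib/libc.so.6
#1  main () at main.c:1
"""

for stack, count in collapse_stacks(SAMPLE):
    print(count, stack)
```

<p>During a stall most threads end up with the same stack, so the top line of this output usually names the culprit.</p>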
<p>A few examples:</p>
<ul>
<li><a href='http://bugs.mysql.com/bug.php?id=56696'>TRUNCATE stalls the server</a> (oh well, <a href='http://bugs.mysql.com/bug.php?id=41158'>DROP TABLE too</a>) &#8211; in this case, truncating a table grabs dictionary mutex, other transaction blocks while holding LOCK_open, everything else stops. Though truncating is supposed to be fast operation, it has to unlink (delete) a file, and with large files such operation isn&#8217;t really instant on any filesystem. Even if one deletes all the data before truncating, file is still on the filesystem.</li>
<li><a href='http://bugs.mysql.com/bug.php?id=56433'>Extending data files stalls the server</a> &#8211; when a data file is being extended, global mutex is held, which blocks all I/Os (with limited concurrency that is full server stall). Somewhat more impressive with file-per-table. This is the major reason for mini-stalls at the moment &#8211; on machines that grow at gigabytes-a-day rate this is being hit quite often.</li>
<li><a href='http://bugs.mysql.com/bug.php?id=56340'>Updating table statistics stalls the server</a> &#8211; we hit this with high-performance task tracking machines, row churn there is quite amazing, and dictionary statistics are reread more often than one would expect. Updating statistics means locking the table while doing random reads from disk. Once major workload is hitting that table, it quickly escalates to full server stall</li>
<li><a href='http://bugs.mysql.com/bug.php?id=55004'>Fuzzy checkpoint stalls the server</a> &#8211; this is one of biggest issues outstanding in stock MySQL &#8211; though one would expect that &#8220;fuzzy checkpoint&#8221; that uses async background threads is nonblocking, actually all writes during it will stall, taking all concurrency slots and leading to a server stall. Mark&#8217;s fix was just doing this work in background thread.</li>
<li>(no bug filed on this yet) &#8211; Purge stalls the server &#8211; purge holds dictionary lock while doing random reads from disk, with table stall leading to server stall.</li>
</ul>
<p>There are more issues (mostly related to heavier in-memory activities of the server), but these are the most obvious ones &#8211; where a single I/O request escalates to a table or instance lockup, during which no other work is done. Our machines have multiple disks, multiple CPUs and can support multiple SQL queries being executed at once, so any of these lockups effectively limits our available performance or damages the quality of service we can provide.</p>
<p>On the upside, my colleagues are absolutely amazing and I&#8217;m sure that we will have all these issues fixed in our deployment in the near future, and that everyone will be able to pick the fixes up via the <a href='https://code.launchpad.net/mysqlatfacebook'>mysqlatfacebook</a> branch.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2010/09/22/on-performance-stalls/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Read ahead&#8230;</title>
		<link>http://dom.as/2010/01/02/read-ahead/</link>
		<comments>http://dom.as/2010/01/02/read-ahead/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 13:00:43 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[gdb]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[io]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=720</guid>
		<description><![CDATA[Mark wrote about how to find situations where InnoDB read-ahead is a bottleneck. What he didn&#8217;t disclose, though, is his trick to disable read-ahead without restart or recompile of MySQL. See, there&#8217;s no internal &#8220;disable read ahead knob&#8221;. But there &#8230; <a href="http://dom.as/2010/01/02/read-ahead/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=720&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Mark <a href="http://www.facebook.com/notes/mysqlfacebook/when-doesnt-innodb-readahead-work/198536580932">wrote</a> about how to find situations where InnoDB read-ahead is a bottleneck. What he didn&#8217;t disclose, though, is his trick to disable read-ahead without restart or recompile of MySQL. See, there&#8217;s no internal &#8220;disable read ahead knob&#8221;. But there is&#8230;</p>
<pre>buf_read_ahead_random(...){ ...
       if (srv_startup_is_before_trx_rollback_phase) {
                /* No read-ahead to avoid thread deadlocks */
                return(0);
        }</pre>
<p>This variable is tested in two functions &#8211; buf_read_ahead_linear() and buf_read_ahead_random() &#8211; and <strong>nowhere else</strong>. So yeah, &#8220;server startup is before transaction rollback phase&#8221; is another way of saying &#8220;don&#8217;t do read ahead, please please&#8221;.</p>
<pre>gdb -ex "set  srv_startup_is_before_trx_rollback_phase=1" \
    --batch -p $(pidof mysqld)</pre>
<p>And many servers bottlenecked on this became much much much faster (and 2000 concurrent threads running dropped to 10). Of course, this is most visible in high-latency-high-throughput I/O situations, but we&#8217;re hitting this contention spot on local disk setups too.</p>
<p>Don&#8217;t forget to have <a href="http://dom.as/2009/12/29/when-bad-things-happen/">the fix</a> if gdb decides to be nasty and locks up your server :)</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2010/01/02/read-ahead/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>Opening tables!</title>
		<link>http://dom.as/2009/12/26/opening-tables/</link>
		<comments>http://dom.as/2009/12/26/opening-tables/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 00:30:46 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[mutex]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=702</guid>
		<description><![CDATA[There&#8217;s one bottleneck in MySQL/InnoDB that ultimately sucks. It sucked in 4.0, sucked in 5.0, sucks in 5.1 with newest InnoDB plugin. Opening tables has been a bottleneck on machines that have thousands of tables all the time (as LOCK_open &#8230; <a href="http://dom.as/2009/12/26/opening-tables/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=702&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s one bottleneck in MySQL/InnoDB that ultimately sucks. It sucked in 4.0, sucked in 5.0, sucks in 5.1 with newest InnoDB plugin. Opening tables has been a bottleneck on machines that have thousands of tables all the time (as LOCK_open is being held during the process), and while there was a table being opened, everything else would stall on the machine.</p>
<p>It can simply take hours on such systems just to open tables &#8211; and the major portion of time spent is randomly diving into InnoDB tables to populate index statistics. It obviously sounds like low hanging fruit &#8211; as statistics aren&#8217;t needed while you are opening a table, they&#8217;re needed just for querying the table.</p>
<p>So, I threw a few thousand tables onto my machine, and tried opening them with ten connections. Standard InnoDB code was opening 13.5 tables a second. After spending a few minutes <a href="http://p.defau.lt/?7z_wf6C9Xk0b0XFyD9AaeA">moving</a> statistics collection to after ha_innodb::open() (this is a pure prototype, not suitable for production), I noticed a performance increase.</p>
<p>Tables were opened at 105-a-second speed. A bit better, ~8x better.</p>
<p>Merry Christmas, MySQL!</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2009/12/26/opening-tables/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>
	</item>
		<item>
		<title>On deadlock detection</title>
		<link>http://dom.as/2009/12/21/on-deadlock-detection/</link>
		<comments>http://dom.as/2009/12/21/on-deadlock-detection/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 00:41:21 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[mysql]]></category>
		<category><![CDATA[deadlock]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[mutex]]></category>

		<guid isPermaLink="false">http://mituzas.lt/?p=698</guid>
		<description><![CDATA[InnoDB detects deadlocks. Deadlocks are those nasty situations, when transaction 1 tries to acquire locks A and B, whereas transaction 2 tries to acquire locks B and A at the same time. As both are stubborn, InnoDB will decide simply &#8230; <a href="http://dom.as/2009/12/21/on-deadlock-detection/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=dom.as&amp;blog=190075&amp;post=698&amp;subd=domasmituzas&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>InnoDB detects deadlocks. Deadlocks are those nasty situations when transaction 1 tries to acquire locks A and B, whereas transaction 2 tries to acquire locks B and A at the same time. As both are stubborn, InnoDB will simply decide to terminate one of them. If it didn&#8217;t do that, both transactions would have to wait until innodb_lock_wait_timeout expires. There is a big chance that the longer a transaction is, the more likely it is to cause deadlocks. Deadlock detection kind of helps, then, but&#8230; at certain costs.</p>
<p>The transaction 1 and 2 case is way too easy; try adding a few hundred transactions that contend over the same set of locks. To handle that, the InnoDB deadlock monitor will recursively brute-force the lock graph until it hits a 200-transaction-long chain (it will then say it is a deadlock), or until it runs out of paths to check. Still, with the power of modern hardware that will still be milliseconds.</p>
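<p>A toy model of that search, to make the cost visible. This is not InnoDB&#8217;s code: the wait-for graph as a plain dict, the function name and the exact way depth is counted are all assumptions made for illustration.</p>

```python
MAX_CHAIN = 200  # chain length at which the search gives up and says "deadlock"


def causes_deadlock(waits_for, start, max_chain=MAX_CHAIN):
    """Return True if start is part of a wait cycle, or the chain gets too long.

    waits_for maps a transaction to the transactions holding locks it wants.
    """
    def visit(trx, depth):
        if depth > max_chain:
            return True  # too long to check: treated as a deadlock
        for holder in waits_for.get(trx, ()):
            if holder == start:
                return True  # came back to where we started: a real cycle
            if visit(holder, depth + 1):
                return True
        return False

    return visit(start, 1)


print(causes_deadlock({1: [2], 2: [1]}, 1))  # True, the classic two-way case
print(causes_deadlock({1: [2], 2: [3]}, 1))  # False, 3 waits for nobody
# A 300-transaction convoy with no cycle is still reported as a deadlock.
print(causes_deadlock({i: [i + 1] for i in range(1, 300)}, 1))  # True
```

<p>Every path gets walked again for every new lock wait, which is where the time goes once a few hundred transactions queue up.</p>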
<p>Unfortunately, InnoDB will also hold kernel_mutex at that time, so lots and lots of InnoDB operations will not happen at that time. To be exact, InnoDB will rarely do anything else, while deadlock check is happening.</p>
<p>To illustrate that, I have a very simple testcase (that in certain conditions stalls the server for half an hour, even if it is not being run):</p>
<blockquote><p>UPDATE t1 SET b=b+1 WHERE a=1;</p></blockquote>
<p>With few threads it executes nearly 20000 times a second on my desktop machine. With ten threads it executes 14000/s. With 50 threads it is only 3000/s. With 100 threads it falls down to 639 operations a second. At 140 threads it is already just 266.</p>
<p>I built InnoDB without deadlock detection (<a href="http://p.defau.lt/?GF5_toN1gdzdFQCLK1Dg_w">tiny tiny patch</a>), and tried same test. Similar performance with 10 threads, still doing 10000 operations a second at 100 threads:<br />
<img src="http://spreadsheets.google.com/oimg?key=0AtHDNfVx0WNhdDA3dmtjOVItZHFKZjRGMFBpd0JUaUE&amp;oid=1&amp;v=1261348373043" /></p>
<p>Though I illustrated an edge case here, its purity actually hides how bad this can get. The situation can arise not only from high contention on a single row, but simply because someone holds a row lock for a bit too long (there&#8217;s always that sleep between UPDATE and COMMIT, too). A single transaction is enough to cause a lock convoy, and once transactions queue up and the update rate falls below 100/s, all MySQL will be doing is checking for deadlocks, even if they never happen.</p>
<p>On many systems deadlock detection causes way more issues than the lack of it would. Most deadlocks happen on transactions that are somewhere in the middle of their lock wait anyway :)</p>
<p>There&#8217;s some discussion about it at MySQL <a href="http://bugs.mysql.com/bug.php?id=49047">Bug#49047</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dom.as/2009/12/21/on-deadlock-detection/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c660a6eb3a4005232acb111303bef12c?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">domasmituzas</media:title>
		</media:content>

		<media:content url="http://spreadsheets.google.com/oimg?key=0AtHDNfVx0WNhdDA3dmtjOVItZHFKZjRGMFBpd0JUaUE&#38;oid=1&#38;v=1261348373043" medium="image" />
	</item>
	</channel>
</rss>
