popping a stack in mysql

Warning: this post describes practices that rely on side effects and may provide unstable results.

I was asked how to treat a table in MySQL as a stack and pop the oldest entry from it. Of course, there is always the traditional approach – you lock the table, fetch the oldest record, delete it, commit. It isn’t that fun, though. This is what I came up with (and it took multiple corrections to get right):

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 3    |
+------+
1 row in set (0.00 sec)

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 4    |
+------+
1 row in set (0.00 sec)

If transactions are used, you’d need to commit as fast as you can, as the DELETE acquires locks (but that’s the price of ACID ;-) – concurrent pops are not that easy.
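For reference, the traditional approach mentioned at the start could look roughly like this on InnoDB – just a sketch, assuming jobs_id is an indexed (primary) key and the smallest id is the oldest entry:

START TRANSACTION;
-- lock the oldest row so a concurrent pop has to wait rather than grab the same one
SELECT @id := jobs_id FROM jobs ORDER BY jobs_id ASC LIMIT 1 FOR UPDATE;
-- delete exactly that row, then commit as soon as possible to release the lock
DELETE FROM jobs WHERE jobs_id = @id;
COMMIT;
SELECT @id;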

Appending to the stack is easy though – just… INSERT. Even if you do not want transactions, you would still want InnoDB, and that is where the side effects kick in.

InnoDB stores all records clustered together by primary key, so table scans also happen in PK order. If you applied ORDER BY, there would be no guarantee that MySQL would not do a filesort (if it always picked indexes, this method would be even more awesome ;-). In case of a filesort, the last @id value would be set before the data set is actually sorted, so the value returned would differ from the one deleted.

MyISAM doesn’t care about order at all, unless you tell it to. Doing the pop trick without ORDER BY treats the stack as just a pool of records and deletes some single row (returning it, of course). Adding ORDER BY again sets the variable to its pre-filesort value.

That reminds me of another trick (it has been floating around for a while) that allows atomically incrementing a field and fetching its value:

mysql> update counters set counter_value=(@val:=counter_value+1); select @val;
    -> update counters set counter_value=(@val:=counter_value+1); select @val;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 562  |
+------+
1 row in set (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 563  |
+------+
1 row in set (0.00 sec)

Yet another warning: as this modifies the same row every time, only one transaction at a time would be able to edit the record. So this should be done either outside a transaction, or on HEAP/MyISAM/Blackhole engines.
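For completeness, the counters table in this example can be as trivial as this (a hypothetical definition – the trick only needs a single row with a counter_value column):

CREATE TABLE counters (
	counter_value INT UNSIGNED NOT NULL
) ENGINE=MyISAM;

-- seed it; the two updates shown above would then return 562 and 563
INSERT INTO counters VALUES (561);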

so how does mysql generate revenue?

In order to talk at UC2006 I have to somehow enter the United States. Being from the East means you are treated as an immigrant, unless you prove you are not. To prove that at the consulate, you have to bring income tax declarations, bank statements, lists of financial transactions and job approvals.

Oddly, the questions for me at the consulate were:

  • How does MySQL AB generate revenue?
  • What is special about MySQL’s products?

Answering those questions seemed to satisfy Uncle Sam. Later I wondered: did they really know, or verify, what MySQL is? Asking such questions did seem quite… educated!

So now, with my fingerprints stored in databases, I’ve got a green light to visit the US and talk about our experiences there :)

browser wars in mysql community

Um, I pasted a link in the #mysql channel on freenode, and within a minute or two I had an amazing analysis of the browsers used by the tech community:

  • around 50% of guests were using Safari on MacOS X 10.4
  • one third were using Windows, with a 50/50 split between Firefox (1.5) and Opera (7.x)
  • Konqueror on Linux had a 17% share

6 visits were analyzed, so it reflects the distribution quite accurately. Obviously IE is not used by database professionals.

profiling web applications

I’ve experimented a bit with Wikipedia’s existing profiling framework lately, and extracted some bits from it. The way we do it now is an internal profiler written in PHP, with wfProfileIn() / wfProfileOut() function calls around the code. As there are quite a lot of profiling entry places, the overhead isn’t tolerable at high loads, so it’s turned on only for every 50th page view.

One of the interesting entry points is Database::query(): we have a tiny set of regexps in Database::generalizeSQL that convert literal arguments into ‘X’ for strings and ‘N’ for integers. This way all similar queries end up grouped together in the profiling report, showing us which queries need some love or bring down our cluster.

The profiler can have different personalities: the results can be written to a database table, printed out to the user, or sent over the network packed into UDP packets. I made a special daemon for handling those messages – it is just an in-memory hashtable that is updated on every incoming UDP message, and any incoming TCP connection gets a full XML export, which a reporting script later turns into human-readable output. It is very easy to scale such a profiling system, as no state preservation is needed and the XMLs can be aggregated. I guess it would have been possible to do this with HEAP tables in MySQL, but writing tiny daemons is sooo fun :-)

One of the plans now is to rewrite the profiler class in C (that would again allow large-scale, all-request profiling), and merge it with our setproctitle extension, which currently lets us see which part of the PHP code is being executed:

httpd: wfMsgGetKey [enwiki]
httpd: main-cleanup [dewiki]
httpd: requestfinish
httpd: query: SELECT page_id FROM `page` WHERE page_namespace = 'X' LIMIT N  [enwiki]
httpd: query: SELECT blob_text FROM `blobs` WHERE blob_id = 'X' LIMIT N  [enwiki]

Anyway, do not think of in-code profiling as a replacement for in-core profilers such as APD or Xdebug – those can tell you much more accurately where you’re fast and where you’re slow (sometimes the results may surprise!). What you win with a code-controlled profiler is a general view of the deployed application in distributed environments, based not on a call tree, but on your own defined profiling blocks.

And now we know that:

  • 20-25% of our MediaWiki execution real time is spent waiting for MySQL (3.5ms per query)
  • 16% for memcached (or actually, Tugela, my memcached+BerkeleyDB hack :)
  • 20% of requests to the backend are searches, and we spend 70ms on average waiting for the Lucene- and Mono-based mwdaemon to respond
  • Saving an article takes around 0.8s, a page view around 0.1s
  • Average times (and deviations, if someone hacks reporting :) for every query…

In most cases developers without tools miss the real performance hotspots. This is what tools are for! :-)

MySQL 5.0 optimizer: loose scans

MySQL 5.0, among lots of visible features, introduced several neat optimizer improvements that may give surprising performance for some queries. Loose index scan, or rather index jumping, allows fast aggregate MAX() or MIN() operations, as well as distinct row queries.

Let’s have a table like this:

+------+------+
| a    | b    |
+------+------+
|    1 |    5 |
|    1 |    3 |
|    2 |    4 |
|    2 |    2 |
|    1 |    1 |
+------+------+
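
If you want to play along, the sample table can be built roughly like this (my assumptions: integer columns and an (a,b) index, which the tight and loose scans discussed below depend on; engine choice is not important here):

CREATE TABLE ournicelittletable (
	a INT NOT NULL,
	b INT NOT NULL,
	KEY ab (a, b)
) ENGINE=InnoDB;

INSERT INTO ournicelittletable (a, b) VALUES (1,5),(1,3),(2,4),(2,2),(1,1);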

A SELECT MIN(b),MAX(b) FROM ournicelittletable GROUP BY a might be executed in different ways. Usually aggregate functions need to build a temporary table in order to group the results, or use an index for sorting.

In the temporary table case, aggregation is done by reading all rows from the data set and updating in-memory grouped results. With our tiny sample data set, a table would be created with primary key ‘a’ plus min() and max() columns. Then a table scan would happen, updating the aggregated values on the matching in-memory row whenever a bigger (for max()) or smaller (for min()) b value is found. After the full scan is done, this table can be returned directly to the user.

If there is an index on fields (a,b), then a tight index scan may happen – it reads values in this order: (1,1),(1,3),(1,5),(2,2),(2,4). In this kind of execution the engine does not need to build any temporary tables: once it has read all rows with the same ‘a’ value, it can return the minimum and maximum for that group to the user, then continue working on other ‘a’ values.

If MySQL 5.0 finds an index on (a,b), it simply goes to each ‘a’ value, reads one row for min() and one row for max() – that information is already sitting at the edges of each group in the index tree – and immediately returns them to the user.

I built a bigger table with >200k rows: a unique field a, and 233 groups specified by b, with 1000 values of c in each. One group didn’t have all 1000 values, so I tried to find which one:

mysql> select b,max(c),min(c) from tttable group by b  having max(c)<1000;
+------+--------+--------+
| b    | max(c) | min(c) |
+------+--------+--------+
|  233 |     83 |      1 |
+------+--------+--------+
1 row in set (0.01 sec) (no query cache :)

Then I tried the same on a different engine, which didn’t support loose scans, and the query took 0.2s. I repeated the executions on both engines so that the data would be cached by the OS, but the speed difference remained quite noticeable. Of course, we can check what was done in there. The 0.01s query was executed this way:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tttable
         type: range
possible_keys: NULL
          key: b_2
      key_len: 5
          ref: NULL
         rows: 471
        Extra: Using index for group-by

Please note the ‘Extra’ line, and how many rows were read – as intended, two rows from each group, one for min() and one for max().
The 0.2s query, on the other hand, reported reading 230k rows: it actually decided that an in-memory table would be faster than reading the index, and scanned the whole table. Another operation I tried was SELECT DISTINCT b FROM tttable. MySQL 5.0 executed the query in 0.01s, while the other engine was again more than 15 times slower, just because it had to deal with more rows.

Conclusion: loose index scans help. Take a look at the manual.

syslog in mysql function

I experimented with sending UDP messages from MySQL functions, but ended up with a simple syslogging UDF.

I played a bit with various ideas for interacting with the outside world from MySQL UDFs and ended up with something really, really simple – a single libc call. I shamelessly stole the bare bones of a UDF and simply added one line of code to it:

#include <mysql.h>
#include <string.h>
#include <syslog.h>

my_bool logger_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
        /* argument checks belong here: 'message' is a real error buffer,
           and returning 1 aborts the query with that message */
        if (args->arg_count != 1) {
                strcpy(message, "LOGGER(): needs message");
                return 1;
        }
        if (args->arg_type[0] != STRING_RESULT) {
                strcpy(message, "LOGGER(): message should be a string");
                return 1;
        }
        initid->maybe_null=0;
        return 0;
}

long long logger(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) {
        /* the single added line: ship the argument off to syslog */
        if (args->args[0] != NULL)
                syslog(LOG_INFO, "%s", args->args[0]);
        *is_null = 0;
        *error = 0;

        return 0;
}

Of course, I had to compile it:

gcc -I /usr/include/mysql/ -shared -o syslogudf.so syslogudf.c

And later load it:

mysql> create function logger returns integer soname 'syslogudf.so';
Query OK, 0 rows affected (0.00 sec)

One more thing… testing:

mysql> select logger(concat(user()," wishes you ",
    -> if(rand()>0.3,"good","bad")," luck"));
+---------------------------------------------------------------------------+
| logger(concat(user()," wishes you ",if(rand()>0.3,"good","bad")," luck")) |
+---------------------------------------------------------------------------+
|                                                                         0 |
+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

So now our systems administrator will see:

$ tail -1 /var/log/messages
Dec 12 01:09:22 flake mysqld-max: root@localhost wishes you bad luck

Oops!

ip address searches in mysql

MySQL does not provide a special datatype or operators for IP addresses, but it has two functions that convert IP addresses to and from integers. To look up addresses in a subnet, an efficient index range scan query can be used:

SELECT ip FROM table WHERE ip BETWEEN
	INET_ATON("213.197.137.0") AND
	INET_ATON("213.197.137.255")

To make this easier, we can add a couple of handy functions that simplify the task a bit:

SELECT ip FROM table WHERE ip BETWEEN
	inet_aton_net("213.197.137.3/24") AND
	inet_aton_bc("213.197.137.3/24")

I intentionally didn’t create a function that checks whether an IP belongs to a network, as the optimizer wouldn’t know it can be converted into an efficient range scan (though some operators, like… LIKE, are optimized properly).

CREATE FUNCTION inet_aton_net (ip VARCHAR(18))
	RETURNS INTEGER UNSIGNED
	DETERMINISTIC
BEGIN
	DECLARE nm INTEGER UNSIGNED DEFAULT 0;
	SET nm=32-SUBSTRING(ip,LOCATE("/",ip)+1);
RETURN
	-- shifting right and then back left drops the host bits
	INET_ATON(LEFT(ip,LOCATE("/",ip)-1)) >> nm << nm;
END

CREATE FUNCTION inet_aton_bc (ip VARCHAR(18))
	RETURNS INTEGER UNSIGNED
	DETERMINISTIC
BEGIN
	DECLARE nm INTEGER UNSIGNED DEFAULT 0;
	SET nm=SUBSTRING(ip,LOCATE("/",ip)+1);
RETURN
	-- ip ORed with inverse netmask provides broadcast address
	INET_ATON(LEFT(ip,LOCATE("/",ip)-1)) | (4294967295>>nm);
END
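
As a quick sanity check (a usage sketch), the helpers should return exactly the two endpoints used in the first query above:

SELECT inet_aton_net("213.197.137.3/24") = INET_ATON("213.197.137.0") AS net_ok,
       inet_aton_bc("213.197.137.3/24") = INET_ATON("213.197.137.255") AS bc_ok;
-- both should come back as 1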

yet another hype – wordpress

Yet again I couldn’t resist the hype and set up a personal WordPress deployment.

This time, I guess, I’m running away from LiveJournal, where blogs may actually be read, and starting my own corner, where I can safely spam the net without obligations to any external hosting provider :)

In order not to make this post absolutely useless, I’ll pay tribute to Jan and his amazing lighty. This is how my web server is configured to handle all the rewrite magic:

$HTTP["host"] == "dammit.lt" {
        $HTTP["url"] =~ "^/flow/" {
                server.error-handler-404 = "/flow/index.php?error=404"
        }
}

Too easy, isn’t it? :)

Update: this blog now lives on wordpress.com
