printing envelopes in macosx

I had to print an address on an envelope (people still need to use snail mail sometimes). I used to be spoiled by MS Office, so I started OpenOffice and tried to do it there. I failed:

  • It wouldn’t print in landscape no matter what I tried, and it didn’t save settings. Barf.
  • It complained that locale “” did not exist, so I couldn’t have my street name spelled correctly.
  • And sure, it crashed several times – it’s an experimental 2.0 Mac Intel build, so I guess it has to do that, just to show how difficult QA is. :)

So I opened a simple text editor and tried to do it from there. I failed too, because I didn’t understand how to turn a plain text file into a rotated envelope print – the output was either rescaled or simply wrong somewhere else. That counts as a failure too. I tried it from TextMate as well, but it didn’t like my attempts either.

So I searched the web for hints. It wasn’t much of a search, because the title of the very first hit told me exactly what to do: Address Book. The solution is simple: select the recipient, print. Sometimes I am still too power-user-oriented :)

popping a stack in mysql

Warning: this post describes practices that rely on side effects and may produce unstable results.

I was asked how to treat a table in MySQL as a stack and pop the oldest entry from it. Of course, there is always the traditional approach – lock the table, fetch the oldest record, delete it, commit (a sketch of that appears after the examples below). It isn’t that much fun, though. This is what I came up with (and it went through multiple corrections):

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 3    |
+------+
1 row in set (0.00 sec)

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 4    |
+------+
1 row in set (0.00 sec)
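
For comparison, here is a minimal sketch of the traditional approach mentioned above, assuming InnoDB and that jobs_id is the primary key (the placeholder id would come from the SELECT):

begin;
-- lock the oldest row so a concurrent pop cannot grab it
select jobs_id from jobs order by jobs_id asc limit 1 for update;
-- delete by the id fetched above, then release the locks
delete from jobs where jobs_id = ...;
commit;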

If transactions are used, you’d need to commit as fast as you can, as the DELETE acquires locks (but that’s the price of ACID ;-) – concurrent pops are not that easy.

Pushing onto the stack is easy, though – just... INSERT. Even if you do not want transactions, you would still want InnoDB, and that is where the side issues kick in.

InnoDB stores all records clustered together with the primary key, so table scans also happen in PK order. If you applied an ORDER BY, there would be no guarantee that MySQL wouldn’t do a filesort (if it always picked indexes, this method would be even more awesome ;-). In case of a filesort, the last @id value would be set before the data set is actually sorted, so the value returned would differ from the one deleted.

MyISAM doesn’t care about order at all unless you tell it to. Doing the pop trick without ORDER BY would treat the stack as just a pool of records and delete some arbitrary row (returning it, of course). Adding ORDER BY would again leave @id set to its pre-filesort value.

That reminds me of another trick (it has been floating around for a while) that allows atomically incrementing a field and fetching the new value:

mysql> update counters set counter_value=(@val:=counter_value+1); select @val;
mysql> update counters set counter_value=(@val:=counter_value+1); select @val;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 562  |
+------+
1 row in set (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 563  |
+------+
1 row in set (0.00 sec)

Yet another warning: as this modifies the same row every time, only one transaction at a time would be able to edit the record. So this should be done either outside of a transaction, or on HEAP/MyISAM/Blackhole engines.
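
A related variant that avoids user variables relies on MySQL’s documented LAST_INSERT_ID(expr) behavior: passing an expression to LAST_INSERT_ID() stores it as the connection-local “last insert id”, which the next parameterless call returns:

update counters set counter_value = last_insert_id(counter_value + 1);
select last_insert_id();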

so how does mysql generate revenue?

In order to speak at UC2006 I have to somehow enter the United States. Being from the East means you are treated as a potential immigrant unless you prove you are not. To prove that at the consulate, you have to bring income tax declarations, bank statements, lists of financial transactions, and job approvals.

Oddly, the questions for me at the consulate were:

  • How does MySQL AB generate revenue?
  • What is special about MySQL’s products?

Answering those questions seemed to satisfy Uncle Sam. Later I wondered: did they really know, or verify, what MySQL is? Asking such questions did seem quite… educated!

So now, with my fingerprints stored in databases, I’ve got a green light to visit the US and talk about our experiences there :)

browser wars in mysql community

Um, I pasted a link in the #mysql channel on freenode, and within a minute or two I had an amazing analysis of the browsers used by the tech community:

  • around 50% of guests were using Safari on Mac OS X 10.4
  • a third were using Windows, with a 50/50 split between Firefox (1.5) and Opera (7.x)
  • Konqueror on Linux had a 17% share

6 visits were analyzed, so it reflects the distribution quite accurately. Obviously, IE is not used by database professionals.

mediawiki graphic profile

So I used the amazing tool named KCacheGrind to examine Xdebug profiler output for MediaWiki. Of course, I immediately started refactoring some places in the code. The profile is of a random web hit – I don’t even know what it is – but it looks nasty and is waiting for a gentle touch. Kill remote interwiki caches! :)

Here is what I got (triple click to zoom):

[KCacheGrind call graph screenshot]

Here is what I used:

[links to KCacheGrind and Xdebug]

Yummy!

profiling web applications

I’ve experimented a bit with Wikipedia’s existing profiling framework lately, and extracted some bits from it. The way we do it now is with an internal profiler written in PHP, and wfProfileIn() / wfProfileOut() function calls around the code. As there are quite a lot of profiling entry points, the overhead isn’t tolerable at high load, so it’s turned on only for every 50th page view.

One of the interesting entry points is Database::query(): we have a tiny set of regexps in Database::generalizeSQL that convert literal arguments into ‘X’ for strings and ‘N’ for integers. This way all similar queries end up grouped together in the profiling report, showing us which queries need some love or are bringing down our cluster.
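
To illustrate (with a made-up query, not one from our report), generalization turns a concrete statement into a template, so every variant of it aggregates into a single profiling entry:

-- as executed
SELECT page_id FROM `page` WHERE page_namespace = 4 AND page_title = 'Sandbox' LIMIT 1;
-- as reported after generalizeSQL
SELECT page_id FROM `page` WHERE page_namespace = N AND page_title = 'X' LIMIT N;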

The profiler may have different personalities: results can be written to a database table, printed out to the user, or packed into UDP packets and sent over the network. I made a special daemon for handling those messages – it is just an in-memory hashtable that is updated on every incoming UDP message, while any incoming TCP connection gets a full XML export. A reporting script can later turn that into human-readable output. Such a profiling system is very easy to scale, as no state has to be preserved and the XML exports can be aggregated. I guess it would have been possible to do this with HEAP tables in MySQL (see the sketch below), but writing tiny daemons is sooo fun :-)
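
For the record, the HEAP-table alternative mentioned above could look roughly like this (a sketch only – table and column names are made up, and aggregation happens in SQL instead of a daemon):

create table profiling (
  event varchar(255) not null primary key,
  hits bigint not null,
  total_time double not null
) engine=memory;

-- one statement per incoming sample, aggregating in place
insert into profiling (event, hits, total_time)
values ('query: SELECT page_id...', 1, 0.0035)
on duplicate key update
  hits = hits + 1,
  total_time = total_time + values(total_time);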

One of the plans now is to rewrite the profiler class in C (which would make large-scale, all-request profiling possible again) and merge it with our setproctitle extension, which currently lets us see what part of the PHP code is being executed:

httpd: wfMsgGetKey [enwiki]
httpd: main-cleanup [dewiki]
httpd: requestfinish
httpd: query: SELECT page_id FROM `page` WHERE page_namespace = 'X' LIMIT N  [enwiki]
httpd: query: SELECT blob_text FROM `blobs` WHERE blob_id = 'X' LIMIT N  [enwiki]

Anyway, do not think of in-code profiling as a replacement for in-core profilers such as APD or Xdebug; those can tell you much more accurately where you’re fast and where you’re slow (and sometimes the results surprise!). What you win with a code-controlled profiler is a general view of a deployed application in a distributed environment – one that relies not on a call tree, but on your own defined profiling blocks.

And now we know that:

  • 20–25% of our MediaWiki execution real time is spent waiting for MySQL (3.5ms per query)
  • 16% is spent on memcached (or actually Tugela, my memcached+BerkeleyDB hack :)
  • 20% of requests to the backend are searches, and we spend 70ms on average waiting for the Lucene- and Mono-based mwdaemon to respond
  • Saving an article takes around 0.8s; a page view takes around 0.1s
  • Average times (and deviations, if someone hacks the reporting :) for every query…

In most cases, developers without tools miss the real performance hotspots. That is what tools are for! :-)