Not reservation, order code!

Italians can sometimes be damn frustrating. This time I ordered an international train ticket online, then went to the train station (Milano Centrale) to pick it up (may sound stupid, but they still seem to require that – all those reservations and e-tickets are not available yet).

Of course, the guy at the “international tickets” counter didn’t actually speak English (sure, I know Italian is the most important language, at least in operas). Then… it turned out I hadn’t read the email properly.

The email their automated system sent me had two sections. One was ‘Order details’, which contained a ‘Booking code (DOSSIER)’. The other was ‘Ticket details’, which contained a reservation number. Of course, I had brought the wrong code. I know it was frustrating for him too – the intranet application responded to any click in about a minute – so he had a really good reason not to try to help me. And, as I already mentioned, he did not speak English.

So, after one hour and two metro trips I presented the booking/order/dossier code instead of the reservation number, and got my tickets. Oh, and by the way, the international ticket office opens at 7:30, and the train leaves at 7:10. I’m quite happy this story happened in the afternoon, not on the morning of departure.

Update: Well, now I feel stupid and outdated. I could have just opened that email on my mobile phone. :-)

LAMPS on steroids

I’m not sure if I’m the first to coin ‘LAMPS’ – a scaled-out LAMP environment with Squid in front – but it sounds cool. Squid is a major component in content distribution systems, reducing the load on all the backend systems dramatically (especially with proper caching rules). We had various issues in the past, where we used code nobody else seemed to be using – cache coordination, purges and, of course, load.

Quite a few problems turned out to be memory leaks, but one was particularly nasty: under high load, Squid processes started leaking CPU cycles somewhere. After deploying profiling for Squid we saw that the problem was inside libc. Once we started profiling libc, one of our initial assumptions turned out to be true – our heap was awfully fragmented, slowing down malloc().

Here comes the steroids part: Google has developed a drop-in malloc replacement, tcmalloc, that is really efficient. Space efficient, CPU efficient, lock efficient. malloc() is probably the most used (and most sophisticated) libc function, yet it was suffering performance issues that not many people wanted to actually tackle. The description sounded really nice, so we ended up using tcmalloc for our suffering Squids.

The results were what we expected – awesome :) The nice part is that the library is optimized for multi-threaded applications doing lots of small-object allocations without too much lock contention, and it uses spinlocks for large allocations. MySQL fits that definition exactly, so just by using a simple drop-in replacement you may achieve increased performance over the standard libc implementation.
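Deploying it really is a drop-in affair – no recompilation needed for dynamically linked binaries. A minimal sketch (the library and binary paths are assumptions; check where your distro installs google-perftools):

```shell
# Preload tcmalloc so it overrides the libc malloc/free for this process.
# Library path is an assumption - it may be libtcmalloc_minimal.so on some systems.
export LD_PRELOAD=/usr/lib/libtcmalloc.so
exec /usr/sbin/mysqld
```

The same trick works for Squid or any other dynamically linked binary; statically linked ones would need to be relinked with -ltcmalloc instead.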

For any developers working on high-performance applications, the Google performance tools provide easy access to information that used to be a PITA to get at. Another interesting toy they have is an embedded HTTP server providing run-time profiling info. I’m already wondering if we should combine that with our profiling framework. Yummy. Steroids.
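For example, the sampling CPU profiler from the same package can be attached to an unmodified binary the same way (paths are again assumptions; `pprof` ships with google-perftools):

```shell
# Run the target with the CPU profiler preloaded; the profile is
# written to the file named by CPUPROFILE when the process exits.
CPUPROFILE=/tmp/squid.prof LD_PRELOAD=/usr/lib/libprofiler.so squid -N

# Then inspect the hottest functions:
pprof --text /usr/sbin/squid /tmp/squid.prof
```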

TCP tuning for your database

Lately, lots of fascinating new technologies are being used to build even more fascinating new solutions, and solutions nowadays run on distributed environments, not just on a single server. These servers usually communicate using TCP – a standard that has been around since long before gigabit (or even ten-megabit) Ethernet or the dawn of LAMP, and it needs to be kicked a bit to work properly. In our environment, over a hundred application servers handle quite a number of requests and put quite demanding stress on the database servers. For that we had to change some of the default kernel settings – and it did improve the situation a bit. Here I’ll try to give an overview of some of them…
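To give a taste of the knobs involved, here is a sketch with purely illustrative values – the right numbers depend on your workload, memory and kernel version:

```shell
# Larger queues for bursts of incoming connections
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.somaxconn=1024

# Bigger socket buffers for fat pipes (min/default/max, in bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"

# More ephemeral ports for clients opening many connections
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```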

Open transactions are bad

Two identical queries can have quite different execution times, even though the data layout, table format, everything… is the same:

mysql> select count(*) from transtest;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from transtest;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (2.88 sec)  <---- oh noes!!!

The difference is very simple, though often forgotten. Multiversioned (MVCC) tables may have multiple versions (hence the name) of rows.
Moreover, the engine will keep every row version that could still be needed by the oldest active transaction, just in case it requests some snapshot from the past.
Think of it as a sort of time machine, just somewhat less efficient :)

If some old transaction is still running, heavily edited tables will accumulate more stored row versions.
In the case above, the second query had to read a million rows to produce the count, while the first one was working in a clean environment.
Once the transaction holding the old versions was rolled back, the query took 0.00 sec again.

Databases (or DataACIDs) take additional, often unexposed, measures to provide consistent data, so we may treat this as a feature, not a bug.

Conclusion: avoid long-running transactions, especially ones doing nothing. Oh well, avoid long-running transactions holding locks too, but that is a different question. If you spot anything sleeping – kill the slacker. At the application level. :-)
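For the record, a sketch of slacker-hunting on the MySQL side (the thread id is obviously hypothetical – take it from your own process list):

```sql
-- Spot idle connections that may be holding an open transaction
SHOW PROCESSLIST;

-- InnoDB status shows active transactions and the purge backlog;
-- a growing "History list length" means old row versions are piling up
SHOW ENGINE INNODB STATUS;

-- Last resort, if the application will not clean up after itself:
KILL 12345;
```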

Wikimaniaaaaa!!!

It is somewhat too late to post about Wikimania, so I won’t keep it long. My lightning talk bits are finally on the internet. Please note that the idea for the presentation came just a few minutes before giving it. Pity there’s no video recording.

Anyway, the trip was pretty cool. Besides meeting all the usual suspects at Wikimania, the whole world of amazing academia (this time, in a good sense) was revealed to me by my hosts. MIT rocks (especially my place of stay, TEP). Boston was pretty nice too – I quite enjoyed long walks there.

The conference itself had the technical stuff extracted into ‘Hacking Days’, so I ended up on a quite tense technical track rather than in hacking sessions. It wasn’t that bad, though – I could create a ‘socializing track’ at the main conference and just hang around without any real agenda.

There was rather less of the startup spirit that was so vivid last year. The current theme is “What we’re doing” instead of “What we’re going to do”. For some it is maturity; for me it is boring :) I hope that was a wrong impression.

I’ll surely try going there next year too. Just to see the usual suspects.

P.S. Wikipedia’s tremendous growth is back again.

I knew it – Metaverse runs on MySQL!

I was enjoying a fabulous read, Neal Stephenson’s novel Snow Crash. The book was written back in 1992, but it emphasized virtual reality and collaboration in a way that still sounds modern now.

What was fun (and made me write this spoiling post) is that it is easy to spot MySQL usage patterns in science fiction:

  • Metaverse – the virtual-reality internet – already has counterparts in our current non-fictional world. “Second Life” is a nice [MySQL-powered] platform for virtual life and creativity, and it is now in a state of early Metaverse evolution.
  • Central Intelligence Corporation (created after the merger of the Library of Congress and the CIA, as they were duplicating each other anyway) operates huge resources of user-provided content, like Flickr, YouTube or Wikipedia [MySQL-powered again]. Though it had somewhat better interfaces and APIs – I guess that is what Web 5.0 is about ;-)

I really liked how asynchronous replication and the lack of locking affect virtual-reality worlds – even these effects were taken into account by the author. Add neurolinguistics, civilization history and brilliant satire, and you really get the best geek fiction. Thumbs up for the book. The Diamond Age seems to be the next read.

On books, examples and wizards

I really like Flickr. I believe it is one of the greatest services on the Web (my album is there :), and it runs MySQL. But I don’t understand why their database folks think nobody else knows how to build stuff. I have seen nice books and presentations from Yahoo! (oh wait!) and other folks who have been building big systems and engineered their solutions properly, so they survived.

First, technical literature gets noticed:

Then there are whole books on the subject of capacity and scalability for the database layer.

Yes, they are good reads.

Or…

Then there are novels from developers that in many cases really don’t know the tricks of the DBMS they are working with, and create elaborate abstraction layers that automatically generate SQL for the DB in question from objects and such.

Some of these layers are built with efficient SQL in mind, or at least always allow overriding their generated queries with efficient hand-written SQL.

So, it is easy to answer this question:

But, with all these people who tell you how to do it, actually can they prove that it works under a constant high workload for many people all at the same time.

I believe these people can.

Then parts of Flickr’s operation are described:

You may be thinking to yourself yea right say you can do 20K + transactions per seconds that must be a crap load of expensive hardware all running, where all the data is served out of memory.

With proper data layout, a single $10,000 system may handle 10,000 queries per second – about $1 per query-per-second of capacity. Of course, hitting disk decreases efficiency, so one may end up at $2–5 per query-per-second. I’m not sure Flickr would consider a $100k database setup expensive hardware. Again, “all data served from memory” may sound expensive, but most systems really serve just “most data from memory”. Like ours, which is running on those $10k-class machines (disclosure: ~10 of them) and serving >30,000 queries per second at the moment. And that is efficient.

This is supposed to blow our minds and wipe away the stuff we know:

All of our database connections are real time. Our load balancer for the database is written in 13 lines of PHP code.

There are lots of posts detailing how fast MySQL connections are and how database pooling isn’t necessary anymore. Our load balancer is actually 651 lines of PHP code, but it still connects to the database on each request. And it takes less than a millisecond to connect – quite an affordable cost.

I am certainly interested in all of Flickr’s design specifics – it is a nice application, a perfect example, and it seems to work. But I don’t believe we should deny all other knowledge, or blindly follow wizards and their examples. Every application differs, every community has different patterns and wishes, so we should rather follow what people need and create good products for them, like Flickr. Sometimes even a one-man (or all-volunteer) engineering team can do miracles, especially when there are open platforms to build on.

The endless possibilities provided by this new type of services may be hard to swallow; I’m not sure the wizardry itself these days is. In the modern software world there are no orthodox or unorthodox designs. There are just ones that work and ones that don’t.