I really like Flickr. I believe it is one of greatest services of the Web (my album is there :), and it runs MySQL. But I don’t understand why their database folks think nobody else knows how to build stuff. I have seen nice books and presentations from Yahoo! (oh wait!) and other guys, who have been building big systems and engineered their solutions properly, so they survived.
Technical literature was noticed:
Then there are whole books on the subject of capacity and scalability for the database layer.
Yes, they are good reads.
Or…
Then there are novels from developers that in many cases really don’t know the tricks of the DBMS they are working with, and create elaborate abstraction layers that automatically generate SQL for the DB in question from objects and such.
Some of these layers are made with efficient SQL in mind, or always allow to override their functionality with efficient SQL.
So, it is easy to answer this question:
But, with all these people who tell you how to do it, actually can they prove that it works under a constant high workload for many people all at the same time.
I believe these people can.
Now there’re parts of Flickr operation described:
You may be thinking to yourself yea right say you can do 20K + transactions per seconds that must be a crap load of expensive hardware all running, where all the data is served out of memory.
With proper data layout single 10000$ system may handle 10000 queries per second. Of course, hitting disk may decrease efficiency, so one may end up with 2-5$/query. I’m not sure Flickr would consider 100k$ database setup as expensive hardware. Here again, “all data served from memory” may sound expensive, but mostly systems serve just “most data from memory”. Like ours, which is running on those 10k$-class (disclosure: ~10 of them) machines and serving >30000 queries per second at the moment. And that is efficient.
This is blowing away minds and wiping stuff we know away:
All of our database connections are real time. Our load balancer for the database is written in 13 lines of PHP code.
There are lots of posts detailing how fast MySQL connections are and how database pooling isn’t necessary anymore. Our load balancer is actually 651 lines of PHP code, but it still connects to database at each request. And it takes less than millisecond to connect – quite affordable cost.
I am sure interested in all Flickr design specifics – it is nice application, perfect example and it seems to work. Though I don’t believe that we should deny any other knowledge, or we should be blindly following wizards and their examples. Every application differs, every community has different patterns and wishes, so we should rather follow what people need, and create good products for them, like Flickr. Sometimes even one-man (or all-volunteer) engineering team may do miracles, especially when there’re open platforms to build on.
It is hard to swallow the endless possibilities, that are provided by new type of services. I’m not sure wizardry these times is that difficult to swallow. In modern software world there’re no orthodox or unorthodox designs. There’re just ones which work and which don’t.
Interesting post. I mentioned it in Log Buffer #9.
Thanks,
Dave Edwards
Log Buffer