Google does encyclopedia: Knol

It is all still closed, but an announcement by their VP of Engineering tells us Google is launching their idea of an encyclopedia – looking for people who can write authoritative articles. No word on licensing apart from “we want to disseminate it as widely as possible”, and the author-centric view is closer to what Citizendium, rather than Wikipedia, wants to do.

Ad revenue sharing poses many interesting questions, especially in a collaborative effort. As Wikipedia now provides page view statistics, Google (or knollers) may just work on top of the cream of the pages (by knowing search trends) and end up with very distorted overall content. For now it is closed, invite-only, so we can’t tell anything more. Time will show. It is good to know more organizations believe in aggregating and disseminating knowledge – it is Wikipedia’s mission, and it is nice to have partners. :-) Though of course, there might be some tension with the Search Quality team…

On guts and I/O schedulers

Benchmarks and guts may sometimes contradict each other. A benchmark says “the performance difference is not big”, but the guts tell otherwise (something like “OH YEAH URGHH”). I was wondering why some servers were much faster than others, and apparently different kernels had different I/O schedulers. Setting ‘deadline’ (the Ubuntu Server default) works miracles compared to ‘cfq’ (the Fedora, and probably Ubuntu standard kernel, default) on our traditional workload.
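For reference, switching the scheduler does not even need a reboot – it is a single sysfs write. Here is a minimal Python sketch (assuming Linux, root privileges and a hypothetical device name like ‘sda’; not the exact commands we used):

# Minimal sketch: inspect and switch the I/O scheduler of a block device via sysfs.
# Assumes Linux and root; 'sda' is just an example device name.
SCHED_PATH = "/sys/block/{dev}/queue/scheduler"

def current_scheduler(dev="sda"):
    # The active scheduler is shown in brackets, e.g. "noop [deadline] cfq".
    with open(SCHED_PATH.format(dev=dev)) as f:
        line = f.read().strip()
    return line[line.index("[") + 1:line.index("]")] if "[" in line else line

def set_scheduler(dev="sda", scheduler="deadline"):
    # Writing the name switches immediately; it is not persistent across
    # reboots (use a boot parameter such as elevator=deadline for that).
    with open(SCHED_PATH.format(dev=dev), "w") as f:
        f.write(scheduler)

set_scheduler("sda", "deadline")
print(current_scheduler("sda"))  # should now print 'deadline'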

Now all we need is to show some numbers, to please gut-based thinking (though it is always pleased anyway):

Deadline:

avg-cpu:  %user   %nice    %sys %iowait   %idle
           4.72    0.00    7.95   18.18   69.15

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.10 91.30 31.30 3147.20 1796.00  1573.60   898.00    40.32     0.98    7.98   3.65  44.80

CFQ:

avg-cpu:  %user   %nice    %sys %iowait   %idle
           4.65    0.00    7.62   38.26   49.48

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.10 141.26 38.86 4563.44 2571.03  2281.72  1285.51    39.61     7.61   42.52   5.38  96.98

Though the load slightly rises and drops, the await/svctm figures are always better on deadline. The box runs a high-concurrency (multiple background InnoDB readers), high-volume (>3000 SELECT/s), read-only (aka slave) workload on a ~200GB dataset, on top of a 6-disk RAID0 with a write-behind cache. Whatever the next benchmarks say, my guts will still fanatically believe that deadline rocks.

Optimization operator

I have introduced this to quite a few colleagues in the form of a question: “what is the optimization operator in C++/PHP/…?”
The answers varied a lot; people would come up with branch prediction stuff (likely(), etc.) and many other ideas, though never the right one.
The answer is pretty straightforward, and it works in quite a lot of programming languages:

//

Simply commenting out code optimizes things better than any other way. Go, try it.

Weird wit by Google translation technology

I was translating a document from German to English that had my surname in it.
It got translated to ‘Beesley’, and I immediately thought of Angela Beesley, chair of the Wikimedia Advisory Board. I started playing around more and found that:

  • French ‘Domas Mituzas’ to English translates as ‘Anthere fall’
  • ‘Mituzas’ in German is ‘Schindler’ (Matthias?:)
  • Spanish ‘Domas Mituzas’ to English translates as ‘Anthere Anthere’ (every Wikipedian has a bit of Florence inside :)
  • English to Portuguese renders me as “Domas Lessig” (I have a Creative Commons t-shirt :)
  • English to Chinese is “florence 100,000”…

That’s what Web 3.0 is all about: tampering with my personality. Who am I? :)

Dynamo by Amazon

Amazon has published a nice paper on Dynamo, their distributed hash storage. This kind of solution is really needed by the big shops, and the only reason we’re not using anything like it at Wikipedia is that we don’t actually have it.

I saw a comment in there by “Max”:

Looks like a lot of thought has been put into this. I stopped reading about halfway through but how much of this was inspired by memcached?

I guess the answer would be ‘none’. Key-value databases will soon have been mainstream for 30 years (the first ‘dbm’ was released by Ken Thompson at AT&T back in 1979). They weren’t distributed back then, but you can slap a ‘distributed’ label on pretty much anything nowadays. The method of doing that is usually the problem, and in this case Dynamo chooses something similar to Chord-like rings (though with simplified routing).
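The ring idea itself is simple enough to sketch. Here is a toy consistent-hashing ring in Python – purely illustrative (the node names are made up), and nowhere near Dynamo itself, which adds virtual nodes, preference lists and quorum reads/writes on top:

import bisect
import hashlib

class HashRing:
    # Toy consistent-hashing ring: nodes and keys are hashed onto the same
    # circle, and a key belongs to the first node clockwise from its hash.
    def __init__(self, nodes=()):
        self._ring = []                      # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def remove(self, node):
        self._ring.remove((self._hash(node), node))

    def owner(self, key):
        # First node at or after the key's position, wrapping around the circle.
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["db1", "db2", "db3"])       # hypothetical node names
print(ring.owner("enwiki:page:12345"))       # maps to some node, e.g. 'db2'
ring.add("db4")                              # only ~1/4 of the keys move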

Still, there are no open libraries that provide easy, reliable DHT building, and I’d love every paper like this to come with open-source code rather than patents attached.

P.S. My first serious web-app coding was on Sybase PowerDynamo ;-)

Another day of travel

As with every travel morning, it starts with getting up (04:30 AM), packing, brushing, washing, yawning, bathing, all at the same time. I get to the airport and realize I’m going to Dublin, one of the destinations for Lithuanian economic migrants – that means a big queue of good-life seekers. Though check-in queues are usually just a few minutes long, this time it takes me half an hour to get to the desk, where the fun of the trip starts.
Continue reading “Another day of travel”

Live Earth!

On Vienna’s Rathausplatz there’s a screen showing Live Earth – a great place to think about your own environmental impact. I’m pretty efficient in that regard – working at home eliminates daily transportation, my electricity consumption is mainly backed by clean nuclear energy, and.. oh, I have had 20 plane flights in 2007 already. That’s probably still far less than what I’d need if I visited each and every customer. Continue reading “Live Earth!”

MySQL 4.0 Google Edition

At the Conference I realized how much Wikipedia’s database operation has in common with Google’s – many rules and ideas of operation, problems faced, solutions imagined. Ah, there is one huge difference – they have brilliant engineers resolving quite a few of the issues, whereas we slack and live with as-is software. One of the very nice coincidences is that our base MySQL version is 4.0.26 – surprise, surprise, the same one Google have released patches for :) So, here the story of running a MySQL fork begins… Continue reading “MySQL 4.0 Google Edition”