domas mituzas

I/O schedulers seriously revisited

The I/O scheduler problems have drawn my attention, and besides trusting empirical results, I tried to do more benchmarking and analysis of why the heck strange things happen at the Linux block layer. So, here is the story, which I myself found quite fascinating…

Speaking at MySQL Conference again, twice

Yay, I’m coming to the MySQL Conference again this year. This time with two different talks (the second got approved just a few days ago) on two distinct, quite generic topics:

Practical MySQL for web applications
Practical character sets

The abstracts were submitted weeks apart, so ‘practical’ appearing in both is completely accidental :) Still, I’ll try to cover problems met and solutions used in various environments and practices – both as a support engineer at MySQL and as an engineer working on Wikipedia bits.

Coming to the US and talking about character sets should be an interesting experience. Though most English-speaking people can stick to ASCII and be happy, current attempts to produce multilingual applications lead to various unexpected performance, security and usability problems.

And of course, web applications end up introducing quite a new model of managing data environments, bringing in a new set of rules and throwing away traditional OLTP approaches. It is easy to slap another label on these and call it OLRP – on-line response processing. It needs data prepared for reads more than for writes (though the balance has to be maintained). It needs data digested for immediate responses. It needs lightweight (and lightning-fast) accesses that do the minimum work. That’s where MySQL fits nicely, if used properly.

TV movie guide for Sunday evening

Lithuanians have spectacular movie choices on national TV channels this evening:

LTV – 21:00 – The Company
LNK – 21:00 – Bad Company
TV1 – 21:00 – In Good Company

Technology report, 2007

Some of the state of Wikipedia technology is summed up in the annual report.

Amazon, go to hell

Amazon opened an MP3 music store. As iTunes is blocked here, my reaction was ‘yay’ – I opened it and started browsing the catalogue. I immediately got a few nice suggestions and used 1-Click shopping – it suggested installing a download agent, so I did. Then it confirmed my billing details, and in the end, with me all happy and shining, it spat out:

We are sorry…
We could not process your order because of geographical restrictions on the product which you were attempting to purchase. Please refer to the terms of use for this product to determine the geographical restrictions.

We apologize for any inconvenience this may have caused you.

Go to hell, Amazon.

IE finds JS in Images (old xss bug!)

Well, this fix was done more than three years ago, but this is one of the most evil IE bugs in existence. Even better, it seems to have never been fixed, exists in IE7, and is being discussed in various places lately.

The problem is very simple – valid PNG files can be uploaded to various sites and then shown to users. IE, however, does content autodetection, and if it suspects that the file may be HTML, it renders it as HTML, executing all the Javascript inside. The files can be perfectly normal images that show your kitten or wife or whatever. Still, IE will execute any exploit code that is included in them. The exploit code can even load the actual image, so nobody will realize they’re looking not at an image but at an attack that hijacks their sessions, steals cookies and does all other sorts of evil things.

So, whenever anyone says IE is secure, just tell them to look at this problem.
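
A defensive check on the upload side can be sketched roughly like this. It is only a minimal illustration of the idea, not the fix mentioned above: the window size, the tag list and the function name `looks_like_html` are all assumptions made for the example, and a real deployment would want to err on the side of rejecting more.

```python
import re

# Assumption: only the leading bytes matter to a sniffing browser, so only
# this many bytes are inspected. The exact size is a guess for illustration.
SNIFF_WINDOW = 1024

# Assumption: a short, non-exhaustive list of tags that could make a
# sniffing browser decide the file is HTML.
_TAG_RE = re.compile(
    rb"<\s*(?:html|head|body|script|title|table|img|pre|plaintext|a\s+href)\b",
    re.IGNORECASE,
)


def looks_like_html(data):
    """Return True if the start of an uploaded file contains HTML-like tags."""
    return _TAG_RE.search(data[:SNIFF_WINDOW]) is not None


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"
    clean = png_signature + b"\x00" * 64
    evil = png_signature + b"<script>alert(document.cookie)</script>"
    print(looks_like_html(clean), looks_like_html(evil))
```

An upload handler would reject (or re-encode) any file for which the check returns true, accepting the occasional false positive on binary data that happens to contain such a byte sequence.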

My own database abstraction class

Back in July 2006, I decided that all other database classes were not worth it and created my own, incorporating the best features from the MySQL and PHP worlds. It resulted in this brilliant code, which I showed to a few colleagues, earning this quote:

i like your nonframework. it gives a fuzzy feeling to the poor souls that think they need an abstraction layer. — Kristian Köhntopp

This was written using the TIC pattern and can be used in a variety of applications:

class MyDB {
  var $conn = null;

  function MyDB($database=null,$user='root',
                $password='',$host='localhost') {
    $this->conn=mysql_connect($host,$user,$password) and
    $database?mysql_select_db($database, $this->conn):null;
  }

  function _escape($s) {
    return mysql_real_escape_string($s,$this->conn);
  }

  function _quote($s) {
    return "'" . $this->_escape($s) . "'";
  }

  function __call($method,$arguments) {
    $query=preg_replace_callback('([A-Z]|\d+)',
      create_function('$s',
         'return " ".strtolower($s[0]);'),
         $method);
    $query=str_replace(' everything ',' * ',$query);
    $first=array_shift($arguments);
    if ($first) {
      if (is_array($first)) {
        $query .= ' (' . implode(',',
                            array_map(array(&$this,'_escape'),
                            $first)) . ') ';
      } else {
        // bind every remaining argument, including falsy ones like 0 or ''
        foreach ($arguments as $argument) {
          $first = preg_replace('/\?/',
                     $this->_quote($argument),$first,1);
        }
        $query .= $first;
      }
    }
    $ret=array();
    $res=mysql_query($query,$this->conn);
    if (!$res) { print mysql_error(); exit(); }
    while($row=mysql_fetch_assoc($res)) {
      $ret[]=$row;
    }
    return $ret;

  }
}

$x = new MyDB('test');
$x->selectEverythingFromMytableWhereIdIn(" (?,?,?) ","a'b",2,3);
$x->SelectBlahFromMytableWhereIdIn(array(1,2,3));
$x->InsertIntoBlahValues(array(1,2,3));
$x->truncateBlah();

Now I wonder where I should build a community for this: Google Code or Sourceforge? Or should that be the darling MySQL Forge?

As I started digging my stuff out of history, I also managed to upload my pictures from the past six months to Flickr – including Vienna, Taipei and a few other trips.

Rant on search crawlers

This isn’t even remotely funny. Every major search crawler sends a different Accept-Encoding header, and as cached objects vary on that header, each one bypasses the cache and always hits the backend. It is easy to hack Squid to disregard spaces between options (IE puts them in headers: gzip, deflate, and Mozilla does not: gzip,deflate), but some of these things make caching hell:

msnbot: Accept-Encoding: identity;q=1.0
googlebot: Accept-Encoding: gzip
yahoo (slurp): Accept-Encoding: gzip, x-gzip

Add Opera with its Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0 and KHTML with Accept-Encoding: x-gzip, x-deflate, gzip, deflate, and you get a hell where bold normalization solutions have to be applied. I guess we just have to treat it as a single-bit ‘gzip’ versus ‘plain’ difference, and screw everything else.

Update: squid patch :)
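
The single-bit normalization could look roughly like the sketch below. This is not the Squid patch, just an illustration of the idea in Python; the function name `normalize_accept_encoding` and the decision to ignore `*` wildcards and every non-gzip coding are assumptions made for the example.

```python
def normalize_accept_encoding(header):
    """Collapse an Accept-Encoding header into 'gzip' or 'plain'."""
    for item in header.split(","):
        # Split "gzip;q=0.5" into the coding name and its parameters.
        token, _, params = item.partition(";")
        token = token.strip().lower()
        if token not in ("gzip", "x-gzip"):
            continue  # deflate, identity, * and friends are ignored on purpose
        q = 1.0  # a coding without an explicit q-value counts as fully accepted
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not accepted"
        if q > 0:
            return "gzip"
    return "plain"


if __name__ == "__main__":
    for ua_header in (
        "identity;q=1.0",
        "gzip",
        "gzip, x-gzip",
        "gzip,deflate",
        "deflate, gzip, x-gzip, identity, *;q=0",
        "x-gzip, x-deflate, gzip, deflate",
    ):
        print(ua_header, "->", normalize_accept_encoding(ua_header))
```

With the cache keyed on the normalized value instead of the raw header, all of the variants listed above collapse into two cache entries per page.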

Smart software and dynamic links

“Know the stack” is a really required mantra in the web app world, but the most important component – the clients (aka browsers) – is quite often forgotten. We have learnt to deal with IE bugs (MS definition: behavior to help web-masters create better content :), adapt various CSS fixes for different browsers, provide hints for robots (robots.txt must die…), etc. But a new breed of clients has shown up – desktop indexing software that decides to help with internet indexing too – and we didn’t know some of its behavior.

One search engine’s desktop search software decided to traverse a few meta-links to RSS feeds on our pages. It encountered the &amp; escape sequence in links (well, the HTML-standard way to write an ampersand everywhere, even in links), decided that this must be some error, did not unescape it, and followed the link. This resulted in a non-RSS page with a meta link to RSS embedded. All the usual web links “a=b&c=d” became “a=b&amp;c=d”, so options were ignored, non-RSS versions of pages were served, meta links with newly generated RSS links (with all the &amp; spam) were embedded, and the desktop indexing software happily ended up in recursion & loop.

This resulted in an additional gigabit of traffic from users of the product. As it was the general end-of-year decline in load, we didn’t feel it too much, but still, it raised awareness of the issue:

Never have infinite links, as there will be some product which might follow them infinitely.

Put that product on many PCs of regular users (oh yay, that’s the status quo) and a very nice unintentional DDoS happens.

Every link written on a website has to be normalized and canonicalized: unknown options stripped, known options ordered and filtered.

Especially if such a link is written on every page of the site. Then even standards-ignorant or buggy products will not end up doing crazy loops. And by the way, let’s welcome desktop indexing software to the stack of buggy clients we have to care about.
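
The rule above (strip unknown options, order and filter known ones) could be sketched like this. It is a rough illustration rather than what the site actually runs: the parameter whitelist `KNOWN_PARAMS`, the function name `canonical_link` and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist of options the application actually understands.
KNOWN_PARAMS = ("action", "feed", "title")


def canonical_link(url, known=KNOWN_PARAMS):
    """Return url with only known query options, in a stable order."""
    parts = urlsplit(url)
    cleaned = {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # A client that followed "&amp;" literally sends keys like "amp;feed".
        while key.startswith("amp;"):
            key = key[len("amp;"):]
        # Keep the first occurrence of each known option, drop the rest.
        if key in known and key not in cleaned:
            cleaned[key] = value
    query = urlencode(sorted(cleaned.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    broken = ("http://example.org/w/index.php"
              "?title=Foo&amp;amp;feed=rss&amp;action=history&junk=1")
    print(canonical_link(broken))
```

Because every variant of a link collapses to the same canonical form, a client that mangles the query string gets back links that no longer grow on each round trip.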

A perfect Christmas story

This made me laugh. Providers were fighting over who would give better ‘free SMS’ plans. Then my GSM provider decided that they had enough resources for an even better campaign – they offered to add 0.02 LTL to the account balance for every SMS received. So, smart kids saw a business opportunity – they started spamming SMS messages from one phone (free SMS!) to another (get paid for SMS received). Smarter kids started automating the process using their computers (though I didn’t see too many “how to use kannel” guides – most public solutions were using GUI tools and automated mouse movers :).

The best part is that the smartest kids immediately found ways to cash out the ‘GSM LTLs’ – by using ‘call to pay’ service providers, and getting 50% cash efficiency.

One GSM provider (Omnitel, a TeliaSonera company) reacted by establishing a daily SMS limit that works (only 6 LTL worth of SMSes per day), whereas the other provider (Tele2) established a limit that doesn’t (phones would get disconnected only the next day).

And of course, this has brought down GSM providers, or at least their SMS networks – at Christmas. Way to go, marketing people. Way to go.

For all the international people: 1 EUR = 3.4528 LTL

Edit: 0.02LTL for SMS
