My own database abstraction class

Back in 2006 July, I decided that all other database classes are not worth it, and created my own one, incorporating best features from MySQL and PHP world. It resulted in this brilliant code, which I showed to few colleagues, and got such quote:

i like your nonframework. it gives a fuzzy feeling to the poor souls that think they need an abstraction layer. — Kristian Köhntopp

This was written using TIC pattern and can be used in variety of applications:

class MyDB {
  var $conn = null;

  function MyDB($database=null,$user='root',
                $password='',$host='localhost') {
    $this->conn=mysql_connect($host,$user,$password) and
    $database?mysql_select_db($database, $this->conn):null;
  }

  function _escape($s) {
    return mysql_real_escape_string($s,$this->conn);
  }

  function _quote($s) {
    return "'" . $this->_escape($s) . "'";
  }

  function __call($method,$arguments) {
    $query=preg_replace_callback('([A-Z]|\d+)',
      create_function('$s',
         'return " ".strtolower($s[0]);'),
         $method);
    $query=str_replace(' everything ',' * ',$query);
    $first=array_shift($arguments);
    if ($first) {
      if (is_array($first)) {
        $query .= ' (' . implode(',',
                            array_map(array(&$this,'_escape'),
                            $first)) . ') ';
      } else {
        while($argument = array_shift($arguments)) {
          $first = preg_replace('/\?/',
                     $this->_quote($argument),$first,1);
        }
        $query .= $first;
      }
    }
    $ret=array();
    $res=mysql_query($query,$this->conn);
    if (!$res) { print mysql_error(); exit(); }
    while($row=mysql_fetch_assoc($res)) {
      $ret[]=$row;
    }
    return $ret;

  }
}

$x = new MyDB('test');
$x->selectEverythingFromMytableWhereIdIn(" (?,?,?) ","a'b",2,3);
$x->SelectBlahFromMytableWhereIdIn(array(1,2,3));
$x->InsertIntoBlahValues(array(1,2,3));
$x->truncateBlah();

Now I wonder where I should build community for this, Google Code or Sourceforge? Or should that be the darling MySQL Forge.

As I started digging my stuff from history, I also managed to upload my past six month pictures to flickr too – including Vienna, Taipei and few other trips.

profiling web applications

I’ve experimented a bit with wikipedia’s existing profiling framework lately, and extracted some of bits from that. The way we do it now is having our internal profiler written in PHP, and wfProfileIn() / wfProfileOut() function calls around the code. As there’re quite a lot of profiling entry places, overhead isn’t tolerable at high loads, so it’s turned on only for every 50th page view.

One of interesting entry points is in Database::query(), we have a tiny set of regexps in Database::generalizeSQL, that convert literal arguments into ‘X’ for string and ‘N’ for integer. This way we end up with all similar queries grouped together in profiling report, showing us which queries need some love or bring down our cluster.

Profiler may have different personalities, as result can be written to database table, printed out to a user, or sent over a network packed into UDP packets. I made a special daemon for handling those messages, which is just in-memory hashtable, that is updated on every incoming UDP message and any incoming TCP connection gets full XML export. Later this can be served in human output by reporting script. It is very easy to scale such profiling system, as state preservation is not needed, and XMLs can be aggregated. I guess it was possible to do that with HEAP tables in MySQL, but writing tiny daemons is sooo fun :-)

One of plans now is to rewrite profiler class into C (that would provide ability to run large-scale all-request profiling again), and merge that with our setproctitle extension, which currently allows to see what part of PHP code is being executed:

httpd: wfMsgGetKey [enwiki]
httpd: main-cleanup [dewiki]
httpd: requestfinish
httpd: query: SELECT page_id FROM `page` WHERE page_namespace = 'X' LIMIT N  [enwiki]
httpd: query: SELECT blob_text FROM `blobs` WHERE blob_id = 'X' LIMIT N  [enwiki]

Anyway, do not think of in-code profiling as of a replacement for in-core profilers such as APD or xdebug, as these can tell much more accurately where you’re fast and where you’re slow (sometimes these results may surprise!). But what you can win with code-controlled profiler – a general view for deployed application in distributed environments, that relies not on call-tree, but rather your own defined profiling blocks.

And now we know that :

  • 20-25% of our MediaWiki execution real time is spent waiting for MySQL (3.5ms per query)
  • 16% for memcached (or actually, Tugela, my memcached+BerkeleyDB hack :)
  • 20% of requests to backend are searches, and we spend 70ms average waiting for Lucene and Mono based mwdaemon to respond
  • Saving of an article takes around 0.8s, page view is around 0.1s
  • Average times (and deviations, if someone hacks reporting :) for every query…

In most cases developers without tools do miss real performance hotspots. This is what tools are for! :-)

php4: not supported, use php5

Tim wrote to PHP internals list, asking:

is there any intention to backport this simple but important bugfix to PHP 4?
Many PHP users are still using PHP 4, and it's not a very well advertised fact that
it does not properly support arrays with more than 64K entries.

Markus Berger responds:

Just change to 5.

It seems that MediaWiki HEAD branch will drop php4 support soon.