again, on benchmarks

Dear interweb, if you have no idea what you’re writing about, keep it to yourself, don’t litter into the tubes. Some people may not notice they’re eating absolute crap and get diarrhea.

This particular benchmark has two of my favorite parts, which go together really well:

I didnt change absolutely any parameters for the servers, eg didn’t change the innodb_buffer_pool_size or key_buffer_size.

And..

If you need speed just to fetch a data for a given combination or key, Redis is a solution that you need to look at. MySQL can no way compare to Redis and Memcache. …

Seriously, how does one make up for all the damage done by such idiotic benchmarks?
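For the record, those are exactly the settings that decide whether you benchmarked the engines or their defaults (around 8M each back then). A hedged sketch of the bare minimum, with purely illustrative values:

[mysqld]
# size these to your actual RAM - with the stock ~8M buffers an 'untuned'
# benchmark mostly measures the disk, not the database
innodb_buffer_pool_size = 4G
key_buffer_size = 512M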

P.S. I’ve ranted at benchmarks before, and will continue doing so.

update

In the past few months I’ve had lots of changes going on – I left the Sun/MySQL job, my term on the Wikimedia Board of Trustees ended, I joined Facebook, and now I’ve been appointed to the Wikimedia Advisory Board. This also probably means that I will have slightly less hands-on work on Wikipedia technology (I’ll be mostly in “relaxed maintenance mode“), though I don’t know yet how much less – time will tell :)

P.S. I also quit World of Warcraft. ;-)

A formal objection to formal objections!

Dear EC, I have an opinion on the Sun/Oracle merger.

I don’t like objections, therefore I object to them. Of course, I understand that some people want to play major parts in dramas and soap operas, and others enjoy spectating, but most people I know and respect simply want to be efficient and get their work done.

These people, many of whom were my colleagues back at Sun/MySQL, can sometimes be labeled as ones who don’t care about the bigger picture, or the common good. Unfortunately, what counts as the common interest varies from religion to religion, from society to society, from person to person.

My approach to the common good is doing what I want to do for the community, not what the community wants me to do. If that relationship doesn’t work for either side, there is no relationship at all. What happens now is that there is some user community (and I include all the people who are protecting their free-software-usage interests) which has demands for vendors.

When PG zealots keep saying that “PG is open, MySQL is closed-development by a company”, you will often see the MySQL camp not listening to that at all. But there is some truth in it: every MySQL user was using a product that was developed inside a citadel (some builds were even called ‘Citadel’), and the company had a say over lots of issues.

Take Windows licensing at the beginning, GPLv2 versus GPLv3, or any other issue that affected our user community at large: it was always a top-down decision. Nobody ever asked me as a user what kind of license I would want – all those decisions were made by MySQL, the semi-independent company.

I never actually craved much control of the project – I’m always free to fork (and I have forked at least 3 releases of MySQL, and now have to deal with yet another fork at work ;-). I have gone through various stages of relations with MySQL – and at every one of them I had to understand that open-source software carries its own set of risks, but also its own set of mitigations. I’m free to support my organization’s needs whatever the outcome, I don’t have any data lock-in, I don’t have any vendor lock-in; the only impact is a risk that I should have calculated years ago, not today.

So even if the argument starts with “MySQL is being acquired by a competitor, and that is a risky situation”, there are a few real messages underneath: “we need to protect people who don’t care about risks” – which sounds very much like bailout money for all the people with crazy spending habits – as well as “if your entry campaign is about doing lots of common good, you may not be allowed to profit in the future”.

This doesn’t make open-source a good investment (would Sun have paid $1B if it had foreseen today’s EC position?), and it not being a good investment means there won’t be opportunities for people to balance common good and business opportunities in the future.

If the community was supposed to have a voice in this situation, it should’ve had a fair share of seats on the project’s board. If the community didn’t get its fair share of management at the organization, it should’ve found another organization, or accepted the risks.

In my opinion, if you are right now in the camp of supporting objections, it is not because you’re seeing a lot now, it is mostly because you didn’t see anything before.

GDB 7!

I wasn’t prepared for this. After spending months playing with GDB development trees I somehow entirely missed that the 7.0 release was getting close, and it took me more than an hour to spot it.

My favorite features are Python scripting and non-stop debugging. I was toying around with Python scripting for a while, and was planning to make backtraces make sense. Having the internals that open means that one can see PHP backtraces when gdb’ing Apache, see table names and states when MySQL threads access handler interfaces, or remote IPs and users when a thread is writing to the network. Process inspection can simply rock, if the right tools are created using these new capabilities, and I’m way too excited when I think about those. “Always have debugging symbols” gets way more meaning now.
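As a taste of what that scripting enables – a toy sketch of mine (not a released tool; the check on the ‘execute’ frame name is just a placeholder for real PHP executor inspection):

import gdb

class PhpBacktrace(gdb.Command):
    """Walk the C stack, marking frames that belong to the PHP executor."""

    def __init__(self):
        super(PhpBacktrace, self).__init__("phpbt", gdb.COMMAND_STACK)

    def invoke(self, arg, from_tty):
        frame = gdb.selected_frame()
        while frame is not None:
            name = frame.name() or "??"
            # a real tool would read zend_execute_data here instead of
            # just matching on the frame name
            marker = "php" if name == "execute" else "  c"
            print("%s frame: %s" % (marker, name))
            frame = frame.older()

PhpBacktrace()

Save it as, say, phpbt.py, load it inside gdb with ‘source phpbt.py’, and ‘phpbt’ runs against whatever process you attached to.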

Another issue I’ve been trying to resolve lately is avoiding long locking periods for running processes (directly attaching to a process can freeze its work for a second or so, which isn’t that tolerable in production environments). GDB is getting closer to async debugging capabilities – where one can run a debugger without actually stopping anything.
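The knobs for that are already in 7.0 – assuming a recent enough build, something like this should do (the PID is a placeholder, and this is my reading of the docs rather than battle-tested advice):

# in a gdb session or command file, before attaching
set pagination off
set target-async on
set non-stop on
attach 12345
continue -a &

In non-stop mode individual threads can be stopped and inspected while the rest of the process keeps running.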

So, congratulations, GDB team – now it is our job to find all the uses for the tool. It has been invaluable so far, but this is much, much more.

MySQL processlist phrase book

For your convenience, a short phrase book, starting with an explanation of the process states where MySQL is mostly working to look up data from tables:

  • “Sending data” – reading data from tables (or looking it up)
  • “Copying to tmp table” – reading data from tables (or looking it up)
  • “Copying to tmp table on disk” – query needs a rewrite
  • “statistics” – looking up data from tables
  • “Sorting result” – reading data from tables (or looking it up)

Locking is fun:

  • “update” – there is a big chance that it is waiting for a lock; on busy systems it can actually be updating as well
  • “Locked” – an even bigger chance that it is a MyISAM table waiting to be converted to InnoDB

Table opening is even funnier:

  • “Opening tables” – some other idiot thread is holding the LOCK_open or similar mutex
  • “Closing tables” – some other idiot thread is holding the LOCK_open or similar mutex
  • “Opening table” – the idiot thread itself (or a not-that-smart DBA who thinks table_cache should be set to its current value)
  • “Flushing tables” – someone decided that flushing tables is a good idea, but forgot to kill the long-running queries first
  • “Waiting for tables” – same as “Flushing tables”

If you have replication:

  • “Connecting to server” – could not connect to server, waiting to retry
  • “Reconnecting after a failed master event read” – master and slave have same @server_id
  • “Registering slave on master” – master and slave have same @server_id
  • “Waiting to reconnect after a failed master event read” – master and slave have same @server_id
  • “Sending binlog event to slave” – actually, not sending binlog event – it is waiting for binlog mutex.

A few more tricky ones:

  • “Sleep” – a millisecond ago it probably wasn’t sleeping; given how process scheduling and kernel-level buffering work, it most likely finished right before the kernel switched to the ‘SHOW PROCESSLIST’ thread.
  • “Writing to net” – usually not writing to net (the client has blocked); actually writing to the network is a nearly non-blocking operation, as long as the client doesn’t sleep.
  • “Killed” – actually, someone tried to kill it, but it doesn’t care, as it never checks the thd->killed flag

And in the user column:

  • “unauthenticated user” – are you sure your DNS is working right?
  • “system user” – it isn’t a user; these are (hopefully plural) replication threads, which don’t have any “user” credentials
  • “root” – either that is you running SHOW PROCESSLIST, another session you forgot you are running, or you forgot to create a separate user for your application

I probably left out quite a few important ones, but some of them are self-evident (such as “init” and “end”), and others will probably never show up :)
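If you want these translations applied automatically, a quick-and-dirty sketch along these lines does the job – hypothetical code, assuming the MySQLdb driver and an account that can see other sessions:

import MySQLdb

# the phrase book from above, in dict form
SUSPECT = {
    "Locked": "MyISAM table waiting to be converted to InnoDB",
    "Opening tables": "some other thread is holding LOCK_open",
    "Closing tables": "some other thread is holding LOCK_open",
    "Copying to tmp table on disk": "query needs a rewrite",
    "Waiting for tables": "someone flushed tables without killing long queries",
}

conn = MySQLdb.connect(host="localhost", user="root", passwd="")
cursor = conn.cursor()
cursor.execute("SHOW FULL PROCESSLIST")
for thread_id, user, host, db, command, qtime, state, info in cursor.fetchall():
    if state in SUSPECT:
        print("%s (%ss in state): %s" % (thread_id, qtime, SUSPECT[state]))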

Getting apache core dumps in Linux

If you want to get core dumps for intermittent Apache/mod_php crashes in Linux, you will probably need this module (otherwise the Linux kernel will refuse to dump core, whatever you put into your OS or Apache configuration):


/*
* Author: Domas Mituzas
* Released to public domain
*
* apxs -c -i mod_dumpcore.c
* and...
* LoadModule dumpcore_module .../path/to/mod_dumpcore.so
* CoreDumpDirectory /tmp/cores/
* and...
* sysctl -w kernel.core_pattern=/tmp/cores/core.%p.%t
*/

#include "httpd.h"
#include "http_config.h"
#include <sys/prctl.h>

/* Runs on every request: re-mark the process as dumpable (workers lose
 * that flag when Apache drops privileges), then decline, so that the
 * normal handlers proceed. */
static int dumpcore_handler(request_rec *r)
{
    prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
    return DECLINED;
}

static void dumpcore_register_hooks(apr_pool_t *p)
{
    ap_hook_handler(dumpcore_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA dumpcore_module = {
    STANDARD20_MODULE_STUFF, NULL, NULL, NULL, NULL, NULL,
    dumpcore_register_hooks
};

P.S. I was quite astonished to find out that nobody ever needed this; I remember quite a few discussions after which we fixed the same thing in MySQL two years ago.

Update: there’s also ‘echo 1 > /proc/sys/fs/suid_dumpable’ or ‘sysctl -w fs.suid_dumpable=1’ – now I recall the whole story: RHEL3 and RHEL4 didn’t have this, so we had to do the prctl() hack, whereas later Linux kernels allow this workaround.
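To verify the whole chain (all paths and PIDs here are placeholders; note that CoreDumpDirectory only picks the directory – the core size rlimit still has to allow dumps):

ulimit -c unlimited        # in whatever shell/script starts apache
mkdir -p /tmp/cores && chmod 1777 /tmp/cores
kill -SEGV 12345           # sacrifice one httpd worker on purpose
gdb /usr/sbin/httpd /tmp/cores/core.12345.1250000000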

Dear IT Security Industry…

… You are full of shit.

I don’t know how effective your scare-mongering, cash-extortion tactics are, but they don’t really help your users, or vendors, or anyone else.

It all starts when major vulnerability databases start authoritatively spouting out crap like this:

A vulnerability has been reported in MySQL, which can be exploited to compromise a vulnerable system.
The vulnerability is caused due to an unspecified error and can be exploited to cause a buffer overflow. (Secunia)

Or crap like this:

MySQL is prone to a buffer-overflow vulnerability because it fails to perform adequate boundary checks on user-supplied data.
An attacker can leverage this issue to execute arbitrary code within the context of the vulnerable application. Failed exploit attempts will result in a denial-of-service condition. (Securityfocus)


Spikes are not fun anymore

English Wikipedia just scored “three million articles”, so I thought I’d give some more numbers and perspective :) Four years ago we observed an impressive +50% traffic spike on Wikipedia – people came in to read about the new pope. Back then it was probably twenty additional page views a second, and we were quite happy to sustain that additional load :)

Nowadays big media events can cause some trouble, but generally they don’t bring huge traffic spikes anymore. Say, Michael Jackson’s English Wikipedia article had a peak hour of one million page views (2009-06-25 23:00-24:00) – and that was merely a 10% increase on one of our projects (English Wikipedia got 10.4m page views that hour). Our problems back then were caused by the complexity of the page content – and costs got inflated by the lack of rendering-farm concurrency control.

Other interesting sources of attention are custom Google logos leading to search results leading to Wikipedia (of course!). The last ones, for the Perseids and for Hans Christian Ørsted, sent over 1.5m daily visitors each – but spread over the 86,400 seconds in a day, that’s a mere 20 article views a second or so.

What makes those spikes boring nowadays is simply the length of the long tail. Our projects serve over five million different articles over the course of an hour (and 20m article views) – around 3.5m of those articles are opened just once. If our job were serving just hot news, our cluster setup and software infrastructure would be very, very different – instead we have to accommodate millions of articles that aren’t just stored in archives, but are constantly read, even if only once an hour (and the daily hot set is much larger too).

All this viewership data is available in raw form, as well as in nice visualizations at trendingtopics, wikirank and stats.grok.se. It is amazing to hear about all the research being built on this kind of data, and I guess it already needs improved interfaces and APIs for all the future uses ;-)

MySQL DBA, python edition

In the age of jet-setting and space travel and ORMs and such, MySQL DBAs are the least sophisticated ones nowadays, usually fighting terabytes or petabytes of data with an army of shell scripts – as there are no nice frameworks for expressing what you want to do in MySQL administration. The nice thing about proper object frameworks is that they let you concentrate on the work and logic being done – on the process itself, rather than on languages/APIs/etc.

For example, moving a slave to another master down a replication topology could be expressed this way (this is working code, actually):

slave = mysql(options.slave)           # the slave being moved
oldmaster = mysql(slave.get_master())  # its current master
newmaster = mysql(options.newmaster)   # the master it is moving under

oldmaster.lock()                # freeze writes on the old master
oldpos = oldmaster.pos()        # note its binlog position
newmaster.wait(oldpos)          # new master catches up to that position
newmaster.lock()                # freeze the new master too
oldmaster.unlock()              # old master may resume
slave.wait(oldpos)              # slave catches up as well
slave.change_master(newmaster)  # repoint the slave
newmaster.unlock()              # and resume writes everywhere

I’m sure transaction group/global IDs would simplify the process a lot, but still, with building blocks like these one can write pretty much self-documenting, narrow code, and shuffle the actions around without having to rethink the whole program logic too much. Implementations of methods like .sync(), .clone(), .promote() end up environment-specific, but may save quite some time afterwards too.
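The mysql() class itself isn’t shown here – it is environment-specific – but a hypothetical sketch of such a wrapper, built on the MySQLdb driver (credentials illustrative, error handling omitted), could look like this:

import MySQLdb

class mysql(object):
    def __init__(self, host):
        self.host = host
        self.conn = MySQLdb.connect(host=host, user="dba", passwd="secret")
        self.cursor = self.conn.cursor()

    def lock(self):
        # block writes, so the binlog position stops moving
        self.cursor.execute("FLUSH TABLES WITH READ LOCK")

    def unlock(self):
        self.cursor.execute("UNLOCK TABLES")

    def pos(self):
        # (binlog file, position) of this server as a master
        self.cursor.execute("SHOW MASTER STATUS")
        return self.cursor.fetchone()[:2]

    def wait(self, pos):
        # block until replication here has applied events up to pos
        self.cursor.execute("SELECT MASTER_POS_WAIT(%s, %s)", pos)

    def get_master(self):
        self.cursor.execute("SHOW SLAVE STATUS")
        return self.cursor.fetchone()[1]  # Master_Host column

    def change_master(self, master):
        logfile, logpos = master.pos()
        self.cursor.execute("STOP SLAVE")
        self.cursor.execute(
            "CHANGE MASTER TO MASTER_HOST=%s, "
            "MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s",
            (master.host, logfile, logpos))
        self.cursor.execute("START SLAVE")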

As much as I’d like everyone around to get their data management actions written down as scripts, I’d like every DBA action I do to be written down in such code too :-) I’d love to have code which detects resource shortages, orders servers, deploys software and re-shards data automatically… well, you know what I mean :)