lost bits in php bytecode

Wikipedia users did get nice errors like:

Fatal error: Call to undefined function: peplaceinternallinks() in
 /usr/local/apache/common-local/php-1.5/includes/Parser.php on line 812

or

Table 'nlwiki.recemtchanges' doesn't exist (10.0.0.101)
 UPDATE `recemtchanges` SET ...

In both cases code was absolutely normal and after rogue Apache servers (different ones!) were restarted, those glitches did disappear. No ECC errors reported… what should be blamed? OS? Hardware? Apache? PHP? APC? Ummm….

php4: not supported, use php5

Tim wrote to PHP internals list, asking:

is there any intention to backport this simple but important bugfix to PHP 4?
Many PHP users are still using PHP 4, and it's not a very well advertised fact that
it does not properly support arrays with more than 64K entries.

Markus Berger responds:

Just change to 5.

It seems that MediaWiki HEAD branch will drop php4 support soon.

ways to (not) attract tv viewers

This year December 31st is Saturday. It means that I’m lazy in the morning, procrastinating all tasks and doing whatever is absolutely useless. So I compared two major TV channels in Lithuania, what are they offering for their beloved viewer on New Years Eve. I just took all movies (and some full-length animated ones) they’re showing, did check their IMDB ratings and used some formulas everyone knows to determine, if anyone should watch TV.

Metric LNK TV3
Count: 8.0 9.0
Rating sum: 36.6 47.0
Rating average: 4.6 5.2
Maximum: 6.6 7.4
Minimum: 3.6 3.3
SqDev: 6.1 10.7
Median: 4.3 5.4

First of all, you may end up in terror, as both TVs have sub-4 rating movies. Generally speaking, this is very very bad. On the other hand, average is terribly low as well, though TV3 might have slightly better status. If you get up early in the morning, you may watch ‘Chicken Run’, which is quite good animated flick. You’re saved if you have cable or satellite TV, or if you’ve got enough booze to erase all memories :) I just wonder, how could major TV channels get so much crap…

And this is the proof:

LNK                     IMDB  | TV3                       IMDB
09:45 Groove squad       3.9  | 08:00 Der Weihnachtswolf   5.5
11:10 Good burger        4.2  | 09:45 Chicken Run          7.4
12:50 Mr. Nice Guy       4.8  | 11:10 The Cat In The Hat   3.3
14:25 Joe's Apartment    5.0  | 12:45 Dunston Checks In    4.6
15:55 On Deadly Ground   3.6  | 14:25 Shallow Hal          6.0
19:00 Der Clown          4.1  | 16:35 Nutty Professor II   4.5
20:50 Rush Hour          6.6  | 19:08 Gorgeus              5.4
01:30 Who's your daddy   4.4  | 00:05 Commando             5.7
                              | 01:55 Swimfan              4.6

2005: year one second longer

Instead of celebrating 2006, I’ll better choose to celebrate leap second – last time it happened in 1998. That is the second of total freedom, it is not accounted usually in budgets, calendars or anywhere else. It is announced 6 months before it happens.

On the other hand, you can make those leap seconds happen every year. All you have to do is sit on your office chairs and rotate counter-clockwise. Every rotation will shorten the day by 10-35s. This has to happen in northern hemisphere of course, southerners should rotate clockwise.

And here is the important part, when will those free seconds happen:

  • Helsinki, Vilnius, Cairo: 2005-12-31 21:59:60
  • Paris, Berlin: 2005-12-31 22:59:60
  • UTC: 2005-12-31 23:59:60

So, synchronize your clocks, let’s have some free time!

MySQL 5.0 optimizer: loose scans

MySQL 5.0 among lots of visible features, introduced several neat optimizer improvements, that may give surprising performance in some queries. Loose index scan, or rather index jumping, allows fast aggregate max() or min() operations, as well as distinct row queries.

Let’s have such table:

+------+------+
| a    | b    |
+------+------+
|    1 |    5 |
|    1 |    3 |
|    2 |    4 |
|    2 |    2 |
|    1 |    1 |
+------+------+

A SELECT MIN(b),MAX(b) FROM ournicelittletable GROUP BY a might be executed in different ways. Usually aggregate functions need to build a temporary table in order to sort results, or use index for sorting.

In case of temporary table, aggregation is done by reading all rows from dataset and updating in-memory grouped results. With our tiny sample dataset there would be table created with primary key ‘a’, as well as min() and max() columns. Then a table scan would happen, which would update aggregated values on every in-memory row with bigger for max() or smaller for min() b value found. After full scan is done, this table could be returned directly to user.

If there is an index on fields (a,b), then tight index scan may happen – it would read values in such order: (1,1),(1,3),(1,5),(2,2),(2,4). In this kind of execution engine does not need to build any kind of temporary tables, once it reads all rows with same ‘a’ value, it can return a minimums or maximums to the user, then continue working on other ‘a’ values.

If MySQL 5.0 would find an index on (a,b), it would simply go to each a values, read one row for min(), one row for max(), as that information is already in index tree, and immediately return that to user.

I built a bigger table, with >200k rows, with unique field a, and added 233 groups, specified in b, with 1000 values in c each. One group didn’t have 1000 values, so I tried to find which one:

mysql> select b,max(c),min(c) from tttable group by b  having max(c)<1000;
+------+--------+--------+
| b    | max(c) | min(c) |
+------+--------+--------+
|  233 |     83 |      1 |
+------+--------+--------+
1 row in set (0.01 sec) (no query cache :)

Then I tried same on different engine, which didn’t support loose scans, and query took 0.2s. I repeated executions on both engines, so that data would be cached by OS, but speed difference was quite noticable. Of course, we could check what was done in there. The 0.01s query was executed this way:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tttable
         type: range
possible_keys: NULL
          key: b_2
      key_len: 5
          ref: NULL
         rows: 471
        Extra: Using index for group-by

Please note the ‘Extra’ line, and how many rows were read. As intended, two rows from each group, one for min() and one for max().
Now the 0.2s query did tell me that it was reading 230k rows. It did actually decide that in-memory table would be faster than reading index and did scan whole table. Another operation I tried was SELECT DISTINCT b FROM tttable. MySQL 5.0 executed the query in 0.01s, though the other engine was again more than 15 times slower, just because it had to deal with more rows.

Conclusion: loose index scans help. Take a look at the manual.

fighting hpdeskjetphobia: printers win!

Ten years ago a printer was bought in our household. It was HP DeskJet 520, black and white mod, and did cost a fortune – over $1000. It was a good one, 300dpi, and was used for 8 years. I used to clean it’s parts with alcohol, used vacuum cleaner several times as well. It may be still there, printing and making lots of noise.

So, it’s cost scared me so much that I got hpdeskjetphobia, I always avoided buying printers. I did see various printing facilities, e.g. a printing room in a bank, where transaction logs were printed on paper by large Kyoceras. Those noisy bastards were all around but not near me. In my previous job printers were outsourced from a company, and nearest one was five or six rooms away. Those were big and looked expensive. All printers looked to me expensive.

Now I decided I need one for my home office. I just went to a nearest shop, and the first one (HP PSC 1510) looked like the one I need. It even has scanner and copying functions! All for 100Eur. I took it home, opened the box, surprise surprise, it had CD for my Mac, with OCR software and some image management tool. All for 100 Eur. I don’t know how much I’ll have to pay for ink yet, but there was some included. For same 100 Eur.

I printed several photos on A4 photopaper and those looked amazing. Sure, now I need a digital camera. There is a story, why I have none yet. See, 8 years ago I was kind of early adopter. I did lots of photo footage for my school’s homepage with a camera, that did cost $1000 and had 640×480 resolution (borrowed, not own one). But I guess I have yet another phobia to kill and find out that now digital cameras are commodity as well. Maybe some pictures will end up in Wikimedia Commons? :)

syslog in mysql function

I experimented with sending UDP messages from MySQL functions, but ended up with simple syslogging UDF

I played a bit with various interaction with outer world ideas in MySQL UDFs and ended up with something what was really really simple – a single libc call. I did shamelessly steal bare bones and simply added a single line of code in it:

#include <mysql.h>
#include <string.h>
#include <syslog.h>

my_bool logger_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
        initid->maybe_null=0;
        return 0;
}

long long logger(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) {
        if (args->arg_count != 1) {
                strcpy(error, "LOGGER(): needs message");
                return 1;
        }

        if (args->arg_type[0] != STRING_RESULT) {
                strcpy(error, "LOGGER() message should be string");
                return 1;
        }

        syslog(LOG_INFO,"%s",args->args[0]);
        *is_null = 0;
        *error = 0;

        return 0;
}

Of course, I had to compile it:

gcc -I /usr/include/mysql/ -shared -o syslogudf.so syslogudf.c

And later load it:

mysql> create function logger returns integer soname 'syslogudf.so';
Query OK, 0 rows affected (0.00 sec)

One more thing… testing:

mysql> select logger(concat(user()," wishes you ",
    -> if(rand()>0.3,"good","bad")," luck"));
+---------------------------------------------------------------------------+
| logger(concat(user()," wishes you ",if(rand()>0.3,"good","bad")," luck")) |
+---------------------------------------------------------------------------+
|                                                                         0 |
+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

So now our systems administrator will see:

$ tail -1 /var/log/messages
Dec 12 01:09:22 flake mysqld-max: root@localhost wishes you bad luck

Oops!