charsets – domas mituzas

When InnoDB compares data (like, when looking up in indexes), it actually asks help from MySQL – character sets may provide different rules for evaluation. MySQL then looks up character set information. This is where the fun begins – if character set used for comparison is default server character set, or latin1 – there are shortcuts. If not (say, some smart developer decided to use Unicode without forcing DBA to set default server charset, as it isn’t needed) – a very nice internal routine is called, which at the very beginning does exactly this:

 /*
    To make things thread safe we are not
    allowing other threads to interfere
    while we may changing the cs_info_table
  */
  pthread_mutex_lock(&THR_LOCK_charset);

Apparently, ha_innodb.cc has such comment too:

/* Use the charset number to pick the right charset struct for
the comparison. Since the MySQL function get_charset may be
slow before Bar removes the mutex operation there, we first
look at 2 common charsets directly. */

if (charset_number == default_charset_info->number) {
    charset = default_charset_info;
} else if (charset_number == my_charset_latin1.number) {
    charset = &my_charset_latin1;
} else {
    charset = get_charset(charset_number, MYF(MY_WME));
    [...]

I’ll avoid going into discussions why such global lock at every row operation is harmful, but in case anyone is hitting lots of mutex contention there – just set default server character set to what your databases are in (or use binary or latin1, or add a clause for utf8 up here, or remove mutex, nobody is changing your character sets anyway ;-)

Update: Some of the discussion is at bug#42649

Yay, coming this year to the MySQL conference again. This time with two different talks (second got approved just few days ago) on two distinct quite generic topics:

Practical MySQL for web applications
Practical character sets

The abstracts were submitted weeks apart, so the ‘practical’ being in both is something completely accidental :) Still, I’ll try to cover problems met and solutions used in various environments and practices – both as support engineer in MySQL, as well as engineer working on wikipedia bits.

Coming to US and talking about character sets should be interesting experience. Though most English-speaking people can stick to ASCII and be happy, current attempts to produce multilingual applications lead to various unexpected performance, security and usability problems.

And of course, web applications end up introducing quite new model of managing data environments, by introducing new set of rules, and throwing away traditional OLTP approaches. It is easy to slap another label on these, call it OLRP – on-line response processing. It needs preparing data for reads more than for writes (though balance has to be maintained). It needs digesting data for immediate responses. It needs lightweight (and lightning) accesses to do the minimum work. Thats where MySQL fits nicely, if used properly.

	markcallaghan (@mark… on MySQL does not need SQL
	markcallaghan (@mark… on MySQL does not need SQL
	Domas Mituzas on MySQL does not need SQL
	Marc on MySQL does not need SQL
	Nils Meyer on linux memory management for…

Tag: charsets

Charsets mutex meets InnoDB

Speaking at MySQL Conference again, twice