I may retract this post, I may have been way too incorrect here, not sure yet.
Hash maps were invented sixty years ago, apparently. MySQL reinvented them.
Original idea was too fancy and too good, I guess. It allowed very cheap access to data, if you knew a key, and it achieved that by having a hashing function, which is used to pick a slot, then going directly to that slot. It is used in your computer all the time. ALL THE TIME.
They are so fast and useful, that they are always treated as building blocks. There have been various iterations later, to support concurrency, hashing functions evolved, etc, but the premise was the same.
If we look at the dictionary, it is obvious that “hash” is:
a mixture of jumbled incongruous things; a mess.
Yes, MySQL, the whole concept is to have as messy as possible data distribution, so that there are no natural collisions. Because if there is a collision, then there is contention over the list sitting in a hash bucket, or a worker, ready to do the work, or some fancy shmancy high performance concurrent workflow.
But no, MySQL wants order, it does not want to have the hashing function messy, it will redefine the term and sixty years of industry practice. For example, given these inputs, it will calculate hash for them as close as possible (tested with
(gdb) print my_calc_hash(&mapping_db_to_worker, "db100002", 8)):
- db100002 – 0xace3c8bc
- db100003 – 0xace3c8f3
- db100004 – 0xace3c836
It will decide, that for each change in byte it will change only one byte in the hashing function
(pretty much every other hashing function like CRC32, Adler, Fletcher, HeikkiFolding, MD5, SHA1 will have all bytes different ) (this was debunked by a commenter, I was completely wrong here).
Then MySQL will decide that resulting hash is still messy and will not look at different bytes, just map everything to exactly same slot.
Mark saw this on other place, his performance of 5.6 was not too good, and he investigated hash problems in this bug report. MySQL developers looked at it and filed a separate bug which still does not catch the whole essence – MDL lock is blah blah blah.
No, MySQL, it is not related to any use case of your hashing implementation. This is a bug in your critical data structure code. You should be just happy that it is not used too much, only in two places, one that is on Mark’s radar, but other is in new flashy beautiful feature of MySQL 5.6 – Multi Threaded Slave. Or should I call it, Not-So-Multi-Threaded Slave. NSMTS.
P.S. On the upside, I can skip talking about MTS in my “high performance replication” talk at MySQL Conference next week \o/
Update: It may be that MTS is not working because of other issues, checksum function is getting empty inputs.