On checksums

InnoDB maintains two checksums per buffer pool block: one computed with the old formula and one with the new. Both are read, both are written. I guess this was meant to be some kind of transition period, but it obviously lasted too long (or was forgotten). Anyway, disabling the checksum code entirely makes single-threaded data load 7% faster, though under parallel activity locking contention leaves some spare CPU resources for checksum calculation.

Leaving just a single version of the checksum would cut this fat in half without abandoning the feature entirely. Probably worth trying.

Update: Benchmarked InnoDB checksum against Fletcher. Results were interesting (milliseconds for 10000 iterations):

Algorithm:   InnoDB   Fletcher
                826        453
-O2:            316        133
-O3:             42         75

So, although Fletcher is roughly twice as fast in the first two rows, -O3 optimizes InnoDB checksumming much better, to the point that it beats Fletcher (42 vs. 75). How many folks actually run an -O3-compiled mysqld?
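For readers who have not met Fletcher before, here is a minimal sketch of one common variant, Fletcher-32 over 16-bit words. It is not the code that was benchmarked above: the post does not say which Fletcher variant was used, and the function name `fletcher32` and the block size are choices made for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Fletcher-32 (an assumption, not the benchmarked code).
 * Processes `count` 16-bit words and keeps two running sums,
 * each reduced modulo 65535.
 */
uint32_t fletcher32(const uint16_t *words, size_t count)
{
    uint32_t sum1 = 0;
    uint32_t sum2 = 0;

    while (count > 0) {
        /* 359 words per block is small enough that the 32-bit
         * accumulators cannot overflow before the next reduction. */
        size_t block = count > 359 ? 359 : count;
        count -= block;

        while (block-- > 0) {
            sum1 += *words++;   /* plain sum of the data */
            sum2 += sum1;       /* sum of the running sums */
        }

        sum1 %= 65535;
        sum2 %= 65535;
    }

    /* High half is the second sum, low half is the first. */
    return (sum2 << 16) | sum1;
}
```

A 16 KB page would be passed as 8192 words. The inner loop is just two additions per word, which is part of why this kind of checksum is cheap and why the compiler has a fair amount of room to optimize it.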

6 thoughts on “On checksums”

  1. Good moment perhaps to mention the checksum algorithm also, people will be curious and one can always discuss… just paste a code snippet w/ comments?

  2. Indeed, it is not a 2x gain, but there is something to save.
    Another question is whether the checksumming algorithm is as efficient as possible.
    Perhaps it could be replaced with some other SSE-based algorithm that does not thrash the CPU cache, for better performance.
    I know the Linux kernel uses these for RAID checksum computation when available.

  3. Interesting, SSE4.2 adds CRC32-on-chip. It also adds more nice functions, that can be used for strlen() and similar optimizations. Nehalem will be kickass for databases, if they manage to use the feature set.

  4. Is there an option to turn it off entirely if the filesystem is checksumming? I keep meaning to look into this.

    For instance, ZFS always checksums and uses encryption-like algorithms and CPU features to be extremely efficient about it. With that big non-Open Source DB, turning off the DB checksum was a good win. Neel has a blog on this somewhere…

  5. Matt, --skip-innodb-checksums

    Yeah, ZFS can be fast, but so can userland checksums, if implemented properly. The way InnoDB currently does it may not be that easy to optimize at the compiler level. I’ll have to check it more.
