I’ve seen quite a bit of work done on implementing mmap() in various places, including MySQL.
mmap() is also used for malloc()’ing huge blocks of memory.
The mmap() data cache is part of the VM cache, not the file cache (the two are tightly coupled inside the kernel, but their priorities still differ).
If a small program with a low memory footprint maps a file, it will probably make file access faster (since the file will be cached more aggressively in memory and will put pressure on other cached file data, though that’s cheating).
If a large program with lots and lots of allocated memory maps a file, that will pressure the filesystem cache to flush pages, and then… will pressure existing VM pages of the very same large program to be swapped out. That’s certainly bad.
For now, MySQL uses mmap() only for compressed MyISAM files. Vadim wrote a patch to do more mmap()ing.
If there’s less data than RAM, mmap() may use CPU cycles somewhat more efficiently. If there’s more data than RAM, mmap() will kill the system.
Interestingly, a few months ago there was a discussion on lkml where Linus wrote:
Because quite frankly, the mixture of doing mmap() and write() system calls is quite fragile – and I’m not saying that just because of this particular bug, but because there are all kinds of nasty cache aliasing issues with virtually indexed caches etc that just fundamentally mean that it’s often a mistake to mix mmap with read/write at the same time.
So, simply, don’t.
Update: Oh well, 5.1: --myisam_use_mmap option… Argh.
Update on update: after a few minutes of internal testing, all mmap()ed MyISAM tables went fubar.
The case where mmap() is really good is where we can reduce the number of copies of data made with it. The MyISAM key buffer is an example where I tried – not very successfully – to use it.
If we can get/put data directly from a mmap()’d location, instead of copying it from an OS buffer (as with pread / pwrite) then we save a copy operation.
If, of course, the code is written so that it assumes it has a private copy of the page to mess around with (I’m pretty sure MyISAM DOES do this), then you have to copy it anyway (e.g. with memcpy) which pretty much negates the advantage (I expect).
My performance data from mmap()ing MyISAM keys suggested that the benefits were modest anyway.
Mark
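To make the copy-saving point concrete, here is a minimal Python sketch (not MyISAM code). It assumes a POSIX system where os.pread() is available; the helper names matches_pread and matches_mapped are made up for illustration. The pread() path receives a private copy of the bytes, while the mmap() path compares the data in place through a memoryview, without copying it out of the mapping.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def matches_pread(fd, offset, expected):
    # pread() copies the bytes out of the kernel's cache into a new object.
    private_copy = os.pread(fd, len(expected), offset)
    return private_copy == expected


def matches_mapped(mapping, offset, expected):
    # The memoryview slice refers to the mapped pages directly; the
    # comparison reads them in place and no intermediate copy is made.
    with memoryview(mapping) as whole, whole[offset:offset + len(expected)] as view:
        return view == expected


with tempfile.TemporaryFile() as f:
    # Two pages of data: the first filled with "A", the second with "B".
    f.write(b"A" * PAGE + b"B" * PAGE)
    f.flush()
    fd = f.fileno()
    via_pread = matches_pread(fd, PAGE, b"BBBB")
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        via_mmap = matches_mapped(m, PAGE, b"BBBB")

print(via_pread, via_mmap)
```

If the calling code needs a private, modifiable copy of the page anyway, the mmap() path ends up copying too, which is the caveat raised above.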
Just a clarification: With the default options, mmap() does not allocate the pages until you access them, so the memory pressure is really more related to the working set size, not the file size.
mmap() really wins when you can use the data directly, without processing it, and due to the secondary effects on the file cache (if you read() data and put that into anonymous pages, it may force pages out of the file cache, which does not happen with mmap).
In some cases it also eases development, since mmap()’ing data is almost zero-cost when the data is already in the cache, so you can save on initialization time.
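A small Python sketch of the lazy-allocation point, under some assumptions: a POSIX system, a filesystem that supports sparse files, and an arbitrary FILE_SIZE of 256 MiB chosen only for illustration. Mapping the whole file returns immediately, and only the pages that are actually read get faulted in (the kernel may read ahead a little, so treat "only" as approximate).

```python
import mmap
import tempfile

FILE_SIZE = 256 * 1024 * 1024  # arbitrary size, for illustration only


def touch_pages(mapping, page_numbers):
    """Read one byte from each listed page, faulting in those pages."""
    return [mapping[n * mmap.PAGESIZE] for n in page_numbers]


with tempfile.TemporaryFile() as f:
    # Extending with truncate() creates a sparse file: no data is written.
    f.truncate(FILE_SIZE)
    with mmap.mmap(f.fileno(), FILE_SIZE, access=mmap.ACCESS_READ) as m:
        mapped_length = len(m)
        last_page = FILE_SIZE // mmap.PAGESIZE - 1
        # Touch only the first and the last page of the mapping.
        values = touch_pages(m, [0, last_page])

print(mapped_length, values)
```

The working set here is two pages, even though the mapping covers the whole file.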
Indeed, there are multiple cases where mmap() can be helpful if used correctly, but there are many ways to make mistakes too.
Software really has to help the OS with madvise(), though the interface is quite limited (it offers only extreme flushing of the cache, not a way to reduce priorities).
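As a rough sketch of what helping the OS with madvise() can look like, here is a Python example. It assumes Python 3.8 or later on a platform that exposes the MADV_* constants (Linux does); the advise helper is hypothetical and simply skips hints that are unavailable. It hints a sequential scan and then tells the kernel the pages are no longer needed, which is the "extreme flushing" case mentioned above.

```python
import mmap
import tempfile


def advise(mapping, name):
    """Apply the named MADV_* hint if this platform and Python expose it."""
    option = getattr(mmap, name, None)
    if option is None or not hasattr(mapping, "madvise"):
        return False  # hint not available here; carry on without it
    mapping.madvise(option)
    return True


with tempfile.TemporaryFile() as f:
    f.write(b"x" * (16 * mmap.PAGESIZE))
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Hint that the mapping is about to be read front to back.
        advise(m, "MADV_SEQUENTIAL")
        total = sum(m[i] for i in range(0, len(m), mmap.PAGESIZE))
        # Tell the kernel these pages can be dropped from our address space.
        advise(m, "MADV_DONTNEED")
        # The data is still there; it is simply faulted in again on access.
        still_readable = m[0] == ord("x")

print(total, still_readable)
```

Note that there is no middle ground between these hints and dropping the pages outright, which matches the complaint about the interface being limited.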
An absolutely unreal thread on lkml. Thanks ;))