Memcached quite often ends up as a store for very small objects (a small key and some integer value), though by default it isn’t really tuned for this kind of work. Current memory management is based on slabs (up to 200 classes of them), where objects are grouped by similar size – the actual sizes are pre-defined at startup based on a few configuration parameters.
By default memcached assumes that the smallest object will have 48 bytes of data (that’s without the item header), and increases the slab sizes in +25% steps:
slab class 1: chunk size 104 perslab 10082
slab class 2: chunk size 136 perslab 7710
slab class 3: chunk size 176 perslab 5957
slab class 4: chunk size 224 perslab 4681
...
So, in this case, it allocates at least 104 bytes per object, and the next steps up are far apart. Fortunately, there are some quick ways to get better efficiency:
Configuration!
There are two parameters for this:
- -n – minimum space allocated for key+value+flags, defaults to 48
- -f – chunk size growth factor, defaults to 1.25
With the default settings memcached defines just 38 slab classes, with chunk sizes ranging from 104 to 458992 bytes. Even in the case of mixed workloads one can use -n 5 -f 1.05
– this defines ~170 slab classes, where low values grow in 8-byte steps, and the smallest classes look like this:
slab class 1: chunk size 64 perslab 16384
slab class 2: chunk size 72 perslab 14563
slab class 3: chunk size 80 perslab 13107
...
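For the curious, the slab class arithmetic (both the default and the -n 5 -f 1.05 variants) can be reproduced with a short back-of-the-envelope sketch. This assumes 1MB pages, 8-byte chunk alignment, and a 56-byte item header (as in the version tested here); slab_classes is just a hypothetical helper, the real logic lives in slabs.c:

```python
def slab_classes(n=48, f=1.25, header=56, align=8, page=1048576):
    """Mimic memcached's slab sizing: start at header + n bytes,
    round up to the alignment, then grow by the factor (integer
    truncation, then re-align), up to at most 200 classes."""
    size = header + n
    if size % align:
        size += align - size % align
    classes = []
    while size <= page // 2 and len(classes) < 200:
        classes.append((size, page // size))  # (chunk size, chunks per page)
        size = int(size * f)
        if size % align:
            size += align - size % align
    return classes

print(slab_classes()[:4])             # default -n 48 -f 1.25
# → [(104, 10082), (136, 7710), (176, 5957), (224, 4681)]
print(slab_classes(n=5, f=1.05)[:3])  # tuned -n 5 -f 1.05
# → [(64, 16384), (72, 14563), (80, 13107)]
```

Both outputs match the listings above, and the default parameters indeed produce 38 classes ending at a 458992-byte chunk.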
This gives way higher memory efficiency for larger objects too (6k steps at 100k object sizes, rather than 30k steps with the default configuration). Of course, more slab classes mean more eviction queues, and if the distribution of object sizes changes there will be more memory fragmentation – though that’s nothing a restart can’t resolve ;-)
Internal storage
Every data item has a header, which includes pointers to other item structures and additional metadata. One obvious thing that will be fixed in later releases is the CAS (compare-and-swap) metadata (8 bytes per object), which is stored for every object – though very rarely used (one needs a special breed of commands). Future versions may resolve this properly; a very dirty hack is changing cas_id to uint8_t in memcached.h (heeeee!).
There are also multiple pointers inside the header – and on 64-bit systems each of them takes 64 bits – though in theory chunks inside memory pages (a slab class can have multiple memory pages assigned) could be addressed with 16-bit pointers. Of course, the easy workaround here is simply compiling memcached as a 32-bit binary – then it won’t be able to address more than ~3GB per instance, but running multiple memcached instances is straightforward.
There’s another internal CPU-vs-memory tradeoff: objects internally end up aligned at 8-byte boundaries. Hacking CHUNK_ALIGN_BYTES in slabs.c (I set it to 2) allows chunk sizes to grow in much smaller steps.
Data!
Have small keys. Have small data. Compression at the application level will reduce network I/O, mean fewer roundtrips and system calls, and give better memory efficiency in the end. Pack integers inside keys into base-250 or so (skipping whitespace and other bytes keys can’t contain), and store binary data.
It usually takes way fewer cycles to store data more efficiently than to pay the extra cycles when a cache miss happens :)
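To illustrate the key-packing idea, here is a minimal sketch. It assumes "base-250" means using every byte value except the six the ASCII protocol can’t carry in a key (NUL, tab, LF, CR, space, DEL) – any 250 safe bytes would do, and pack_key/unpack_key are hypothetical names:

```python
# All byte values except the six a memcached ASCII-protocol key can't contain.
ALPHABET = bytes(b for b in range(256)
                 if b not in (0x00, 0x09, 0x0a, 0x0d, 0x20, 0x7f))  # 250 symbols
INDEX = {b: i for i, b in enumerate(ALPHABET)}

def pack_key(n):
    """Encode a non-negative integer as a compact base-250 byte string."""
    digits = []
    while True:
        n, r = divmod(n, 250)
        digits.append(ALPHABET[r])
        if n == 0:
            break
    return bytes(reversed(digits))

def unpack_key(key):
    """Decode a base-250 byte string back into the integer."""
    n = 0
    for b in key:
        n = n * 250 + INDEX[b]
    return n
```

A key like 1000000000 takes 10 bytes as a decimal string but only 4 bytes packed – a few dozen bytes saved per item adds up quickly at these object sizes.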
Testing & summary
I tried to simulate the most extreme of all edge cases – integer keys and one-byte data objects being inserted into a 64M memcached instance.
- Regular memcached fit 645k objects, 932k after the factor changes.
- A simple 32-bit build fit 763k objects, 1164k after the factor changes, and 1180k after the alignment change.
- After removing CAS support it fit 1378k, and 1461k after packing the key in base-250.
So I could end up with a slab size distribution like this (it accommodates objects up to 45k in size):
slab class 1: chunk size 33 perslab 31775
slab class 2: chunk size 34 perslab 30840
slab class 3: chunk size 35 perslab 29959
slab class 4: chunk size 36 perslab 29127
...
MySQL!
A 64M-sized MEMORY table can store 2087k (INT, TINYINT) entries. When people aim for no-eviction storage, MySQL can be way more efficient.
Interestingly, the PK takes as much space as the data itself – which simply begs for the PK to be held together with the data, as InnoDB does (that would actually fit 2500k entries in 64M). With custom MySQL engines this shouldn’t be too difficult for those who really hit the edge cases, right? :)
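For comparison, dividing the 64M arena by the object counts measured in this post gives a rough per-entry footprint (plain arithmetic; the counts are the ones from the tests above, and the labels are mine):

```python
MB64 = 64 * 1024 * 1024

# Measured / estimated object counts in a 64M arena.
results = {
    "default memcached":            645_000,
    "tuned + hacked memcached":   1_461_000,
    "MySQL MEMORY (INT,TINYINT)": 2_087_000,
    "clustered-PK estimate":      2_500_000,
}
for label, objects in results.items():
    print(f"{label}: ~{MB64 / objects:.0f} bytes/entry")
```

That works out to roughly 104, 46, 32 and 27 bytes per entry respectively – the hacked memcached closes most of the gap, but a clustered PK still wins.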
Oh well, last I’ve heard, memcached is going to get storage engine support too – I wonder how much fun hacking those will be (someone ages ago plugged BDB into memcached and called it Tugela… ;)
Interesting blog article, Domas! I am addressing the space efficiency as part of the work I’m doing on the storage engine.
Btw, I’m not sure I would change the alignment inside the slabs unless you are sure that your hardware doesn’t require aligned data…
Good article! I was wondering, how do you calculate the chunk size per slab? Just add 56 bytes and then round up to the smallest 8-byte-aligned value on 32-bit systems?
Ah, I guess I should read the man page better :)
memcached -vv on startup will print it
Hi, thanks for the article.
I have a lot of very small objects and started to use memcached. At some point it showed many evictions even though limit_maxbytes was not reached, so I concluded there was not enough memory and started to read about slabs.
I had factor 1.25, now set to 1.05, and a 700MB limit.
Question 1: stats slabs
STAT 1:chunk_size 104
STAT 1:chunks_per_page 10082
STAT 1:total_pages 1
STAT 1:total_chunks 10082
STAT 1:used_chunks 10080
STAT 1:free_chunks 2
STAT 1:free_chunks_end 8964
STAT 2:chunk_size 112
STAT 2:chunks_per_page 9362
STAT 2:total_pages 1
STAT 2:total_chunks 9362
STAT 2:used_chunks 9362
STAT 2:free_chunks 0
STAT 2:free_chunks_end 6721
….. etc
chunk_size 104 – is this fixed at compile time? So I must recompile to decrease this value?
Question 2: each page is 1MB, and chunks_per_page is how many chunks of size 104 fit in one. What does total_pages mean? And what is free_chunks_end?
Thanks in advance for reply,
Serguei
Also, I have 144 slab classes; in the first there are no free chunks, in the others 1-2-4 are free. But there is a lot of memory available. What will happen when no free chunks are left? Or is that a stupid question?