trick – domas mituzas

on copying files

After hitting stupid kernel bug/behavior with rsync (um, memory pressure on socket caused to do some expensive cleanups on every packet) I ended up using ‘nc’ for reimaging slaves and copying files from server X to server Y:

dbX# nc -l -p 8888 | tar xf -
dbY# tar cf - . | nc dbX 8888

This would happily sustain gigabit traffic, and I guess could easily quadruple on 10GE without too much effort.

The only remaining problem with this method is copying from single source to multiple destinations. There are some tools that use multicast for sending out data streams, but due to lossy nature there apparently corruptions happen. I figured out there should be some easy way – chaining ‘nc’ commands. The only problem – I didn’t know how to do that in practice, until helpful people in #bash@freenode assisted:

dbX# nc -l -p 8888 | tar xf -
# cat >/dev/null serves as fallback in
# case some intermittent I/O errors happen
dbY# nc -l -p 8888 | tee >( tar xf -; cat >/dev/null ) | nc dbX 8888
dbZ# tar cf - . | nc dbY 8888

End result – with networks being full-duplex one can daisy-chain as many servers as needed, and single stream would be extracted on all of them :-) And for the curious, >(command) does this (from bash manual):

Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming
open files. It takes the form of (list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list.

popping a stack in mysql

Warning: this post describes practices that rely on side effects and may provide unstable results.

I was asked how to treat a table in MySQL as a stack and pop oldest entry from it. Of course there is always traditional approach – you lock table, fetch oldest record, delete it, commit. It isn’t that fun, though. This is what I came up to (and had multiple corrections):

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 3    |
+------+
1 row in set (0.00 sec)

mysql> delete from jobs where (@id:=jobs_id) limit 1; select @id;
Query OK, 1 row affected (0.00 sec)

+------+
| @id  |
+------+
| 4    |
+------+
1 row in set (0.00 sec)

In case of transactions used, you’d need to commit as fast as you can, as DELETE would acquire locks (but that’s the price of ACID ;-) – concurrent pops are not that easy.

Appending to stack is easy though – just.. INSERT. If you do not want transactions, you would still want InnoDB and that is where side issues kick in.

InnoDB stores all records clustered together with primary key, so table scans also happen in order of PK. If you would apply ORDER BY, there would be no guarantee that MySQL would not do filesort (if it would always pick indexes, this method would be even more awesome ;-). In case of filesort, last @a value would be set before actually sorting the data set, so a different value would be returned than deleted.

MyISAM doesn’t care at all about order, unless you specify it to. Doing the pop trick without ORDER BY would treat the stack just as pool of records and would delete any single row (and return it of course). Adding ORDER BY would set @values again to pre-filesort state.

That reminds me another trick (has been floating around for a while), that allows atomic incrementing of a field and fetching a value:

mysql> update counters set counter_value=(@val:=counter_value+1); select @val;
    -> update counters set counter_value=(@val:=counter_value+1); select @val;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 562  |
+------+
1 row in set (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
+------+
| @val |
+------+
| 563  |
+------+
1 row in set (0.00 sec)

Yet another warning: As it’s modifying same row, only one transaction would be able to edit the record. So this should be done either not in transaction, or on HEAP/MyISAM/Blackhole engines.

ip address searches in mysql

MySQL does not provide special datatype or operators for IP addresses, but has two functions, that convert IP addresses to and from integers. In order to lookup addresses in a subnet, an efficient index range scan queries could be used.

SELECT ip FROM table WHERE ip BETWEEN
	INET_NTOA("213.197.137.0") AND
	INET_NTOA("213.197.137.255")

In order to make it easier, we can have several handy functions, that simplify the task a bit:

SELECT ip FROM table WHERE ip BETWEEN
	inet_aton_net("213.197.137.3/24") AND
	inet_aton_bc("213.197.137.3/24")

I intentionally didn’t create/use function that would check IPs existence in network, as optimizer wouldn’t know it can be converted to efficient range scan, though, some operators like… LIKE are optimized properly.

CREATE FUNCTION inet_aton_net (ip VARCHAR(18))
	RETURNS INTEGER UNSIGNED
	DETERMINISTIC
BEGIN
	DECLARE nm INTEGER UNSIGNED DEFAULT 0;
	SET nm=32-SUBSTRING(ip,LOCATE("/",ip)+1);
RETURN
	-- shift right and left looses host bits
	INET_ATON(LEFT(ip,LOCATE("/",ip)-1)) >> nm << nm;
END

CREATE FUNCTION inet_aton_bc (ip VARCHAR(18))
	RETURNS INTEGER UNSIGNED
	DETERMINISTIC
BEGIN
	DECLARE nm INTEGER UNSIGNED DEFAULT 0;
	SET nm=SUBSTRING(ip,LOCATE("/",ip)+1);
RETURN
	-- ip ORed with inverse netmask provides broadcast address
	INET_ATON(LEFT(ip,LOCATE("/",ip)-1)) | (4294967295>>nm);
END

	markcallaghan (@mark… on MySQL does not need SQL
	markcallaghan (@mark… on MySQL does not need SQL
	Domas Mituzas on MySQL does not need SQL
	Marc on MySQL does not need SQL
	Nils Meyer on linux memory management for…