on copying files

After hitting a stupid kernel bug/behavior with rsync (memory pressure on a socket caused it to do expensive cleanups on every packet) I ended up using ‘nc’ for reimaging slaves and copying files from server X to server Y:

dbX# nc -l -p 8888 | tar xf -
dbY# tar cf - . | nc dbX 8888
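# note: this copies from dbY to dbX (dbY streams the tar, dbX listens and unpacks)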

This would happily sustain gigabit traffic, and I guess it could easily quadruple on 10GE without too much effort.

The only remaining problem with this method is copying from a single source to multiple destinations. There are some tools that use multicast for sending out data streams, but due to the lossy nature of multicast there apparently are corruptions. I figured there should be some easy way: chaining ‘nc’ commands. The only problem was that I didn’t know how to do that in practice, until helpful people in #bash@freenode assisted:

dbX# nc -l -p 8888 | tar xf -
# cat >/dev/null serves as a fallback: if tar hits some
# intermittent I/O error and dies, cat keeps draining the
# stream so the relay to the next host isn't broken
dbY# nc -l -p 8888 | tee >( tar xf -; cat >/dev/null ) | nc dbX 8888
dbZ# tar cf - . | nc dbY 8888
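Each additional middle host just repeats the same listen-tee-forward line, pointed at the next hop down the chain. A sketch with a hypothetical fourth box dbW spliced in between dbY and dbX:

dbX# nc -l -p 8888 | tar xf -
dbW# nc -l -p 8888 | tee >( tar xf -; cat >/dev/null ) | nc dbX 8888
dbY# nc -l -p 8888 | tee >( tar xf -; cat >/dev/null ) | nc dbW 8888
dbZ# tar cf - . | nc dbY 8888

Start the listeners from the far end first (dbX, then dbW, then dbY) and only then kick off dbZ, since each ‘nc’ needs its downstream listener up before it can connect.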

End result: since networks are full-duplex, one can daisy-chain as many servers as needed, and a single stream gets extracted on all of them :-) And for the curious, here is what >(command) does (from the bash manual):

Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files. It takes the form of <(list) or >(list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list.
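To see the fan-out in isolation, here is a minimal demo using only standard tools: tee writes the same bytes both to its own stdout and to the tr process behind >(…):

$ echo hello | tee >(tr a-z A-Z)
hello
HELLO

(The two output lines may come out in either order, since tr runs asynchronously.)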

3 thoughts on “on copying files”

  1. Just to avoid someone having to spend a few hours restoring from backups: the first example would copy the files from server Y to server X, not the other way around.
