zfs send remote encrypted backup

I’ve finally been trying to move some of my file storage off site. Here’s a little script I wrote to help with that.

zfs-backup:

#!/bin/sh

usage() {
        echo "Usage: $(basename "$0") snapshot-name";
        exit 1;
}

if [ "x$1" = "x" ]; then
        usage;
fi

if zfs list -t snapshot "$1" > /dev/null 2>&1; then
        SNAP=$1
        # Sanitized name: every non-alphanumeric character becomes a dash.
        SAN=`echo "$1" | sed 's/[^A-Za-z0-9]/-/g'`
else
        echo "Invalid snapshot given.  Try zfs list -t snapshot for ideas.";
        usage;
fi

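BASE=`basename "$0"`
# Create the FIFO directory under $TMPDIR (or /tmp) rather than the current directory.
FIFODIR=$(mktemp -d "${TMPDIR:-/tmp}/$BASE-tmp-XXXXXXXX") || exit 2
FIFO=$FIFODIR/$SAN
CHK=$FIFODIR/sha256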

CONTAINER=$SAN.gz.sc

mkfifo "$FIFO";

echo "Sending snapshot $SNAP";

# Hash the encrypted stream locally (via the FIFO) while tee forwards it to the remote host.
sha256 < "$FIFO" > "$CHK" &
zfs send "$SNAP" | pigz | scrypt enc /dev/stdin | tee "$FIFO" | ssh -c arcfour256 X_HOSTSPEC_X "umask 0077 && cat > .zfs-backup/$CONTAINER"

# Wait for the background sha256 to finish writing before reading its output.
wait
SHA256=`cat "$CHK"`
printf "%s  %s\n" "$SHA256" "$CONTAINER" | ssh X_HOSTSPEC_X "umask 0077 && cat > .zfs-backup/$CONTAINER.sha256sum"

echo "Transferred snapshot with checksum: $SHA256";

rm "$CHK";
rm "$FIFO";
rmdir "$FIFODIR";
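
One thing the script above does not check: a plain sh pipeline only reports the exit status of its last stage (here, ssh), so a failure in zfs send, pigz, or scrypt goes unnoticed. Here is a minimal sketch of one way to catch that in portable sh. The run_stage helper and the status file are hypothetical names, not part of zfs-backup, and a deliberately failing sh -c stands in for an interrupted scrypt.

```shell
#!/bin/sh
# Sketch only: record the exit status of every pipeline stage in a file.
# run_stage and the temp-file names are illustrative, not part of zfs-backup.

status=$(mktemp "${TMPDIR:-/tmp}/stage-status-XXXXXXXX") || exit 2
out=$(mktemp "${TMPDIR:-/tmp}/stage-out-XXXXXXXX") || exit 2

# run_stage CMD...: run one pipeline stage and append its exit status.
run_stage() {
    "$@"
    echo "$?" >> "$status"
}

# Stage 1 emits some data and then fails, like an interrupted scrypt.
# Stage 2 succeeds. The final cat stands in for ssh writing the container.
run_stage sh -c 'printf "partial data"; exit 1' | run_stage cat | cat > "$out"

# Any recorded status other than 0 means the output cannot be trusted.
if grep -qv '^0$' "$status"; then
    failed=1
    echo "A pipeline stage failed; do not trust $out" >&2
else
    failed=0
fi

rm -f "$status"
# $out is left in place so the partial output can be inspected.
```

Each stage keeps its end of the pipe open until its status line is written, so by the time the last stage sees end-of-file the earlier statuses are already in the file.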

Some notes about choices of utilities:

  • pigz could easily be replaced with gzip or lzma, or whatever.
  • I’m debating switching scrypt out for something like openssl or gpg with an actual random key, or possibly a curve25519/chacha20/poly1305 container, but I haven’t done the research to see how smart or easy this is.  For now, I understand what scrypt is doing, it’s installed on my machine, and it’s a Good Thing.
  • I’m using arcfour256 for the bulk transfer because the security of the ssh stream itself isn’t important: the data is already protected by scrypt/AES256.
  • I tee to a fifo so that I can check that the transfer wasn’t corrupted on the remote end without typing my passphrase into the untrusted machine.  The tee/fifo feels hackish to me, but I don’t have another idea.
    • I investigated the scrypt format, and there is no length in the file header, nor any trailing magic bytes, so it’s impossible to tell if the file is truncated without trying to decrypt it.  Based on the code, adding a length header or trailing magic would break the current on-disk format.
    • This checksum won’t help if, say, the scrypt process is interrupted: I’m guessing you would get a partial transfer and matching checksums.
  • I also copy the checksum to the remote machine, in a format that sha256sum can parse.
  • I run this in a screen session.
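
The tee/fifo idea from the notes above can be tried in isolation. This is only a sketch under a few assumptions: GNU sha256sum stands in for FreeBSD’s sha256, a local file stands in for the ssh destination, and copy_with_checksum is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: sha256sum (GNU coreutils) stands in for FreeBSD's sha256,
# and a local file stands in for the ssh destination.

# copy_with_checksum DEST (hypothetical helper):
# copy stdin to DEST and print the SHA-256 of the bytes that were sent.
copy_with_checksum() {
    dest=$1
    fifodir=$(mktemp -d "${TMPDIR:-/tmp}/fifo-demo-XXXXXXXX") || return 2
    mkfifo "$fifodir/fifo"

    # Reader: hash whatever arrives on the FIFO, in the background.
    sha256sum < "$fifodir/fifo" | cut -d' ' -f1 > "$fifodir/sum" &
    reader=$!

    # Writer: tee copies the stream into the FIFO and on to the destination.
    tee "$fifodir/fifo" > "$dest"

    # Let the hash finish before reading it back.
    wait "$reader"
    cat "$fifodir/sum"
    rm -rf "$fifodir"
}

work=$(mktemp -d "${TMPDIR:-/tmp}/fifo-demo-work-XXXXXXXX") || exit 2
container=container.gz.sc

# "Send" a stand-in payload and keep the checksum computed on the sending side.
local_sum=$(printf 'pretend this is an encrypted stream\n' |
    copy_with_checksum "$work/$container")

# Store it next to the container in the "HASH  FILENAME" layout sha256sum parses.
printf '%s  %s\n' "$local_sum" "$container" > "$work/$container.sha256sum"

# On the receiving side, verification needs no passphrase.
verify=$(cd "$work" && sha256sum -c "$container.sha256sum")
echo "$verify"
```

The last step is the point of the whole exercise: the remote machine can confirm that the bytes arrived intact using only the checksum file, without ever seeing the passphrase.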
