Monday, January 03, 2011

Setting up an S3 backup solution on a CentOS VPS

I wanted to expand on the built-in backup facilities RimuHosting provides, and I figured S3 would be the way to go.

Attempt #1: s3fs

My first attempt was to use s3fs to make my S3 bucket available, and then use rsync to copy to it. I read about the idea here and was sold.

The setup instructions in the original post didn't quite apply to the CentOS server I was using. Via a combination of Google and trial and error, I came up with the following yum install commands:

sudo yum install fuse fuse-devel python-devel
sudo yum install curl-devel
sudo yum install libxml2-devel
cd ~/util/src
wget http://s3fs.googlecode.com/files/s3fs-r191-source.tar.gz
tar xzf s3fs-r191-source.tar.gz
cd s3fs
make
sudo make install
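
One gotcha worth checking before trying to mount anything: s3fs sits on top of FUSE, so the fuse kernel module needs to be loaded. If the mount command below fails with a FUSE-related error, something like this should sort it out:

# Load the fuse kernel module if it isn't already loaded
sudo modprobe fuse

# Verify - you should see a "fuse" line in the output
lsmod | grep fuse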

I then entered the following commands:

sudo mkdir /mnt/s3
sudo /usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3

And to my absolute amazement, it worked - I was able to trivially access my S3 bucket as though it were just another file system.
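
A quick aside: rather than passing the keys on the command line (where they end up in your shell history and are visible in ps output), s3fs can also read them from a credentials file. Something along these lines should work, though double-check the exact format against the s3fs docs:

# Store the keys in /etc/passwd-s3fs as accessKeyId:secretAccessKey
echo 'yourS3key:yourS3secretkey' | sudo tee /etc/passwd-s3fs
sudo chmod 600 /etc/passwd-s3fs

# The mount command then no longer needs the -o key options
sudo /usr/bin/s3fs yourbucket /mnt/s3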

I kicked off an rsync command to test out the backup - and that's where I ran into the catch. It was so very s l o w. After a couple of minutes, only an itty bitty fraction of the 3 gigs of data that needed to be backed up were in the S3 share.
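
For the record, the test was nothing fancy - just a plain rsync against the mount point, along these lines (the paths here are illustrative):

# Mirror a local directory into the s3fs mount
# -a preserves permissions and timestamps, -v shows each file as it goes
rsync -av /var/www/ /mnt/s3/www/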

Attempt #2: duplicity

Reading through the comments on the above approach, another user suggested duplicity as a workaround for the sluggish s3fs behavior.

All it took was a simple:

 sudo yum install duplicity

and duplicity was ready for use.
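
One note: duplicity talks to S3 through the Python boto library. I don't recall whether yum pulled it in automatically, but if duplicity complains about a missing boto module, this should fix it:

sudo yum install python-boto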

I found this HOWTO, which gives specific tips on using duplicity over S3.

My final backup script wasn't far from what was shown there. Here's what I arrived at:

#!/bin/bash
# Back up a set of directories to S3 using duplicity.

# duplicity encrypts each backup with GPG using this passphrase, and the
# S3 backend picks up the AWS credentials from the environment.
export PASSPHRASE=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# Directories to back up - each one lands under its own prefix in the bucket
dirs="/var/svn \
      /var/www \
      /home \
      /usr/local/stuff"

for d in $dirs; do
  # Use the last path component as the prefix within the bucket
  prefix=$(basename "$d")
  echo duplicity "$d" "s3+http://backup.bucket.name/$prefix"
  duplicity "$d" "s3+http://backup.bucket.name/$prefix"
  echo ""
done

The script above is a little chatty for testing purposes.
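
One thing the script doesn't handle yet is housekeeping: by default duplicity keeps chaining incrementals onto the last full backup forever, so at some point you'll want to force a fresh full backup and prune old sets. Something along these lines should do it (the time windows are just examples):

# Force a full backup if the last full one is more than 30 days old
duplicity --full-if-older-than 30D /var/www s3+http://backup.bucket.name/www

# Delete backup sets older than 60 days (--force actually performs the deletion)
duplicity remove-older-than 60D --force s3+http://backup.bucket.name/www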

I was able to back up about 4 gigs of data in about 45 minutes (it may have been closer to 30 - I lost track of time).

I like the duplicity approach for a variety of reasons:

  • The standard file formats (tar+GPG) make sense from a backup perspective
  • The incremental backup functionality means that backups should be a relatively quick affair going forward
  • Restoring files is quite easy. I did a quick test and had no problems with it (see the example after this list)
  • The app has a unix'ish feel to it - it does one thing and does it well
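
For the curious, a restore looks roughly like this (the target paths are made up for illustration, and the same PASSPHRASE and AWS variables from the backup script need to be exported first):

# Restore the whole www backup into a scratch directory
duplicity restore s3+http://backup.bucket.name/www /tmp/www-restore

# Or pull back a single file instead of the whole tree
duplicity restore --file-to-restore index.html \
  s3+http://backup.bucket.name/www /tmp/index.html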

I've added duplicity to cron and we'll see tomorrow how my 3am backup goes.
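
For reference, the crontab entry is nothing special (the script and log paths are whatever you've set up on your end):

# Run the backup script every day at 3am
0 3 * * * /root/bin/s3-backup.sh >> /var/log/s3-backup.log 2>&1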

All in all, this seems like a winning solution. I'll keep s3fs around too, as it's a wonderfully clever way to access S3 buckets.

3 comments:

  1. I've been considering this myself, thanks for the write up for CentOS. Has this been stable, do you still advocate this solution?

  2. Hey Paul - it's been working great for me. Highly recommended.

  3. Nice article on backing up CentOS ben! Thanks!