I wanted to expand on the built-in backup facilities Rimuhosting provides, and I figured S3 would be the way to go.
Attempt #1: s3fs
My first attempt was to use s3fs to make my S3 bucket available, and then use rsync to copy to it. I read about the idea here and was sold.
The setup instructions in the original post didn't quite apply to the CentOS server I was using. Via a combination of Google and trial and error, I came up with the following yum install commands:
sudo yum install fuse fuse-devel python-devel
sudo yum install curl-devel
sudo yum install libxml-devel
sudo yum install libxml2-devel
cd ~/util/src
wget http://s3fs.googlecode.com/files/s3fs-r191-source.tar.gz
tar xzf s3fs-r191-source.tar.gz
cd s3fs
sudo make install
I then entered the commands:
sudo mkdir /mnt/s3
sudo /usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3
And to my absolute amazement, it worked - I was able to trivially access my S3 bucket as though it were just another file system.
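A couple of quick sanity checks - since it's an ordinary FUSE mount, regular filesystem commands work against it, and fusermount is the standard way to detach it when you're done:

ls /mnt/s3
df -h /mnt/s3
fusermount -u /mnt/s3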
I kicked off an rsync command to test out the backup - and that's where I ran into the catch. It was so very s l o w. After a couple of minutes, only an itty bitty fraction of the 3 Gigs of data that needed to be backed up were in the S3 share.
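For reference, the rsync I kicked off was something along these lines (the source directory is just an example):

rsync -av /var/www /mnt/s3/www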
Attempt #2: duplicity
Reading through the comments on that post, another user suggested duplicity as a workaround for the sluggish s3fs behavior.
All it took was a simple:
sudo yum install duplicity
and duplicity was ready for use.
I found this HOWTO which gives specific tips about how to use duplicity over S3.
My final backup script wasn't far from what was shown on that page. Here's what I arrived at:
#!/bin/bash

export PASSPHRASE=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

dirs="/var/svn \
      /var/www \
      /home \
      /usr/local/stuff"

for d in $dirs; do
  prefix=$(basename $d)
  echo duplicity $d s3+http://backup.bucket.name/$prefix
  duplicity $d s3+http://backup.bucket.name/$prefix
  echo ""
done
The script above is a little chatty for testing purposes.
I was able to back up about 4 Gigs of data in about 45 minutes (or what felt like 30 minutes - I lost track of time).
I like the duplicity approach for a variety of reasons:
- The standard file formats (tar+GPG) make sense from a backup perspective
- The incremental backup functionality means that backups should be a relatively quick affair going forward
- Restoring files is quite easy. I did a quick test and had no problems with it (there's a restore sketch after this list)
- The app has a unix'ish feel to it - it does one thing and does it well
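To give a flavor of that restore test, running duplicity with the source and destination swapped pulls data back down. The bucket name and prefix below match the script above, but the local paths and the file name are only illustrative:

export PASSPHRASE=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# restore the whole /var/www backup into a scratch directory
duplicity s3+http://backup.bucket.name/www /tmp/restore-www
# or pull back a single file (the path is relative to the backed-up directory)
duplicity --file-to-restore html/index.html s3+http://backup.bucket.name/www /tmp/index.html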
I've added duplicity to cron and we'll see tomorrow how my 3am backup goes.
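For reference, the crontab entry is just the script above scheduled for 3am; the script path and log location here are only examples of where you might put them:

0 3 * * * /root/bin/s3-backup.sh > /var/log/s3-backup.log 2>&1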
All in all, this seems like a winning solution. I'll keep s3fs around too, as it's a wonderfully clever way to access S3 buckets.
I've been considering this myself, thanks for the write up for CentOS. Has this been stable, do you still advocate this solution?
Hey Paul - it's been working great for me. Highly recommended.
Nice article on backing up CentOS, Ben! Thanks!