EC2 reliable backup


I'm still looking, searching and reading for a way to make good, frequent backups of the machine, because if the power goes down or something else happens and the instance goes down, you lose the data inside the instance.

I saw that there are threads about this on Amazon's AWS EC2 forum. Amazon's staff say that they are thinking about this and about how they could provide either a database service (MySQL or something else) or some kind of really useful, permanent storage (like the 160GB /mnt partition, but one that isn't wiped when the instance goes down).

The service that is already available, and that everybody expects to provide persistent storage, is Amazon S3 (Simple Storage Service). It's a huge and cheap storage service; a walkthrough is available here. The problem for the time being is that there are no stable Linux implementations for using it as a filesystem.

One solution would be the S3/FUSE interface, but it seems to be in alpha state for now (by the way, there is a huge thread about it on the AWS forum). It could solve many problems, but it will take a while until it's finished.

Another project in alpha state is s3DAV. Using it together with davfs2 could be another way.

I'm not a backup guru, so I'll keep reading and try to find out as much as I can. In the meantime, I'm thinking of making a system snapshot as an image every 24 hours, uploading it to S3 and keeping the 4 most recent snapshots, which is appropriate for my new system: the compressed image is around 400MB, and apart from the blog the server is empty. This wouldn't be OK on a 30GB database server or any live system holding a significant amount of data, but it's fine for me right now. I have attached the script I made for this purpose.

This is only a stopgap for now and has many downsides, but until I figure out something better I'll stick with it.
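The 24-hour rotation (keep the 4 newest images) boils down to deciding which S3 keys to delete after each upload. A minimal Python sketch, assuming a hypothetical key naming scheme with the snapshot time embedded in the key; it is not the attached image_backup.sh, and the actual S3 listing and deletion are left to whatever S3 tool is in use:

```python
from datetime import datetime

# Hypothetical key layout: "snapshots/image-YYYYMMDD-HHMMSS.tar.gz"
KEY_PREFIX = "snapshots/image-"
KEY_SUFFIX = ".tar.gz"
TIME_FORMAT = "%Y%m%d-%H%M%S"
KEEP = 4  # keep the four most recent snapshots


def snapshot_key(when: datetime) -> str:
    """Build the S3 key for a snapshot taken at `when`."""
    return f"{KEY_PREFIX}{when.strftime(TIME_FORMAT)}{KEY_SUFFIX}"


def parse_key(key: str) -> datetime:
    """Recover the timestamp encoded in a snapshot key."""
    stamp = key[len(KEY_PREFIX):-len(KEY_SUFFIX)]
    return datetime.strptime(stamp, TIME_FORMAT)


def keys_to_delete(keys: list[str], keep: int = KEEP) -> list[str]:
    """Return the snapshot keys that fall outside the newest `keep`."""
    # Ignore anything in the bucket that isn't a snapshot image.
    snapshots = [k for k in keys if k.startswith(KEY_PREFIX) and k.endswith(KEY_SUFFIX)]
    snapshots.sort(key=parse_key, reverse=True)  # newest first
    return snapshots[keep:]
```

After each upload, the script would list the bucket, pass the keys to `keys_to_delete` and remove whatever it returns.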

Another simple way would be a set of scripts that, once an hour, run mysqldump --all-databases, copy /var/log/* and upload the result to S3, plus scripts that download the latest backups from S3 when the instance boots (only the database part, since copying the logs back is pointless).
I'm trying to achieve this without using other machines or services from other providers, because that would change the whole picture: I wouldn't be using just EC2 and S3 but also third-party services, the cost would go up, and I would start reconsidering EC2 altogether.
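The hourly-dump idea could look roughly like this in Python. It is a sketch under assumptions: the key names are hypothetical, MySQL credentials are assumed to come from the usual option file (e.g. ~/.my.cnf), and the S3 upload and download calls are left out because they depend on the S3 tool in use:

```python
import gzip
import shutil
import subprocess
from datetime import datetime
from typing import Optional

# Hypothetical S3 key layout for the hourly dumps; zero-padded timestamps
# make lexicographic order match chronological order.
DUMP_PREFIX = "mysql/all-databases-"
DUMP_SUFFIX = ".sql.gz"


def dump_command() -> list[str]:
    """mysqldump arguments for a dump of every database on the server."""
    return ["mysqldump", "--all-databases"]


def dump_key(when: datetime) -> str:
    """S3 key for the dump taken during the hour of `when`."""
    return f"{DUMP_PREFIX}{when.strftime('%Y%m%d-%H')}{DUMP_SUFFIX}"


def write_dump(path: str) -> None:
    """Stream mysqldump's output through gzip into `path` (needs a running MySQL)."""
    with subprocess.Popen(dump_command(), stdout=subprocess.PIPE) as proc:
        with gzip.open(path, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for mysqldump, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")


def latest_dump(keys: list[str]) -> Optional[str]:
    """At boot: pick the newest dump among the keys listed in the bucket."""
    dumps = [k for k in keys if k.startswith(DUMP_PREFIX) and k.endswith(DUMP_SUFFIX)]
    return max(dumps) if dumps else None
```

Restoring at boot would then mean downloading the key returned by `latest_dump`, decompressing it and feeding it to the mysql client.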

Attachment: image_backup.sh (1.25 KB)


I think I would really want some kind of network file system with EC2. Probably the final release of this service will provide something like this (or, I hope, the S3/FUSE interface will at least reach beta soon).
I'm still trying to find a productive and useful way of making regular backups. The image snapshot script works OK for now (though it still needs some improvements), but it isn't a real solution.
If anyone has ideas, please chime in.


So, in the meantime I searched for S3 tools and found a bunch of them; I'll even try to keep a list.

As the system grows (in size, because of the sites and databases), taking frequent system snapshots will become a painful and resource-intensive task. So I'll probably take a snapshot only once a week or even once a month, to capture system and software updates; more frequent snapshots are pointless, because I need a separate backup system for the databases and other content that changes often anyway.
For the content I'm now using s3sync.rb, run once an hour from a cron job to back up /var/www, /var/log and /var/lib/mysql to S3.
/var/lib/mysql needs a trick, because copying the data files of a running server can produce an inconsistent backup: I'll start a second MySQL instance on the same machine that replicates from the master. When the cron job starts, it will stop the slave MySQL server, back up the slave's data directory and start the slave again once the backup is done.
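The stop/copy/restart sequence for the slave can be sketched in Python. Everything here is an assumption for illustration: the socket path, defaults file and directories are hypothetical, and the copied directory would then be picked up by the hourly s3sync.rb run. The try/finally makes sure the slave is restarted even if the copy fails, so replication can catch up again:

```python
import shutil
import subprocess
from typing import Callable

# Hypothetical locations for the slave mysqld running next to the master.
SLAVE_SOCKET = "/var/run/mysqld/mysqld-slave.sock"
SLAVE_DEFAULTS = "/etc/mysql/slave.cnf"
SLAVE_DATADIR = "/var/lib/mysql-slave"
STAGING_DIR = "/var/backups/mysql-slave"  # directory the hourly S3 sync would upload


def stop_slave() -> None:
    """Shut down the slave server through its own socket."""
    subprocess.run(["mysqladmin", f"--socket={SLAVE_SOCKET}", "shutdown"], check=True)


def copy_datadir() -> None:
    """Replace the staging copy with the (now quiescent) slave data directory."""
    shutil.rmtree(STAGING_DIR, ignore_errors=True)
    shutil.copytree(SLAVE_DATADIR, STAGING_DIR)


def start_slave() -> None:
    """Relaunch the slave; mysqld_safe stays in the foreground, so don't wait."""
    subprocess.Popen(["mysqld_safe", f"--defaults-file={SLAVE_DEFAULTS}"])


def backup_slave(stop: Callable[[], None] = stop_slave,
                 copy: Callable[[], None] = copy_datadir,
                 start: Callable[[], None] = start_slave) -> None:
    """Stop the slave, copy its files, and always restart it afterwards."""
    stop()  # if this fails, the slave may still be running, so don't start it
    try:
        copy()
    finally:
        start()
```

Passing the three steps as parameters also makes it easy to dry-run the sequence without touching MySQL.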

Besides storing the system logs on S3, I'm thinking of also sending them to a remote machine (the one outside Amazon that I'm using as a DHIS server). This is needed because if the machine (in fact the EC2 instance) crashes, I would really want the logs from the last minutes before the crash, and those won't have been backed up to S3 yet :) .
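System logs would normally be forwarded by the syslog daemon's own configuration. For logs written by an application, Python's standard logging.handlers.SysLogHandler can send records to a remote syslog server. A minimal sketch, where the host name is a hypothetical placeholder and the receiving syslogd is assumed to accept network messages on UDP port 514:

```python
import logging
import logging.handlers

# Hypothetical address of the log-collecting machine outside EC2;
# 514/udp is the conventional syslog port.
LOG_HOST = ("logs.example.org", 514)


def remote_logger(address=LOG_HOST, name: str = "ec2-app") -> logging.Logger:
    """Return a logger whose records are sent to a remote syslog daemon.

    Call once per process: each call adds another handler to the logger.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)  # UDP by default
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Plain UDP syslog is unencrypted and can silently drop messages, so this fits best-effort crash forensics rather than a guaranteed audit trail.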
