« 2017/09/15 - Backups » | scripting linux

I have a home server which I use to host my Perforce repo among other things. I built it roughly based on a DIY NAS design, but installed Ubuntu Server on it. This was my first major foray into Linux, and whilst I would change plenty of things, the server works: if it ain't broke, don't fix it. The only real problem was that I didn't have a backup solution.


This lack of a backup came to the fore a few months back when the ASRock C2750D4I that I use for the server died. Thankfully ASRock were very helpful and promptly RMA'ed and replaced the board, but when rebuilding the RAID array two of the disks had to be recovered. I didn't lose any data, but it was a close shave and reminded me of the need for a better backup system.

I have another NAS, a Synology DS1815+ that I use for general backups, but these are performed with scripts I run on my main workstation, using robocopy to pull down all the files via an SMB share and then upload them to the NAS. It's a full copy every time, no deltas, and it obviously relies on me remembering to run the script periodically. And it's all on-site; if I get burgled or similar, it's all gone.


Goals

The backup system needed to fulfill the following:
* Off-site
* Automated
* Delta file transfers
* Driven by the home server, not the workstation
* Scripted setup & execution

Decisions

There are two broad ways to go about backups. The first is to pay a company like CrashPlan for access to their servers, point their client at your data, and leave it to it. The second is to roll your own.

Ultimately I don't have a lot of data; it's mostly code and invoices for my business, and I have no interest in backing up large media files or the like. My requirements are simple: we're talking less than 500MB of data, and not a huge number of files, since I don't practice crazy file-for-every-class development strategies like some. I also needed to be able to run a script as part of the backup, since I have to shut down my Perforce server, checkpoint it, do the backup, and then bring it back up.

After looking at some of the commercial offerings, mostly priced at $10/mo and upwards, I decided to roll my own. Mostly I prefer to do my own thing anyway, I dislike vendor lock-in, and it gave me a further taste of shell scripting as I move further towards Linux. With my own scripts, should any part of my setup disappear or jack up its prices, I can move with ease.

I still needed some off-site storage, but as long as I could copy files to it in a sensible manner, it didn't really matter in what form.

Storage

I briefly looked into hosted VMs and other storage-focused offerings, but decided instead that a dedicated server was more to my taste.


I picked up a Kimsufi KS-1, which gets you an Atom N2800 (dual core, 1.86GHz), 2 GB RAM, 500 GB of spindle storage and 100 Mbps unmetered networking for €4.99+VAT pcm, about £4.60+VAT pcm in real money.

These servers are quite popular, and when they become available they sell out again within the hour, if not sooner. I signed up for a mailing list that sent me an email when my configuration of choice became available. On Friday evening exactly that happened, and by the time I'd purchased my KS-1 and refreshed the servers page, they'd sold out again.

They provide no guarantees for your data: a disk can die at any point, and they just let you know so you can reinstall at your leisure. This isn't a problem for me since the server will be one of many backups; at minimum I'll back up to the workstation and local NAS, this is just covering the off-site angle.

Initial setup

Once I had my KS-1 installed I created a new user specifically for backups.

> sudo adduser {username}
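
I also made sure the directory tree that the backups will land in exists. The backups/p4 path here just matches the listings further down; rsync will create the final backup directory for you, but not any missing parent directories:

> ssh {username}@{server} "mkdir -p ~/backups/p4"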

Scripts

I needed to create the following scripts:

backup_setup.sh
Set up SSH access to the backup server, install public RSA keys, and set up the cron job on the local server.

backup.sh
Does the actual backup to the backup server; run periodically by the cron job.

restore.sh
Restores files from the backup to the local server, for when the backup actually needs to be used. Disaster recovery, folks: test those backups.

Usage

Run backup_setup.sh on the local server that has the data you want to back up; this sets up automated running of backup.sh.
Run restore.sh on the local server when you need to restore data from the most recent backup.
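
The cron side of things is just a line in the backup user's crontab on the local server; backup_setup.sh appends it with something like the following (the 3am daily schedule and the script path are only examples, pick whatever suits your data):

> (crontab -l 2>/dev/null; echo "0 3 * * * /path/to/backup.sh") | crontab -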

Script snippets

As part of developing these scripts, I used the following snippets.

> ssh-keygen -R $server
Removes the known_hosts entry for the specified $server, forgetting its host key.

> ssh-keyscan $server >> ~/.ssh/known_hosts
Adds the specified $server to the known list of trusted servers, so you don't get the following prompt when connecting to a server:

The authenticity of host '{hostname} ({ip})' can't be established.
RSA key fingerprint is [key fingerprint].
Are you sure you want to continue connecting (yes/no)?

> ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""
Creates an RSA key pair with no passphrase (-N ""), so there's no prompting during creation or use.

> ssh-copy-id -i ~/.ssh/id_rsa $username@$server
Connects to the backup server and installs our public key for password-less authentication. Needs the password the first time to install the key.

> ssh -oBatchMode=yes $username@$server "echo test" > /dev/null
Tries to connect to the backup server without any prompts (-oBatchMode=yes, if using OpenSSH); if $? is non-zero then the connection failed.
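
In the setup script this turns into a check along these lines (a sketch; exactly what you do on failure is up to you):

ssh -oBatchMode=yes $username@$server "echo test" > /dev/null
if [ $? -ne 0 ]; then
    echo "Password-less SSH to $server isn't working" >&2
    exit 1
fi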

> ssh $username@$server "ls -1 -d $dstDir/* 2>/dev/null | tail -n 1"
This lists the backup directories on the destination and grabs the path to the most recent one, relying on the timestamped directory names sorting lexicographically.
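
The new backup's directory name is itself just a timestamp; a date format string like this produces the 2017-09-03_18-23-07 style names seen in the listing further down, and sorts lexicographically so the tail -n 1 trick above always finds the newest one:

> newBackupName=$(date +%Y-%m-%d_%H-%M-%S)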

> rsync --archive --delete --stats --link-dest=../$recentBackupName $srcDir/ $username@$server:$newBackupDir/
The meat of backup.sh.

$recentBackupName holds the name of the directory obtained in the snippet above. Note the lack of a trailing slash.
$srcDir contains the working directory for the data to be backed up, in this case my Perforce directory. Note the trailing slash, which tells rsync to copy the contents of the directory rather than the directory itself.
$newBackupDir contains the full path to the location to store the new backup in. Note the trailing slash.

The magic is the --link-dest option. This takes a path to another directory, relative to the destination directory, in this case the previous backup; any files it finds in that previous backup which are unchanged in the new backup are created as hard links rather than full copies of the data. This way, rsync only transfers the files that have changed compared to your previous backup.

Hard links are great since there is only one copy of the backing data, and it will persist until all "names" of the file have been removed. Think of it like a ref-counted file if you're a programmer. I proved this was working with ls -i:

username@server:~/backups/p4$ ls -1
2017-09-03_18-23-07
2017-09-03_18-23-31
username@server:~/backups/p4$ ls -l 2017-09-03_18-23-07
total 8
-rwxrwxrwx 2 username username 1 Sep 3 19:17 a.txt
-rwxrwxrwx 1 username username 1 Sep 3 19:17 b.txt
username@server:~/backups/p4$ ls -l 2017-09-03_18-23-31
total 12
-rwxrwxrwx 2 username username 1 Sep 3 19:17 a.txt
-rwxrwxrwx 1 username username 2 Sep 3 19:18 b.txt
-rwxrwxrwx 1 username username 1 Sep 3 19:18 c.txt
username@server:~/backups/p4$ ls -i 2017-09-03_18-23-07/a.txt
12058635 2017-09-03_18-23-07/a.txt
username@server:~/backups/p4$ ls -i 2017-09-03_18-23-31/a.txt
12058635 2017-09-03_18-23-31/a.txt
username@server:~/backups/p4$ ls -i 2017-09-03_18-23-07/b.txt
12058637 2017-09-03_18-23-07/b.txt
username@server:~/backups/p4$ ls -i 2017-09-03_18-23-31/b.txt
12058639 2017-09-03_18-23-31/b.txt
username@server:~/backups/p4$ ls -i 2017-09-03_18-23-31/c.txt
12058640 2017-09-03_18-23-31/c.txt
username@server:~/backups/p4$

In this example I've:
* Left a.txt alone
* Changed b.txt
* Added c.txt
As you can see, the inode (the number before each file) for a.txt is the same in both the 07 and 31 directories, whereas the inodes for b.txt differ.
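
Putting the snippets together, backup.sh ends up looking roughly like the sketch below. The Perforce stop/checkpoint/restart commands in particular depend on how your server is installed, so treat those lines, and the paths at the top, as placeholders:

#!/bin/bash
username=backupuser               # user created on the backup server
server=backup.example.com        # the KS-1
srcDir=/path/to/p4root           # data to back up
dstDir=backups/p4                # backup root on the backup server

# Take Perforce offline and checkpoint it so we copy a consistent state.
p4 admin stop
p4d -r $srcDir -jc

# Find the most recent backup on the server and pick a name for the new one.
recentBackupName=$(ssh $username@$server "ls -1 $dstDir 2>/dev/null | tail -n 1")
newBackupName=$(date +%Y-%m-%d_%H-%M-%S)

# Delta transfer: unchanged files become hard links into the previous backup.
rsync --archive --delete --stats --link-dest=../$recentBackupName \
    $srcDir/ $username@$server:$dstDir/$newBackupName/

# Bring Perforce back up (however you normally start it).
p4d -r $srcDir -d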