1. Introduction
Taking backups is important if you don't want to lose your work, digital photos, etc. I propose a simple system that makes it easy to take regular backups to one or more external USB or eSATA disks.
Keep in mind that a working backup should obey a few rules:
- Make backups regularly
- Keep your backups 'off line'
- Keep a backup at a different location
- Test your backups: verify that you can read or restore them
2. Linux related technology
2.1. Hard Links
A hard link is a second name for the same file. Think of it as two names for the same instance of a file.
Let us look at an example.
:$ echo a1 >> a.txt
:$ ln a.txt b.txt
:$ echo b1 >> b.txt
:$ echo a2 >> a.txt
The first command creates a file named 'a.txt' and appends 'a1' to its (empty) content. The second command creates a hard link, that is, an alternative name 'b.txt' for the same file. We then append 'b1' through the name 'b.txt' and 'a2' through the name 'a.txt'.
Because both names point to the same file, opening either of them shows the same content:
:$ more a.txt
a1
b1
a2
As you can see, changing a.txt affects b.txt because it is the same file. We use hard links so that we don't have to keep several identical copies of the same file.
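To check that both names really refer to one file, you can compare their inode numbers and the link count. The following is a minimal sketch, assuming GNU coreutils (stat -c); the temporary directory and the variable names are only there for illustration.

```shell
# Work in a throw-away directory so nothing real is touched.
workdir=$(mktemp -d)

echo a1 >> "$workdir/a.txt"
ln "$workdir/a.txt" "$workdir/b.txt"    # second name for the same file

# %i = inode number, %h = number of hard links (GNU stat)
inode_a=$(stat -c %i "$workdir/a.txt")
inode_b=$(stat -c %i "$workdir/b.txt")
links=$(stat -c %h "$workdir/a.txt")
echo "inode a.txt: $inode_a, inode b.txt: $inode_b, link count: $links"

# Removing one name leaves the data reachable through the other name.
rm "$workdir/a.txt"
remaining=$(cat "$workdir/b.txt")
echo "b.txt still contains: $remaining"
```

The data is only freed once the last name pointing to it has been removed, which is why old backups can be deleted without damaging newer ones.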
2.2. Copying with rsync
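The rsync tool is intended for copying files between two hosts in an efficient manner. It only sends data for files that have changed or are not yet present on the destination host. It works just as well between two directories on the same machine, which is how the backup script uses it.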
We use rsync primarily because of the following options:
- --delete causes rsync to delete files on the destination that are not present on the source.
- --link-dest $latest_link instructs rsync to create hard links to unchanged files present in $latest_link instead of making a copy. This is what actually does the magic...
- -a preserves most file properties like permissions, timestamps etc.
- -H preserves hard links between two files on the source when copying to the destination.
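The effect of --link-dest can be tried out safely on throw-away directories. The sketch below assumes that rsync and GNU stat are installed; the directory and file names (src, snap1, snap2, keep.txt, edit.txt) are made up for the demonstration and are not part of the backup script.

```shell
# Throw-away source and backup directories (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backup"
echo "unchanged" > "$work/src/keep.txt"
echo "version 1" > "$work/src/edit.txt"

# First snapshot: a plain full copy.
rsync -aH --delete "$work/src/" "$work/backup/snap1"

# Change one file (different size, so rsync notices it for sure),
# then take a second snapshot that links against the first one.
echo "version 2, now longer" > "$work/src/edit.txt"
rsync -aH --delete --link-dest "$work/backup/snap1" "$work/src/" "$work/backup/snap2"

# Compare inode numbers of the same file in both snapshots.
keep1=$(stat -c %i "$work/backup/snap1/keep.txt")
keep2=$(stat -c %i "$work/backup/snap2/keep.txt")
edit1=$(stat -c %i "$work/backup/snap1/edit.txt")
edit2=$(stat -c %i "$work/backup/snap2/edit.txt")
echo "keep.txt inodes: $keep1 $keep2"
echo "edit.txt inodes: $edit1 $edit2"
```

The two inode numbers printed for keep.txt should be identical, while those for edit.txt should differ, because only the changed file is stored again in the second snapshot.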
3. Preparing the external disk
Before using the backup script, you need a file system on the backup disk that supports hard links and symbolic (soft) links. Most USB disks are sold formatted with FAT32, which is not sufficient for our purposes.
You can change this with the command line tool parted, or with its graphical frontend, GParted.
The script checks for a plugged-in disk with a file system of type ext3, so you need to format the disk as such, or adapt the script if you want to use a different file system like ext4 or reiser4.
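If you are unsure whether a disk is suitable, you can simply try to create both kinds of link on it. The following is a rough sketch; supports_links is a hypothetical helper name, and the temporary directory stands in for the mount point of your backup disk.

```shell
# Hypothetical helper: returns 0 if the file system holding the given
# directory accepts both a hard link and a symbolic link.
supports_links() {
    local dir=$1 probe result
    probe=$(mktemp -d "$dir/linktest.XXXXXX" 2>/dev/null) || return 1
    echo test > "$probe/file"
    ln "$probe/file" "$probe/hard" 2>/dev/null &&
        ln -s file "$probe/soft" 2>/dev/null
    result=$?
    rm -rf "$probe"        # clean up the probe directory again
    return $result
}

# Stand-in for the mount point of the disk (replace with your own path).
target=$(mktemp -d)
if supports_links "$target"; then
    echo "$target: hard links and symbolic links work"
else
    echo "$target: links are not supported here"
fi
```

On a FAT32 disk the ln calls are expected to fail, so the function would report that links are not supported.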
4. The Script
The script contains a few variables that you can adapt as you see fit.
The script works by looking at the dynamically mounted disks. In Ubuntu these are mounted under /media. On other Linux distributions, check where removable disks are mounted and adapt the script if necessary.
Since the base directory is /home, you need to run the script as super user. If you don't like that, just set data_dir to your own home directory.
An advantage of using this script is that file permissions are preserved too.
Since each file is stored only once and is shared between backups, NEVER edit files directly on the backup disk: a change would show up in every backup that contains the file. Copy them to your normal hard disk first.
Here is the script. Copy it to a file backup.sh and make it executable.
#!/bin/bash
############################################################################
#
# Adapt this script for your situation, in particular 'data_dir' and
# 'subdir'
#
# Test first with some simple testdata!!!
#
# Removing the 'latest' link will trigger a full backup the next time, using the same
# space as a complete copy. Incremental backups take far less space
#
# Manually triggered backup
# Target should be a (USB) disk mounted somewhere under "/media" with ext3 filesystem.
# Every disk found that does not start with cdrom or floppy is offered in a menu
#
# Backup uses hardlinks. Remove the 'latest' symbolic link if you want a new copy (full backup)
# With hardlinks, each file exists only once on the disk.
#
############################################################################

data_dir="/home"
hostname="$(hostname -s)"
# %H gives a zero-padded hour; %k pads with a space, which would end up in the directory name
date_dir=$(date +%F-%H%M%S)
subdir="backup-incr"

# We only want ext3 filesystem supporting hardlinks
LIST=$(find -L /media -maxdepth 1 -type d -fstype ext3 -regex '/media/.*')

# Filter out cdrom and floppy, they are always present and not that relevant...
# (mount points containing whitespace are not handled by this loop)
FILTERED_LIST=()
for name in $LIST
do
    if [[ $name != /media/cdrom* && $name != /media/floppy* ]]
    then
        FILTERED_LIST+=("$name")
    fi
done

# Check we have at least one backup medium
if [ ${#FILTERED_LIST[@]} -gt 0 ]
then
    echo -e '\nAvailable media'
    PS3='Choose the backup media number (or type any other number to quit):'
    select name in "${FILTERED_LIST[@]}"
    do
        usbdisk=$name
        break
    done
    # Double quotes, so that $usbdisk is expanded
    echo -e "\nyour choice: $usbdisk"

    if [ -n "$usbdisk" ]
    then
        backup_dir=$usbdisk/$subdir/$hostname/$date_dir
        latest_link=$usbdisk/$subdir/$hostname/latest
        echo
        echo '-----------------------------------------'
        echo ' Backup:'
        echo ' from:' "$data_dir"
        echo ' to:' "$backup_dir"
        echo ' latest:' "$latest_link"
        echo '-----------------------------------------'
        echo
        if test ! -d "$data_dir"
        then
            echo "ERROR: Data directory does not exist, stopping now"
        else
            echo "Data directory exists"
            mkdir -p "$backup_dir"
            if [ -h "$latest_link" ]
            then
                echo "Doing incremental backup"
                # --link-dest hard-links unchanged files against the previous backup,
                # so no separate 'cp -al' pass is needed
                rsync -aH --delete --link-dest "$latest_link" "$data_dir/" "$backup_dir"
                rm "$latest_link"
            else
                echo "Doing full backup"
                rsync -aH --delete "$data_dir/" "$backup_dir"
            fi
            ln -s "$backup_dir" "$latest_link"
            echo -e '\n\tbackup ended' "$hostname/$date_dir" 'at' "$(date +%F-%H%M%S)"
        fi
    else
        echo "No backup disk selected"
    fi
else
    echo "No backup disk detected in /media (type ext3 - symbolic link allowed)"
fi
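One of the rules from the introduction is to verify that a backup can actually be read. A simple way to do that is to compare the backup with the source directory. The following is only a sketch: verify_backup is a hypothetical helper that is not part of the script, and the demo directories stand in for your data directory and a backup on the external disk.

```shell
# Hypothetical helper: compares two directory trees file by file and
# returns 0 only when their contents match (diff -r recurses, -q is brief).
verify_backup() {
    local source_dir=$1 snapshot_dir=$2
    diff -rq "$source_dir" "$snapshot_dir"
}

# Demo on throw-away directories.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/snapshot"
echo "photo" > "$demo/data/img.txt"
cp -a "$demo/data/." "$demo/snapshot/"

verify_backup "$demo/data" "$demo/snapshot" && echo "backup matches source"

# A file that changed after the backup is reported as a difference.
echo "edited later" >> "$demo/data/img.txt"
verify_backup "$demo/data" "$demo/snapshot" || echo "differences found"
```

With the script above, the two arguments would be the data directory and the 'latest' link on the backup disk, with a trailing slash so that the link is followed. Differences are to be expected for files you changed after the backup was taken.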