1. Introduction
2. Linux related technology
3. Preparing the external disk
4. The Script

1. Introduction

Taking backups is important if you don't want to lose your work, digital photos etc. I propose a simple system that makes it easy to regularly take backups on one or more USB or eSATA external disks.

Keep in mind that a working backup should obey a few rules:

2. Linux related technology

2.1. Hard Links

A hard link is a second name for the same file. Think of it as two names for the same instance of a file.

Let us look at an example.

:$ echo a1 >> a.txt
:$ ln a.txt b.txt
:$ echo b1 >> b.txt
:$ echo a2 >> a.txt

The first command creates a file named 'a.txt' and appends 'a1' to its (empty) content. The second command creates a hard link, that is, an alternative name for the same file. The third command appends 'b1' through the name 'b.txt', and the fourth appends 'a2' through the name 'a.txt'.

Because both names point to the same file, you see the same content whichever one you open:

:$ more a.txt
a1
b1
a2

As you can see, changing a.txt affects b.txt and vice versa, because they are the same file. We use hard links so that the backup disk does not have to hold several identical copies of an unchanged file.
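
One way to confirm that two names refer to the same file is to compare their inode numbers and look at the link count. The following is a minimal sketch that works in a scratch directory created with mktemp; the variable names are illustrative only.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the current directory is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo a1 >> a.txt
ln a.txt b.txt          # second name for the same file

# 'ls -i' prints the inode number first; 'ls -l' prints the link count second
inode_a=$(ls -i a.txt | awk '{print $1}')
inode_b=$(ls -i b.txt | awk '{print $1}')
links=$(ls -l a.txt | awk '{print $2}')

echo "a.txt inode: $inode_a, b.txt inode: $inode_b, link count: $links"

# Removing one name leaves the data reachable through the other name
rm a.txt
remaining=$(cat b.txt)
echo "b.txt still contains: $remaining"
```

Both names report the same inode number, and the data is only freed once the last name is removed.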

2.2. Copying with rsync

The rsync tool is intended for copying files efficiently between two hosts or, as in our case, between two directories on the same machine. It only transfers files that have changed or are not yet present on the destination.

We use rsync primarily because of the following options:

  • --delete causes rsync to delete files on the destination that are not present on the source
  • --link-dest $latest_link instructs rsync to create hard links to unchanged files present in $latest_link instead of making a copy. This is what actually does the magic...
  • -a preserves most file properties like permissions, timestamps etc.
  • -H preserves hard links between two files on the source when copying to the destination.
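
The effect of these options can be tried out without an external disk. The sketch below is an illustration only: it assumes rsync is installed, and the directory names (src, snap1, snap2) are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical scratch layout: $demo/src is the data, snap1 and snap2 are backups
demo=$(mktemp -d)
mkdir -p "$demo/src"
echo "unchanged" > "$demo/src/keep.txt"
echo "old" > "$demo/src/edit.txt"

# First run: there is nothing to link against, so this is a full copy
rsync -aH --delete "$demo/src/" "$demo/snap1"

# Change one file, then take a second snapshot that links against the first
echo "new content" > "$demo/src/edit.txt"
rsync -aH --delete --link-dest "$demo/snap1" "$demo/src/" "$demo/snap2"

# Unchanged files share an inode across snapshots, changed files do not
ls -i "$demo/snap1/keep.txt" "$demo/snap2/keep.txt"
ls -i "$demo/snap1/edit.txt" "$demo/snap2/edit.txt"
```

The first ls shows the same inode number twice, while the second shows two different numbers; snap1 still holds the old version of edit.txt.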

3. Preparing the external disk

Before using the backup script, you need a file system on the backup disk that supports hard links and symbolic links. Most USB disks are sold formatted with FAT32, which supports neither and is therefore not sufficient for our purposes.

You can reformat the disk with the command line tool parted, or with its graphical frontend, GParted.

GParted - an application to format disks

The script checks for a plugged-in disk with a file system of type ext3, so you need to format the disk as such, or adapt the script if you want to use a different file system like ext4 or reiser4.
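
If you are unsure whether a mounted disk is suitable, you can simply try to create both kinds of link on it. The helper below, supports_links, is a hypothetical name and not part of the backup script; it is a small probe that assumes the directory is writable and cleans up after itself.

```shell
#!/bin/bash
# supports_links DIR: succeeds if DIR accepts both a hard link and a
# symbolic link. (Hypothetical helper, not part of the backup script.)
supports_links() {
	probe_dir=$(mktemp -d "$1/linktest.XXXXXX" 2>/dev/null) || return 1
	touch "$probe_dir/file"
	ln "$probe_dir/file" "$probe_dir/hard" 2>/dev/null &&
		ln -s file "$probe_dir/soft" 2>/dev/null
	result=$?
	rm -rf "$probe_dir"
	return $result
}

# Example: probe the temporary directory; replace it with your mount point
target=${TMPDIR:-/tmp}
if supports_links "$target"
then
	echo "$target supports hard links and symbolic links"
else
	echo "$target does not support them (or is not writable)"
fi
```

On a FAT32 disk the ln call fails, so the probe reports that the disk is not usable. With GNU coreutils, df -T on the mount point also prints the file system type.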

4. The Script

The script contains a few variables that you can adapt as you see fit.

The script works by looking at the dynamically mounted disks. In Ubuntu these are mounted under /media. When using the script with another Linux distribution, verify where removable disks are mounted and adapt the script if necessary.
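
To see which mount points your system currently offers for a given file system type, you can read the kernel's mount table. The sketch below is an alternative to the find command used in the script, not a copy of it; list_mounts is a hypothetical helper name, and it assumes a Linux system that provides /proc/mounts.

```shell
#!/bin/bash
# list_mounts FSTYPE [MOUNTS_FILE]: print the mount points whose file system
# type equals FSTYPE. MOUNTS_FILE defaults to /proc/mounts, where each line
# reads: device mountpoint fstype options dump pass
list_mounts() {
	awk -v fstype="$1" '$3 == fstype { print $2 }' "${2:-/proc/mounts}"
}

# Example: show every mounted ext3 file system (may print nothing)
list_mounts ext3
```

Note that /proc/mounts writes a space in a mount point as \040, so names containing spaces need extra care.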

Since the base directory is /home, you need to run the script as superuser. If you don't like that, just set data_dir to your own home directory.

An advantage of using this script is that file permissions are preserved too.

Since each unchanged file is stored only once and shared between all backups: NEVER edit files directly on the backup disk. Copy them first to your normal hard disk.
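
The reason for this rule can be shown with two simulated backups that share a file through a hard link. This is a sketch with made-up directory names; the two temporary directories merely stand in for the backup disk and your normal hard disk.

```shell
#!/bin/bash
# Simulate two snapshots that share one file through a hard link
disk=$(mktemp -d)      # stands in for the backup disk
work=$(mktemp -d)      # stands in for your normal hard disk
mkdir "$disk/snap1" "$disk/snap2"
echo "version 1" > "$disk/snap1/notes.txt"
ln "$disk/snap1/notes.txt" "$disk/snap2/notes.txt"

# Safe: copy the file off the backup disk first, then edit the copy
cp "$disk/snap2/notes.txt" "$work/notes.txt"
echo "my edit" >> "$work/notes.txt"
safe_snap1=$(cat "$disk/snap1/notes.txt")
echo "snap1 after editing the copy: $safe_snap1"

# Unsafe: editing in place changes every snapshot that shares the file
echo "oops" >> "$disk/snap2/notes.txt"
unsafe_snap1=$(cat "$disk/snap1/notes.txt")
echo "snap1 after editing snap2 in place:"
echo "$unsafe_snap1"
```

After the in-place edit, the older snapshot contains the unwanted line as well, which defeats the purpose of keeping history.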

Here is the script. Copy it to a file backup.sh and make it executable.

#!/bin/bash
############################################################################
#
# Adapt this script for your situation, in particular 'data_dir' and
# 'subdir'
#
# Test first with some simple test data!!!
#
# Manually triggered backup
# Target should be a (USB) disk mounted somewhere under "/media" with ext3 filesystem.
# Mount points starting with cdrom or floppy are ignored; you choose among the rest
#
# Backup uses hardlinks: each unchanged file exists only once on the disk,
# so incremental backups take far less space than a complete copy.
# Remove the 'latest' symbolic link if you want the next run to be a
# full backup (a new complete copy).
#
############################################################################

data_dir="/home"
hostname="$(hostname -s)"
# %H is zero-padded; %k pads with a space, which would put a blank in the directory name
date_dir=$(date +%F-%H%M%S)
subdir="backup-incr"

#We only want ext3 filesystem supporting hardlinks
LIST=$(find -L /media -maxdepth 1 -type d -fstype ext3 -regex '/media/.*')

#Filter out cdrom and floppy, they are always present and not that relevant...
FILTERED_LIST=()
for name in $LIST
do
	if [[ $name != /media/cdrom* && $name != /media/floppy* ]]
		then
		# Append the mount point to the array of candidate disks
		FILTERED_LIST+=("$name")
	fi
done

#Check we have at least one backup medium
if [ ${#FILTERED_LIST[@]} -gt 0 ]
	then

	echo -e '\nAvailable media'
	PS3='Choose the backup media number (or type any other number to quit):'
	select name in "${FILTERED_LIST[@]}"
		do
			usbdisk=$name
			break
		done
	# Double quotes are needed so that $usbdisk is expanded
	echo -e "\nyour choice: $usbdisk"

	if [ -n "$usbdisk" ]
		then
			backup_dir=$usbdisk/$subdir/$hostname/$date_dir
			latest_link=$usbdisk/$subdir/$hostname/latest

			echo
			echo '-----------------------------------------'
			echo '   Backup:'
			echo '     from:' $data_dir
			echo '       to:' $backup_dir
			echo '   latest:' $latest_link
			echo '-----------------------------------------'
			echo

			if test ! -d "$data_dir"
				then
					echo "ERROR: Data directory does not exist, stopping now"
				else
					echo "Data directory exists"

					mkdir -p "$backup_dir"
					if [ -h "$latest_link" ]
					  then
						echo "Doing incremental backup"
						# --link-dest hard-links unchanged files to the previous backup,
						# so a separate 'cp -al' pass is not needed
						rsync -aH --delete --link-dest "$latest_link" "$data_dir/" "$backup_dir"
						rm "$latest_link"
					  else
						echo "Doing full backup"
						rsync -aH --delete "$data_dir/" "$backup_dir"
					fi

					# Point 'latest' at the backup that was just made
					ln -s "$backup_dir" "$latest_link"

					echo -e "\n\tbackup ended $hostname/$date_dir at $(date +%F-%H%M%S)"
			fi

		else
			echo "No backup disk selected"
	fi
else
	echo "No backup disk detected in /media (type ext3 - symbolic link allowed)"
fi