A real backup should be usable when your file system, your entire disk, your entire computer, or (God forbid) your entire building goes down, or up in flames :-) That means the backup should ideally be remote. (Again, I'm not counting tape backup systems and suchlike here -- this is for individual users, as I said at the start of the first article.)
The first real tool we will discuss is called rsync, which is a mirroring
tool. Usually, rsync mirrors are created on another machine, over a secure
channel like ssh. If you are mirroring to the same machine, it should at
least be on a different hard disk in order to get some protection from
hardware failures. However, even mirroring to the same hard disk is of
some use -- at least it helps against human error or a software bug wiping
out some file.
rsync is extremely efficient in using network bandwidth. See the
additional notes section at the bottom of this article for details.
(The following assumes a directory called sources sitting in your current
directory on the local machine, and being mirrored to the home directory on
the remote machine.)
rsync is best used in a manner similar to cp or scp:
rsync -avz sources user@remote:
You can also add the --progress option if you have some particularly
large files and want to see a progress indicator.
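To copy in the other direction, from the remote machine back to the local
one, reverse the arguments, again just as with cp or scp:
rsync -avz user@remote:sources .
The one thing to watch is a trailing slash on the source path: think of
path/ as path/*.
Otherwise it works pretty much the same as cp.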
rsync -avz sources/ user@remote:
(WRONG) takes the files within sources (think sources/*) and puts them in the home directory of the remote user. Doesn't create a subdirectory called sources there! Probably NOT what you expected or wanted! If you remove the / it will work as intended.
rsync -avzP user@remote:sources sources
(WRONG) makes a copy of the remote directory sources inside the local directory called sources. Again, probably NOT what you expected or wanted. Note that cp also works like this, so it's not really a surprise or a design fault. I'm just noting it for completeness.
rsync -avzP user@remote:sources/ sources
(RIGHT) takes the files within sources on the remote, and puts them into the local sources. This is probably what you wanted.
rsync -avzP user@remote:sources .
(RIGHT) also works fine. It's putting the remote sources into the current directory, within which (as we said) is your local sources directory. This is almost the same as the above command.
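To make the trailing-slash rule explicit, here is a small Python model of where a file ends up. It never runs rsync, it only mimics the rule described above. The function name landing_path and the file name main.c are made up for illustration, and "~" stands in for the remote user's home directory.

```python
import posixpath


def landing_path(source, dest, name):
    """Return where the file `name`, sitting directly inside the directory
    `source`, would land under `dest` according to the trailing-slash rule."""
    if source.endswith("/"):
        # "sources/" means "the contents of sources": no extra directory level.
        return posixpath.join(dest, name)
    # "sources" means "the directory itself": it is recreated inside dest.
    return posixpath.join(dest, posixpath.basename(source), name)


# (source argument, destination) pairs modelled on the commands above.
cases = [
    ("sources", "~"),         # local -> remote home, creates ~/sources
    ("sources/", "~"),        # WRONG: files spill directly into ~
    ("sources", "sources"),   # WRONG: ends up as sources/sources
    ("sources/", "sources"),  # RIGHT: contents into the local sources
    ("sources", "."),         # RIGHT: recreates sources in the current directory
]
for source, dest in cases:
    print(f"{source:9} -> {dest:8} : {landing_path(source, dest, 'main.c')}")
```

If in doubt about a real command, adding rsync's -n (dry run) option shows what would be transferred without actually copying anything.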
The rsync algorithm is an absolutely wonderful piece of work, and is actually part of Andrew Tridgell's (of the Samba team) PhD thesis. It's one of those algorithms that give you a warm glow of satisfaction when you read and understand it!
Just think about this: you have a large file locally, and an older version of the same file on a remote machine. rsync manages to send only the changes (plus a little administrative overhead) without having both copies in one place to actually do a "diff" operation!
What it's really doing is a "diff" of two files on different computers, without completely copying either of the files to the other side! Until you read the algorithm, this doesn't even seem possible :-)
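If you want a feel for how this is possible, here is a toy sketch of the idea in Python. It is a simplification under my own assumptions, not rsync's actual code: the block size is tiny, SHA-256 stands in for the strong checksum, and the function names (weak_sum, roll, signature, delta, patch) are made up. The side with the old file sends only per-block checksums; the side with the new file slides a window over its copy, one byte at a time, looking for blocks the other side already has.

```python
import hashlib

BLOCK = 4        # tiny block size so the demo is easy to follow
M = 1 << 16      # both halves of the weak checksum are kept modulo 2**16


def weak_sum(data):
    """Cheap checksum of one block, as a pair (a, b)."""
    a = sum(data) % M
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % M
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def strong_sum(data):
    return hashlib.sha256(data).digest()


def signature(old):
    """Side holding the OLD file: checksums of every full block."""
    table = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        table.setdefault(weak_sum(block), []).append(
            (strong_sum(block), start // BLOCK))
    return table


def delta(new, table):
    """Side holding the NEW file: a list of block numbers (int) and
    literal data (bytes), built from the checksums alone."""
    ops, literal, pos = [], bytearray(), 0
    weak = weak_sum(new[:BLOCK]) if len(new) >= BLOCK else None
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        match = None
        for strong, block_no in table.get(weak, []):
            if strong == strong_sum(window):   # confirm the weak match
                match = block_no
                break
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal = bytearray()
            ops.append(match)
            pos += BLOCK
            if pos + BLOCK <= len(new):
                weak = weak_sum(new[pos:pos + BLOCK])
        else:
            literal.append(new[pos])
            if pos + BLOCK < len(new):
                weak = roll(*weak, new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])                  # whatever is left at the end
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old, ops):
    """Side holding the OLD file: rebuild the new file from the delta."""
    out = bytearray()
    for op in ops:
        if isinstance(op, int):
            out += old[op * BLOCK:(op + 1) * BLOCK]
        else:
            out += op
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick brown cat jumps over the lazy dog!"
ops = delta(new, signature(old))
rebuilt = patch(old, ops)
print(ops)
print(rebuilt == new)
```

For these two strings the delta comes out as nine block numbers plus eight literal bytes ("cat " and "dog!"), and patch rebuilds the new text from the old one. Neither side ever saw the other's complete file.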
No other algorithm I know of comes close to its efficiency when you have large files
with only small changes (that is, the number of bytes changed is far, far less
than the size of the file). As a result, almost all decent "online" backup
programs now use this algorithm -- two of the three programs discussed next in
this series (rdiff-backup and unison) certainly do. [The third one is
an offline-capable backup solution so it cannot use this algorithm anyway --
the rsync algorithm needs both old and new versions to be online, even if
they are not both on the same machine.]
Despite all this, I use rsync only when I want an exact mirror, with no
extra files. For backup purposes, rdiff-backup (next article) is much
better, because it not only gives you a mirror, but also maintains older
versions as compressed reverse diffs, in an extra directory called
rdiff-backup-data.
Well, the next chapter: Beyond rsync -- rdiff-backup