This is the first in a series of articles on backup, mirroring, and synchronisation tools for ordinary users of Linux. I was replying to a simple query from a friend who is fairly comfortable at the command line now, but occasionally needs help. Since he lives two continents away, I started putting in more and more detail, until I finally realised I was actually writing these articles!
These articles are for individual users or single machines, not for large or multi-user systems which may need tape drives and so on. Also, I will not be going into download/installation issues; the articles are already too long, and most modern Linux distributions make it fairly easy to download a package with dependencies and install it.
One of the things about open source is the amount of choice you have. Choice is good, but it's even better when you know what tool to choose when. I hope these articles help.
We'll start with some basic definitions.
The source is the set of files that are to be backed up.
A backup is a copy of the source. The copy may be compressed to save space, or encrypted for security. It may be in one huge file kept on another machine or on a different hard disk on the same machine, or split into chunks and stored on multiple CDs or DVDs.
A backup stored elsewhere and not accessible immediately is offline, and is sometimes also called an archive. However, a more common and modern definition of archive is a single file that contains multiple directories and files within it, usually for ease of transporting. In this sense, the common ZIP file is a compressed archive.
A backup usually refers to the latest copy of the source, but some backup tools can save older versions as well. The ability to "roll back" when needed is very nice, even if we can't do it for our lives :-)
A mirror is a special type of backup. It is a complete copy of the source, on the same machine or on another, with the directory structure, file permissions, etc., replicated exactly. In fact, one use of a mirror is that if the main source dies, the mirror can (in principle) substitute for it without any fuss.
A mirror is usually online, which means it is available for restoring at any time. Since a mirror is on disk, it can also be kept up-to-date much more easily and conveniently than offline backups.
Please do not confuse this with mirroring as used in RAID systems, or mirroring as a feature of, say, Oracle. Those are special systems that are meant to keep a mirror synchronised in real time or almost real time. We're going to stick to stuff that still needs to be started from a command, even if the command is in cron.
Until now we have been talking about one source, and one or more copies. The copies change only when the source changes, not independently.
When there are two sources, each changing independently, they must periodically be synchronised. In this case, changes go both ways, not just one way. Handling this is more complex due to the need to detect and handle conflicts, which is what happens when the same file is changed in different ways on different sources.
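To make the idea of a conflict concrete, here is a minimal sketch in Python. It is not how any particular tool works. It assumes we kept a record of each file's state at the last synchronisation (`base`) and can see the current state on both sides (`a` and `b`); the function name, the dictionaries, and the "v1"/"v2" labels (stand-ins for checksums) are all made up for illustration.

```python
def classify(base, a, b):
    """Compare both sides against the state recorded at the last sync.

    Each argument maps a file name to a label standing in for its
    content (for example a checksum). A missing name means the file
    does not exist there.
    """
    actions = {}
    for path in sorted(set(base) | set(a) | set(b)):
        old, in_a, in_b = base.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            # Both sides agree, whether or not either one changed.
            actions[path] = "in sync"
        elif in_b == old:
            # Only A differs from the last sync, so A's change wins.
            actions[path] = "take A's version"
        elif in_a == old:
            # Only B differs from the last sync, so B's change wins.
            actions[path] = "take B's version"
        else:
            # Both changed, in different ways: a human has to decide.
            actions[path] = "conflict"
    return actions


# Hypothetical example: three files, last synchronised at "v1".
base = {"notes.txt": "v1", "todo.txt": "v1", "plan.txt": "v1"}
a = {"notes.txt": "v2", "todo.txt": "v1", "plan.txt": "v2a"}
b = {"notes.txt": "v1", "todo.txt": "v1", "plan.txt": "v2b"}

for path, action in classify(base, a, b).items():
    print(path, "->", action)
```

The point is that a conflict can only be recognised if something remembers the last common state. With just the two current copies, you can tell that they differ, but not which side changed.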
Finally, while everyone knows what a full backup is, people often confuse an incremental backup with a differential backup. So let's set the record straight once and for all.
A differential backup is a backup of only those files that were changed since the last full backup. An incremental backup is a backup of only those files that were changed since the last backup -- regardless of what type of backup it was.
Let's say you take a full backup every Sunday, and let us designate the set of
files that changed on each following day as FMo, FTu, etc. The files
that are backed up on succeeding days, in each case, look like this:
            Incremental   Differential
Monday      FMo           FMo
Tuesday     FTu           FMo, FTu
Wednesday   FWe           FMo, FTu, FWe
Thursday    FTh           FMo, FTu, FWe, FTh
Friday      FFr           FMo, FTu, FWe, FTh, FFr
So you see, a differential backup gets bigger and bigger as you go further
away from the last full backup.
But if you want to restore stuff, you only need the last full backup and the last differential backup.
On the other hand, an incremental backup is much faster and more consistent in size, but if you have to restore something, you'll need the last full backup and ALL the subsequent incremental backups.
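The trade-off can be spelled out in a few lines of Python. This is only a sketch of the Sunday-full-backup example, not the behaviour of any real tool. The function names are invented for illustration, and the labels FMo, FTu, and so on simply stand for the sets of files changed on each day.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
CHANGED = {"Monday": "FMo", "Tuesday": "FTu", "Wednesday": "FWe",
           "Thursday": "FTh", "Friday": "FFr"}


def backup_contents(kind, day):
    """Which sets of changed files go into the backup taken on `day`."""
    upto = DAYS[:DAYS.index(day) + 1]
    if kind == "incremental":
        # Only what changed since the previous backup (yesterday's).
        return [CHANGED[day]]
    if kind == "differential":
        # Everything that changed since Sunday's full backup.
        return [CHANGED[d] for d in upto]
    raise ValueError("kind must be 'incremental' or 'differential'")


def restore_chain(kind, day):
    """Which backups are needed, in order, to restore the state on `day`."""
    upto = DAYS[:DAYS.index(day) + 1]
    if kind == "incremental":
        # The full backup plus every incremental taken since then.
        return ["full (Sunday)"] + ["incremental (%s)" % d for d in upto]
    if kind == "differential":
        # The full backup plus only the latest differential.
        return ["full (Sunday)", "differential (%s)" % day]
    raise ValueError("kind must be 'incremental' or 'differential'")


for kind in ("incremental", "differential"):
    print(kind, "backup on Thursday holds:", backup_contents(kind, "Thursday"))
    print(kind, "restore on Thursday needs:", restore_chain(kind, "Thursday"))
```

For Thursday, the incremental backup holds one set of files but the restore needs five backups, while the differential backup holds four sets but the restore needs only two backups.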
Well, the first chapter: Simple, quick, short-term, "just in case" backups