dar
All the programs we saw so far are really "online" backups. While tar is
certainly capable of being used in an offline mode (after all, the t in
tar stands for tape!), it's not used much for that now anyway. The other
two, being really mirroring programs rather than backup programs, cannot
work in an offline mode, and anyway the rsync algorithm needs the old copy to
be present somewhere to work its magic.
For the same reason, they require space equal to the size of the source area being mirrored. And rdiff-backup requires even more, because it keeps older versions around also, though it is phenomenally efficient at storing diffs so you may not notice this too much.
Finally, they're not encrypted, and not compressed, because they're mirrors.
What if you wanted encryption, or compression of some kind? Well you can always do this:
tar jcvf - sources | openssl enc -bf > sources.tar.bz2.enc
But what if you don't have enough spare disk capacity to hold even a compressed copy of all your files, either locally or on a connected machine?
Or you need to (or want to) back up to a bunch of CDs or DVDs so you can store them in another location?
Or you really do a full backup only once a month, and the rest of the time you just want to do a quick differential backup (only files that changed since the last full backup)?
Oh, and while you're about it, you want to store the catalog of files locally so you'll always know which file is on which disk! No trawling through 40 CDs trying to find the file you want -- search the local catalog, and load just the one CD you need!
That's when dar comes in. No prizes for guessing what the d stands
for :-)
"dar -h" gives you a pretty good summary of the options.
dar -K: -y -c /bigfs/dar-full -R sources -g dir-to-save -P dir-to-save/subdir-not-to-save
The -K: option (note the colon is part of the option) specifies
blowfish encryption, and the -y specifies bzip2 compression.
The -R is called the "root". If -R is not given it defaults to
the current directory.
The -g ("go-into", or include) and -P ("prune", or exclude)
directories must be relative to the root. You can repeat these two
options as many times as you want. The order in which they are specified
doesn't matter -- any candidate file/dir which is matched by a -P
directive is excluded.
If you don't have any -g directives at all, the entire root is
considered for archival. If you have any -g directives, only those
will be considered.
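To make the interplay of -R, -g and -P a bit more concrete, here is a minimal sketch that only prints the dar command it would run, so you can eyeball the paths before committing to anything. The function name full_backup_cmd is made up for this illustration, and the sketch assumes paths without spaces; the options themselves are the ones described above.

```shell
#!/bin/sh
# Print (do not run) a full-backup command.
# usage: full_backup_cmd ARCHIVE_BASENAME ROOT INCLUDE [PRUNE...]
# INCLUDE and every PRUNE path must be relative to ROOT.
# full_backup_cmd is a hypothetical helper, not part of dar.
full_backup_cmd() {
    archive=$1 root=$2 include=$3
    shift 3
    cmd="dar -K: -y -c $archive -R $root -g $include"
    # Each remaining argument becomes one -P (prune) directive.
    for pruned in "$@"; do
        cmd="$cmd -P $pruned"
    done
    printf '%s\n' "$cmd"
}

# Reproduces the command shown earlier.
full_backup_cmd /bigfs/dar-full sources dir-to-save dir-to-save/subdir-not-to-save
```

Once the printed command looks right, you can paste it into a terminal or a script.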
The archive is created as /bigfs/dar-full.1.dar. The 1 is a "slice number", which refers
to dar's feature of splitting up an archive over several media if you need
to do that. Even if you have only one slice it gets a number.
The -s option is used to specify a slice size. To split your archive
into CD-ROM sized chunks and pause between creating each slice to give you
time to burn a CD, use -s 700M -p.
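If you want a rough idea of how many blank CDs to have at hand, the arithmetic is just a ceiling division. The sketch below assumes you already have an estimate of the final archive size (which you usually won't know exactly, since compression changes it); slices_needed is a hypothetical helper name, and both numbers simply need to be in the same unit.

```shell
#!/bin/sh
# Number of slices needed to hold TOTAL units of archive
# when each slice holds at most SLICE units (integer ceiling division).
# slices_needed is a hypothetical helper, not part of dar.
slices_needed() {
    total=$1 slice=$2
    echo $(( (total + slice - 1) / slice ))
}

# A 3000 MB archive cut into 700 MB slices:
slices_needed 3000 700   # prints 5
```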
To save a catalog of the archive's contents at the same time, add -G
catalog-file. To do this later, use -C catalog-file -A
dar-full.
To list the contents of an archive, use dar -l. When listing a differential backup,
the -as option gives you only the files that are actually present in
this file. [Remember that a differential backup contains only files that
have changed since the last full backup. Dar's differential backups do
contain a full copy of the metadata, so even if nothing has changed
since the last backup, you'll use some space for the metadata.]
To restore files from an archive, use dar -x.
Leave out the .1.dar part when you provide a dar
archive name in a dar command. The dar command only wants the base name
(which applies to the entire archive), not the filename of the first
slice.
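Since dar wants the base name and tab completion will happily hand you a slice filename instead, a tiny helper can strip the suffix for you. This is only a sketch: dar_basename is a made-up name, and it assumes slice files are always named basename.NUMBER.dar as described above.

```shell
#!/bin/sh
# Turn a slice filename into the archive base name dar expects
# by removing a trailing ".<slice-number>.dar".
# Names without that suffix are printed unchanged.
# dar_basename is a hypothetical helper, not part of dar.
dar_basename() {
    printf '%s\n' "$1" | sed 's/\.[0-9][0-9]*\.dar$//'
}

dar_basename /bigfs/dar-full.1.dar   # prints /bigfs/dar-full
```

You could then write something like dar -l "$(dar_basename /bigfs/dar-full.1.dar)".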
dar -K: -y -c /bigfs/dar-diff -A /bigfs/dar-full [...rest of command as before...]
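A monthly full backup plus daily differentials is easy to script. The sketch below only prints the command for a given date, and it rests on assumptions that are not dar's: archives live in /bigfs, the full backup is taken on the 1st of each month, and archives are named DATE-full and DATE-diff. backup_cmd is a hypothetical function name, and any -g or -P options you need would still have to be added.

```shell
#!/bin/sh
# Print the backup command for a date given as YYYY-MM-DD.
# On the 1st: a full backup. Any other day: a differential
# against that month's full backup (the -A reference archive).
# backup_cmd and the naming scheme are assumptions of this sketch.
backup_cmd() {
    today=$1
    month=${today%-*}     # 2006-08-12 -> 2006-08
    day=${today##*-}      # 2006-08-12 -> 12
    full=/bigfs/$month-01-full
    if [ "$day" = 01 ]; then
        echo "dar -K: -y -c $full -R sources"
    else
        echo "dar -K: -y -c /bigfs/$today-diff -A $full -R sources"
    fi
}

backup_cmd "$(date +%Y-%m-%d)"
```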
The -I option helps you to get just the
files you want, without worrying about the full path.
Long-term backups on removable media or on untrusted computers are the best use for this.
[ If I trust the computer and I have enough space I'd prefer to use rdiff-backup because then the "full" backup is the current version, and differentials go backwards in time instead of the opposite -- a feat which is only possible because rdiff-backup is an online backup solution :-) ]
dar archives are directly accessible -- a file at the end of
a large archive takes the same time to restore as a similar-size file at
the start of the archive. Plain tar has to read the entire archive to get
to a file at the end, as you probably know.
This has implications for security (each individual file is encrypted, not the whole archive in one shot) and for space (compressing many files separately is not as efficient as compressing the whole archive in one shot).
The purpose of doing it this way is so that you don't have to load all 10
CDs to get back a file from the middle. dar will let you get away
with loading just 2 or 3 of them.
You can also treat a dar archive like some sort of "near-line" storage! I
often do that to /usr/share/doc :-)
There is also a GUI front end called kdar. It doesn't support all
the options of dar, but it does support the most common ones.
However, I don't really like it very much. The only thing it can do that command line dar cannot is (duh!) show you a graphical tree of an archive file and let you interactively diff/test/restore one/some files.
So that's all I use it for.
dar allows you to keep your favourite options in a file called
.darrc in your home directory. Options can be specified for
each of the major operation modes. Mine looks like this:
create:
# I care more about preserving ctime than atime
-aa
# bzip2 compress
-y
# I don't care about extended attributes in user...
-u
# ...and root namespaces
-U
# minimum 256 bytes in order to compress a file
-m 256
# ignore case for following flags
-an
# do not compress following extensions, case insensitive due to above
-Z "*.avi"
-Z "*.bz2"
# -Z "*.gif"
-Z "*.gz"
-Z "*.jar"
-Z "*.jpeg"
-Z "*.jpg"
-Z "*.mov"
-Z "*.mp3"
-Z "*.mpg"
# -Z "*.png"
-Z "*.tgz"
-Z "*.zip"
# set case sensitivity back to default
-acase
extract:
# use -O only for ordinary users' ~/.darrc, not for root
-O
-u
-U
You can also keep a set of options in a separate file and pull it in
with the -B option. Here's an extract from one of my files, showing a few
-g directives and a --prune directive. (I prefer to use --prune rather
than -P in these files because it stands out more).
-g /var/www
-g /var/spool/cron
-g /var/named
--prune /var/mailman/archives
-g /var/mailman
So when you attempt to restore a file from a backup set, like below,
dar -x /mnt/cdrom/2006-05-04-full -g file1 -g file2
you have to mount the CD on which the first slice of the May 4 full
backup is stored, i.e., the CD on which a file called
2006-05-04-full.1.dar is present.
When this is done, dar will ask for the CD containing the last slice (you can mark your CDs as 2006-05-04-full, 1/5, then 2/5, etc. so you know which is the last one).
These two slices (the first and last slice) are always needed by dar for any restore, because of how it stores the metadata internally. If the files you want are entirely contained within one or both of these slices, you're done.
Otherwise dar will prompt you to load the CD with, say, slice number 3, so you can load that.
To help in this, dar has the concept of "isolating" (I prefer to call
it "extracting") the directory data from the actual archive with the
-C option. This is, of course, much, much, smaller, and you can
keep the metadata for all of your archives on disk, without losing a
lot of space.
You can then query each of the catalogs using -as -l to see which
one has the file you want. Let's say you find it in
2006-08-12-diff; you then mount the CD that contains the first
slice of that backup set.
[You are labelling your CDs as soon as you write them, aren't you?]
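Querying the catalogs one by one can be wrapped in a small loop. The following is a sketch under a few assumptions: the catalogs were isolated into a single directory as files named SOMETHING.1.dar, they can be listed without typing a password, and a plain substring match on the listing is good enough. find_in_catalogs is a hypothetical helper name.

```shell
#!/bin/sh
# Print the base name of every catalog in CATALOG_DIR whose
# "dar -l ... -as" listing mentions WANTED (fixed-string match).
# usage: find_in_catalogs WANTED CATALOG_DIR
# find_in_catalogs is a hypothetical helper, not part of dar.
find_in_catalogs() {
    wanted=$1 catalog_dir=$2
    for slice in "$catalog_dir"/*.1.dar; do
        [ -e "$slice" ] || continue      # directory holds no catalogs
        base=${slice%.1.dar}             # dar wants the base name
        if dar -l "$base" -as | grep -qF -- "$wanted"; then
            printf '%s\n' "$base"
        fi
    done
}

# Example (needs dar and real catalogs, so left commented out):
# find_in_catalogs etc/passwd ~/catalogs
```

The substring match can produce false positives (for example a file whose name merely contains the one you want), so treat the output as a shortlist.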
There is also a program called dar_manager whose purpose is to eliminate the drudgery in
the above step. But this article is already far too long, so I'll
just give the commands with a very brief explanation:
dar_manager -C my_dmd
creates the dar_manager database. This is done once only.
dar_manager -B my_dmd -A 2006-07-01-full_catalog
updates the database with the catalog for the July 1st full backup. You will have to run a similar command for every backup you take, but you would just add that to the script so it's automated.
dar_manager -B my_dmd -r files-to-restore
This is where all the magic happens. Just mount the CDs it asks
for when it asks for them and hit enter. Dar will always ask for
the first and last slices of any multi-slice archive, due to its
design, but slices in between will only be asked for if they contain
the file you want.
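If you already have a pile of catalogs lying around, you can feed them all to the database in one go. This sketch assumes the database was created beforehand, that the catalogs sit in one directory as files named SOMETHING.1.dar, and that none of them has been added before (it makes no attempt to detect duplicates). register_catalogs is a hypothetical helper name; the dar_manager options are the ones shown above.

```shell
#!/bin/sh
# Add every catalog found in CATALOG_DIR to an existing
# dar_manager database, one dar_manager call per catalog.
# usage: register_catalogs DATABASE CATALOG_DIR
# register_catalogs is a hypothetical helper, not part of dar.
register_catalogs() {
    database=$1 catalog_dir=$2
    for slice in "$catalog_dir"/*.1.dar; do
        [ -e "$slice" ] || continue      # directory holds no catalogs
        dar_manager -B "$database" -A "${slice%.1.dar}"
    done
}

# Example (needs dar_manager and real catalogs, so left commented out):
# register_catalogs my_dmd ~/catalogs
```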
- One final tip: I haven't tried this yet, but I think you should make
the slice size half the CD size (say 350M) and burn the first and last
slices onto one CD. This ought to eliminate one extra CD mount!
Well, the next chapter: Synchronisation -- when files change on both sides