A true backup solution -- dar

All the programs we have seen so far are really "online" backups. While tar can certainly be used offline (after all, the t in tar stands for tape!), it is rarely used that way these days. The other two are really mirroring programs rather than backup programs, so they cannot work offline at all; besides, the rsync algorithm needs the old copy to be present somewhere to work its magic.

For the same reason, they need as much space as the source area being mirrored. rdiff-backup needs even more, because it also keeps older versions around, though it stores diffs so efficiently that you may not notice much.

Finally, because they're mirrors, the copies they make are neither encrypted nor compressed.

What if you want encryption, or compression of some kind? Well, you can always do this:

tar jcvf - sources | openssl enc -bf > sources.tar.bz2.enc
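Getting the files back means running the same pipeline in reverse, with the same cipher and passphrase. The helpers below are a minimal sketch of both directions; the function names, argument order and passphrase-file convention are hypothetical. They assume OpenSSL 1.1.1 or later and swap in AES-256 with PBKDF2 key derivation (-aes-256-cbc -pbkdf2) for Blowfish, because OpenSSL 3.x only offers -bf through its legacy provider. Whatever cipher you pick, decryption must use exactly the options used for encryption.

```shell
# Hypothetical helpers; names and argument order are illustrative only.
# Tar compression flag: j = bzip2 (as in the pipeline above), z = gzip.
TAR_Z=${TAR_Z:-j}

# enc_backup SRC OUT PASSFILE: archive and compress SRC, then encrypt it
# with the passphrase stored on the first line of PASSFILE.
enc_backup() {
    tar "${TAR_Z}cf" - "$1" |
        openssl enc -aes-256-cbc -pbkdf2 -salt -pass "file:$3" > "$2"
}

# enc_restore IN DESTDIR PASSFILE: decrypt IN and unpack it under DESTDIR.
# The cipher options must match the ones used in enc_backup exactly.
enc_restore() {
    openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$3" < "$1" |
        tar "${TAR_Z}xf" - -C "$2"
}
```

For example, enc_backup sources sources.tar.bz2.enc ~/.backup-pass, and later enc_restore sources.tar.bz2.enc /tmp/restore ~/.backup-pass. Keeping the passphrase in a file readable only by you lets the job run unattended; leave out -pass to be prompted instead, as with the one-liner above.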

But what if you don't have enough spare disk capacity to hold even a compressed copy of all your files, either locally or on a connected machine?

Or you need to (or want to) back up to a bunch of CDs or DVDs so you can store them in another location?

Or you want to do a real full backup once a month, but the rest of the time just a quick differential backup (only the files that changed since the last full backup)?

Oh, and while you're about it, you want to store the catalog of files locally so you'll always know which file is on which disk! No trawling through 40 CDs trying to find the file you want: search the local catalog, and load just the one CD you need!

That's when dar comes in. No prizes for guessing what the d stands for :-)

How is it used?

"dar -h" gives you a pretty good summary of the options.

For both listing and extracting, the -I option helps you to get just the files you want, without worrying about the full path.
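As a concrete starting point, here is a minimal sketch of the schedule described earlier: a full backup on the 1st of each month, a differential on every other day, and an isolated catalog kept on disk after each run. The /backup directory, the /home root, the date-based archive names and the 650M slice size are all assumptions for illustration, and the function names are hypothetical; check each option against dar -h for your version. With DRY_RUN set, the functions only print the dar commands they would run.

```shell
# Hypothetical backup helpers for dar. Assumptions, not dar defaults:
# archives live in $BACKUP_DIR, are named <YYYY-MM-DD>-full or -diff,
# and are cut into 650M slices so that each slice fits on a CD.
BACKUP_DIR=${BACKUP_DIR:-/backup}
ROOT=${ROOT:-/home}
SLICE=${SLICE:-650M}

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_for_date YYYY-MM-DD: full backup on the 1st, differential otherwise.
backup_for_date() {
    day=${1##*-}      # e.g. "01"
    month=${1%-*}     # e.g. "2006-07"
    if [ "$day" = "01" ]; then
        name="$1-full"
        run dar -c "$BACKUP_DIR/$name" -R "$ROOT" -s "$SLICE" || return
    else
        name="$1-diff"
        # -A names the reference: this month's isolated full-backup catalog,
        # so only files changed since that full backup are saved.
        run dar -c "$BACKUP_DIR/$name" -R "$ROOT" -s "$SLICE" \
            -A "$BACKUP_DIR/$month-01-full_catalog" || return
    fi
    # Isolate the catalog (-C) from the new archive and keep it on disk.
    run dar -C "$BACKUP_DIR/${name}_catalog" -A "$BACKUP_DIR/$name"
}
```

A nightly cron job could then call backup_for_date "$(date +%Y-%m-%d)". Using this month's on-disk catalog as the reference means the full-backup CDs never have to be mounted for a differential run; the catch is that if the full backup on the 1st fails, the differentials have no reference for the rest of the month.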

When is it useful?

Long-term backups on removable media or on untrusted computers are the best use for this.

[ If I trust the computer and have enough space, I prefer rdiff-backup, because then the "full" backup is the current version and the differentials go backwards in time instead of forwards. That is only possible because rdiff-backup is an online backup solution :-) ]

What are the downsides?

Additional notes

  1. Files in dar archives are directly accessible -- a file at the end of a large archive takes the same time to restore as a similar-size file at the start of the archive. Plain tar has to read the entire archive to get to a file at the end, as you probably know.

    This has implications for security (the archive is not encrypted in one shot as a single stream, so that any part of it can be decrypted on its own) and for space (compressing many files separately is not as efficient as compressing the whole archive in one shot).

    The point is that you don't have to load all 10 CDs to get back a file from the middle; dar lets you get away with loading just 2 or 3 of them.

    You can also treat a dar archive as a sort of "near-line" storage! I often do that with /usr/share/doc :-)

  2. For KDE users, there is a GUI called kdar. It doesn't support all the options of dar, but it does support the most common ones.

    However, I don't really like it very much. The only thing it can do that command-line dar cannot is (duh!) show you a graphical tree of an archive file and let you interactively diff, test or restore one or more files.

    So that's all I use it for.

  3. dar lets you keep your favourite options in a file called .darrc in your home directory, with separate options for each of the major operation modes (create:, extract: and so on). Mine looks like this:

        create:
        # I care more about preserving ctime than atime
        -aa
        # bzip2 compress
        -y
        # I don't care about extended attributes in user...
        -u
        # ...and root namespaces
        -U
        # minimum 256 bytes in order to compress a file
        -m 256
        
        # ignore case for following flags
        -an
        # do not compress following extensions, case insensitive due to above
        -Z "*.avi"
        -Z "*.bz2"
        # -Z "*.gif"
        -Z "*.gz"
        -Z "*.jar"
        -Z "*.jpeg"
        -Z "*.jpg"
        -Z "*.mov"
        -Z "*.mp3"
        -Z "*.mpg"
        # -Z "*.png"
        -Z "*.tgz"
        -Z "*.zip"
        
        # set case sensitivity back to default
        -acase
        
        extract:
        # use -O only for ordinary users' ~/.darrc, not for root
        -O
        -u
        -U
        
  4. If your file list is too large to put on the command line, you can keep it in a file and "include" that on the command line with the -B option. Here's an extract from one of my files, showing a few -g directives and one --prune directive. (I prefer --prune to -P in these files because it stands out more.)

        -g /var/www
        -g /var/spool/cron
        -g /var/named
        --prune /var/mailman/archives
        -g /var/mailman
        
  5. If you have stored your data on CDs or DVDs and now need to restore a file or a directory, you don't need to load all the CDs to search for it. Dar has 3 features to help with this.

    1. The first feature is that whenever dar needs a part of the archive that it cannot find, it doesn't abort. It asks you to make that file available and hit enter.

      So when you try to restore files from a backup set like this:

      dar -x /mnt/cdrom/2006-05-04-full -g file1 -g file2

      you have to mount the CD on which the first slice of the May 4 full backup is stored, i.e., the CD on which a file called 2006-05-04-full.1.dar is present.

      When this is done, dar will ask for the CD containing the last slice (you can mark your CDs as 2006-05-04-full, 1/5, then 2/5, etc. so you know which is the last one).

      These two slices (the first and last slice) are always needed by dar for any restore, because of how it stores the metadata internally. If the files you want are entirely contained within one or both of these slices, you're done.

      Otherwise dar will prompt you for the CD holding whichever other slice contains them (say, slice number 3), and you load that.

    2. Of course, you're not going to make full backups every time. Let's say you did a full backup on the 1st, then differentials every day for the next 20 days. And you don't really know when the file was last changed, so the latest version could be in any of these backup sets.

      To help with this, dar has the concept of "isolating" (I prefer to call it "extracting") the directory data from the actual archive with the -C option. The result is, of course, much, much smaller, so you can keep the metadata for all of your archives on disk without losing much space.

      You can then query each of the catalogs using -as -l to see which one has the file you want. Let's say you find it in 2006-08-12-diff; you then mount the CD that contains the first slice of that backup set.

      [You are labelling your CDs as soon as you write them, aren't you?]

    3. But even that is too much trouble. Dar comes with another program called dar_manager whose purpose is to eliminate the drudgery in the above step. But this article is already far too long, so I'll just give the commands with a very brief explanation:

      dar_manager -C my_dmd

      creates the dar_manager database. This is done once only.

      dar_manager -B my_dmd -A 2006-07-01-full_catalog

      updates the database with the catalog for the July 1st full backup. You have to run a similar command for every backup you take, so just add it to your backup script and it happens automatically.

      dar_manager -B my_dmd -r files-to-restore

      This is where all the magic happens. Just mount the CDs it asks for when it asks for them and hit enter. Due to its design, dar will always ask for the first and last slices of any multi-slice archive, but slices in between are asked for only if they contain the file you want.

    4. One final tip: I haven't tried this yet, but I think you should make the slice size half the CD size (say 350M) and burn the first and last slices onto one CD. This ought to eliminate one extra CD mount!
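The first note above says that compressing files one at a time costs some space. This small sketch shows the effect on a flat directory of files, with gzip standing in for dar's compressor (an assumption for illustration; it does not reproduce dar's archive format). The function name is hypothetical: it adds up the sizes of each file compressed on its own and compares that with all the files concatenated and compressed as one stream.

```shell
# compare_compression DIR: print two byte counts, "<separately> <together>".
# gzip stands in for dar's compressor here; this illustrates the trade-off,
# it does not reproduce dar's archive format.
compare_compression() {
    separately=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        n=$(gzip -9 -n -c "$f" | wc -c)
        separately=$((separately + $n))
    done
    # All files as one stream, so redundancy between files can be exploited.
    together=$(cat "$1"/* | gzip -9 -n -c | wc -c)
    echo "$separately $((together))"
}
```

On many small, similar files the first number comes out far larger than the second. With a few large files the gap narrows, because each file then has enough redundancy of its own for the compressor to work with.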
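The manual lookup in the second feature of the last note (running -l -as against each isolated catalog) is easy to script. The function below is a minimal sketch under stated assumptions: the function name is hypothetical, all isolated catalogs sit in one directory under names ending in _catalog (so their first slices are *_catalog.1.dar), and it simply greps the listing for a pattern, so the pattern can match any part of a listed path. dar_manager does this bookkeeping properly; this is only the do-it-yourself version.

```shell
# find_in_catalogs PATTERN DIR: print the basename of every isolated catalog
# in DIR whose list of saved files (dar -l ... -as) matches PATTERN.
# Assumes catalogs are named *_catalog, i.e. first slice *_catalog.1.dar.
find_in_catalogs() {
    for f in "$2"/*_catalog.1.dar; do
        [ -e "$f" ] || continue    # glob matched nothing
        base=${f%.1.dar}           # dar wants the basename, not the slice
        if dar -l "$base" -as 2>/dev/null | grep -q -- "$1"; then
            echo "$base"
        fi
    done
}
```

Because the names begin with the date, the glob normally returns them in chronological order, so the last line printed points at the most recent backup set that saved a matching file; that is the one whose first-slice CD you mount.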

What next?

Well, the next chapter: Synchronisation -- when files change on both sides