If backing up data is not already a part of your (daily/weekly/hourly) routine, it should be.
Previously, I have talked about creating a backup process, and how double/triple/off-site backups can be lifesavers.
What I’m going to talk about today is how you can make backups, specifically from the Linux command line.
If you’re still backing up by copying and pasting files around in the GUI, or even by using tar, cp, dump or restore from the command line, you’re missing out.
There’s a better way to quickly and efficiently back up your data. It’s called data synchronization, and there’s a command line utility that does it for you.
Data Syncing 101
To synchronize is to cause something to occur/operate/move/etc. at the same time or rate as something else, in coordination.
To synchronize data is to establish a consistency of that data between two points.
If these two definitions do not at first appear to be similar, then hopefully their similarity will become clear as you consider the following example.
Let’s say that you have Point A (your hard drive) and Point B (an external hard drive that you use for backing up).
Over the course of each day as you industriously get things done, various files/folders in your file system will be created, deleted, and/or updated.
It would be a never-ending job in itself to try to keep track of which files and folders are created/deleted/updated over the course of each day, so instead, when it comes time to back up, you delete your entire backup (practically wipe out Point B) and replace it with the contents of Point A.
If you’re doing this, you’re doing it the hard way. It’s slow, not to mention completely unnecessary.
The alternative is to sync your data. The tricky part is to make sure that the changes made to Point A are duplicated on Point B — and not the other way around!
This can be done with a single utility that compares the data in two specified locations, and updates one of those locations to match the other. It coordinates them.
This particular type of synchronization is referred to as “mirroring”, as it is a one-way transfer of data as opposed to two-way synchronization, which to my knowledge doesn’t have a catchy short-name.
The utility that I recommend for backing up, or “mirroring”, data this way is called rsync.
Using the rsync Utility
The rsync utility is “a fast, versatile, remote (and local) file-copying tool”. It stands for “remote sync”.
The syntax is rsync source destination, or, as I like to call it, rsync what where.
Rsync is only as good as the myriad of arguments you can use to make it perform to your unique specifications. Some of my favorites are:
-a or --archive will use the archive mode, which acts as 7 other options (-rlptgoD) put together. It’s great for performing backups.
--delete will delete files at the destination that no longer exist at the source.
--delay-updates will hold each updated file in a temporary location and move them all into place on the destination at the end of the transfer.
--exclude=PATTERN will exclude files that match PATTERN. (This is a good time to pull out some wildcards!)
-g or --group will preserve the group ownership of each file.
-h will show numbers (file sizes, etc.) in a human-readable format.
--list-only will only list files instead of actually copying them.
--max-size=SIZE allows you to specify a maximum file size; any file larger than that size will not be transferred.
--min-size=SIZE allows you to specify a minimum file size; any file smaller than that size will not be transferred.
-o or --owner will preserve the owner of each file (super-user only).
-r or --recursive will travel recursively into directories.
-u or --update will skip any file whose copy on the destination has a newer timestamp than the source.
-v or --verbose will increase verbosity, or explain what’s going on in more detail.
The rsync command that I use to back up to my external hard drive looks something like this:
rsync -av --delete --exclude='.*' /home/gwen /mnt/hd
As you can see, I very simply back up my entire home directory to a mounted external hard drive.
As far as arguments/options go, I have chosen to archive, to see all the verbose details, to delete files on the destination that no longer exist on the source, and to exclude all files that begin with a period.
Using the rsync Utility Remotely
Since the name of the utility is remote sync after all, let’s quickly touch on how to use it remotely.
At any given time, either the source or the destination — or both, but that shouldn’t be necessary! — of the rsync utility can point to a remote (over-the-network-or-internet) location.
Simply replace either the source or the destination with the following pattern: user@domain.com:/directory/
Example: rsync -av --delete --exclude='.*' /home/gwen gwen@takingnothingforgranted.com:/home/gwen/backup/
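One cautious way to try a remote destination is to assemble it from its parts and start with a dry run. The sketch below only prints the command it builds rather than running it; the user, host and paths are hypothetical placeholders, and -n (rsync's --dry-run option) is added so that the printed command would report what it intends to do without transferring or deleting anything.

```shell
#!/bin/sh
# Sketch: build a remote rsync destination from its parts and print the command.
set -eu
remote_user="gwen"                 # hypothetical account on the remote machine
remote_host="backup.example.com"   # hypothetical host name
remote_dir="/home/gwen/backup/"    # hypothetical directory on the remote machine
src="/home/gwen"

# The remote pattern is user@host:/directory/
dest="${remote_user}@${remote_host}:${remote_dir}"

# -n makes this a dry run; drop it once the reported changes look right
cmd="rsync -avn --delete --exclude='.*' ${src} ${dest}"
echo "$cmd"
```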
Summary
So, to use my favorite expression: explore!
The worst that could happen is that your backup could fail (and you don’t realize it at the time), your hard drive could die (in the middle of a very busy day), and you could go to your backup and find all of your data non-existent…
But you recently mailed an extra (successful) offsite backup copy to your parents, so they can save the day, right?