Tutorial: Backups with Launchd

The other day at work someone asked me if there was some way to have OS X run an rsync command to an external drive whenever it was plugged in. Well, given that we were talking about Mac OS X 10.4, it was easy to answer: of course you can.

Why would anyone want to do that? Well, when he plugged in the external drive, he wanted it to immediately start backing up his data to the disk, instead of having to type a command or run a script manually. No problem, my friend; OS X can accommodate you!

New in 10.4 is a system daemon called launchd. Launchd is Apple's replacement for a number of *NIX daemons that are typically used for launching system services at boot time or on demand after system launch. Launchd, although a work in progress, is extremely powerful. Process ID 1 in the system is in fact launchd. It's always running, and always watching.

Launchd gets its configuration information for an agent or daemon from a Property List file (plist). Examples of plist files used by launchd for the system are located in:

/System/Library/LaunchDaemons (Mac OS X system-wide daemons)
/System/Library/LaunchAgents (Mac OS X per-user agents)

At the user level, you can have launchd run jobs for you in a number of ways. You can use launchctl (man launchctl) from the command line. Or you can create your own plist file and place it in ~/Library/LaunchAgents, where launchd will pick it up when you log in. (System administrators put global configuration files in the equivalent /Library/LaunchAgents and /Library/LaunchDaemons folders; the /System/Library folders are reserved for Apple's own.) Alternatively, you can add the command to a $HOME/.launchd.conf file that you create and maintain yourself (again, the launchctl man page has more information).

The plist file contains information that launchd is going to use to figure out exactly what it's supposed to do. It could perform a system task or run a custom script.

OK, enough blabbing. Let me illustrate with an example geared toward my co-worker's request; it's easier to understand that way. The example assumes you have a FireWire/USB external drive to attach to your system.

Basic Setup

1) In Terminal, cd to ~/Library
2) If you don't have a LaunchAgents directory, create one:

	mkdir ~/Library/LaunchAgents

3) While you are at it, create a folder called Scripts:

	mkdir ~/Library/Scripts

Remember, at login launchd scans the contents of the ~/Library/LaunchAgents folder for plist files to process. Once you put one in there, launchd will take over for you every time you log in.

Property List

1) Launch Terminal.app, cd into ~/Library/LaunchAgents, and issue the following commands:

  touch com.macresearch.backup.plist
  open -e com.macresearch.backup.plist

2) With the new file open in TextEdit, add the following content to it:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Label</key>
        <string>com.macresearch.backup</string>
        <key>LowPriorityIO</key>
        <true/>
        <key>Program</key>
        <string>/Users/gohara/Library/Scripts/backup.com</string>
        <key>ProgramArguments</key>
        <array>
                <string>backup.com</string>
        </array>
        <key>WatchPaths</key>
        <array>
                <string>/Volumes</string>
        </array>
</dict>
</plist>
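If you would rather not type the file into TextEdit, the same plist can be written from the shell with a here-document. This is a minimal sketch, not part of the original setup: the function name write_backup_plist is hypothetical, and it assumes the script will live at ~/Library/Scripts/backup.com. It writes the full $HOME path into the Program key rather than relying on "~".

```shell
#!/bin/bash
# Sketch: generate the launchd job file described above from the shell.
# write_backup_plist is a hypothetical helper name; adjust paths to taste.
write_backup_plist() {
    local agents_dir="$HOME/Library/LaunchAgents"
    local plist="$agents_dir/com.macresearch.backup.plist"
    # Full path to the backup script (no "~" in the plist).
    local program="$HOME/Library/Scripts/backup.com"

    mkdir -p "$agents_dir" || return 1

    # Unquoted EOF delimiter, so $program is expanded inside the here-doc.
    cat > "$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Label</key>
        <string>com.macresearch.backup</string>
        <key>LowPriorityIO</key>
        <true/>
        <key>Program</key>
        <string>$program</string>
        <key>ProgramArguments</key>
        <array>
                <string>backup.com</string>
        </array>
        <key>WatchPaths</key>
        <array>
                <string>/Volumes</string>
        </array>
</dict>
</plist>
EOF
    # Print the path of the file that was written.
    echo "$plist"
}
```

On a Mac you can then check the syntax with plutil -lint on the generated file before loading it.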


Let's go over this, since a lot of important stuff is here. All of the important information is between the dictionary tags (<dict></dict>).

<key>Label</key>
<string>com.macresearch.backup</string>

This is a unique identifier that launchd uses when it loads the plist file. Once launchd has loaded a configuration file, you can issue "launchctl list" at the command line to see which jobs it is monitoring; this is the string it will report. Make the string meaningful, as it's the quickest way to tell what a launchd job is designed to do.

<key>LowPriorityIO</key>
<true/>

Since we are doing file I/O, and we may need to use the computer for something more important like... playing online poker, we want to minimize the system resources diverted to the backup. This is entirely optional.

<key>Program</key>
<string>/Users/gohara/Library/Scripts/backup.com</string>

This tells launchd what program we want it to... well, launch. Use the full path to the script here, with your own short user name in place of gohara.

<key>ProgramArguments</key>
<array>
        <string>backup.com</string>
</array>

The program arguments are important. The first argument listed is always the program itself. If you want to pass in additional arguments, you simply add more <string></string> statements between the array delimiters.
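For example, if the plist listed the folder to back up as a second string in ProgramArguments, the script would receive it as its first argument. The sketch below shows how a bash version of the script could pick it up and fall back to a default. Passing the folder this way is an assumption for illustration (the tcsh script in this tutorial hard-codes it), and source_folder is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: with ProgramArguments set to
#   <string>backup.com</string>
#   <string>/Users/Shared/Expenses</string>
# the second string reaches the script as $1.

# Hypothetical helper: use the first argument if one was given,
# otherwise fall back to the folder used in this tutorial.
source_folder() {
    echo "${1:-/Users/Shared/Expenses}"
}

folderToBackup=$(source_folder "$@")
```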

<key>WatchPaths</key>
<array>
<string>/Volumes</string>
</array>

Finally, we tell launchd what to use as a trigger for launching the script. In this example we are telling it to watch the path /Volumes. Why? Any time a device is mounted on the file system, a mount point (a folder named after the volume) appears in /Volumes. From this point on, launchd watches /Volumes for ANY changes, and when it detects one it launches the backup script (our "program"). Again, you can have it watch multiple paths by adding more path strings between the array delimiters. The launchd.plist man page (man launchd.plist) describes more options.

The important thing to remember here is that launchd will execute the script regardless of what is added to or removed from /Volumes. This includes CDs/DVDs, USB devices, disk images, or even a folder you create in /Volumes. Launchd is powerful, but it's stupid (for now). So we need to build some smarts into our program (or script, in this case) to make sure it does the right thing.

Script

I'm going to create a tcsh script for this example. If you are more comfortable with bash (or even AppleScript), you can convert this example to those forms as well.

1) In Terminal, cd to ~/Library/Scripts and issue the following commands:

  touch backup.com
  chmod 755 backup.com
  open -e backup.com

2) Copy the following into the document:

#!/bin/tcsh

# Convenience variables to specify what I want to
# backup and where I want to back it up to

set folderToBackup = "/Users/Shared/Expenses"
set backupVolume = "/Volumes/BACKUP"
set backupTo = "${backupVolume}/backup"

# This sleep timer has been added to allow enough
# time for the system to mount the external drive
# On my PowerBook 30 sec. is more than enough time

sleep 30

# This check is added to test for cases when we are
# removing a drive from /Volumes or if the drive failed
# to mount in the first place

if (! -e $backupVolume ) then
 exit 0
endif

# Create the folder to back up the data to (defined above)

if (! -e $backupTo) then
 mkdir -p $backupTo
endif

# Copy the files over. Here we are using rsync.

rsync -aq --delete $folderToBackup $backupTo

# Optionally, once the rsync is done, unmount the drive.

#hdiutil detach $backupVolume

exit 0

OK, let's go through this.

set folderToBackup = "/Users/Shared/Expenses"
set backupVolume = "/Volumes/BACKUP"
set backupTo = "${backupVolume}/backup"

These are convenience variables defining what I want to back up (the Expenses folder in /Users/Shared), the volume name (my FireWire drive has a volume called BACKUP), and the location on the backup drive I want to back the Expenses folder up to (a folder called backup). Obviously, if your drive or the folder you want to back up is named something else, you'll need to change these lines.

sleep 30

On my PowerBook it takes about 10 seconds from the moment the device is mounted in /Volumes until it is ready to accept modifications (that is, until it can be written to). Put another way, launchd won't launch the script until it sees the device appear in /Volumes, but it can still take a few seconds after that before anything can be written to the drive. The sleep is just a buffer to ensure the device is ready.
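A fixed sleep works, but the script could instead poll until the volume is ready, so it neither waits longer than necessary nor gives up too early on a slower machine. This is a minimal bash sketch that assumes "ready" can be approximated by the volume's folder existing and being writable by the current user; wait_for_writable is a hypothetical name and the 60-second default is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: poll once a second until DIR exists and is writable.
# Usage: wait_for_writable DIR [TIMEOUT_SECONDS]
# Returns 0 once DIR is writable, 1 if the timeout passes first.
wait_for_writable() {
    local dir=$1 timeout=${2:-60} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # One last check after the final sleep.
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

In a bash version of the script, wait_for_writable "$backupVolume" 60 || exit 0 would take the place of the sleep and the existence check. On its own this test can't tell a mounted volume from a stray folder of the same name, so it is best combined with a stricter mount check.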

if (! -e $backupVolume ) then
 exit 0
endif

This test makes sure that we don't try to write to the device while it is being unmounted. Remember, launchd executes this script ANY time a change is made in /Volumes, so when we "eject" the disk, the script runs again. Without this check, mkdir -p would create /Volumes/BACKUP/backup as an ordinary folder on the startup disk, and rsync would copy the files there, directly inside the /Volumes directory.
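The -e test has one blind spot: if a plain folder named BACKUP ever ends up in /Volumes (for instance, one left behind by a run that raced an unmount), the test passes and the backup lands on the startup disk. A stricter check asks df which mount point a path belongs to. This is a bash sketch with a hypothetical name (is_mount_point); it relies on the POSIX df -P output format, in which the mount point is the sixth field onward, and expects the path without a trailing slash.

```shell
#!/bin/bash
# Sketch: succeed only if DIR is itself the root of a mounted filesystem,
# not just a folder that happens to live under /Volumes.
is_mount_point() {
    local dir=$1 mnt
    [ -d "$dir" ] || return 1
    # df -P prints a header line and one data line; blank out the first
    # five fields so a mount point containing single spaces survives.
    mnt=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 {
        for (i = 1; i <= 5; i++) $i = ""
        sub(/^ +/, "")
        print
    }')
    [ -n "$mnt" ] && [ "$mnt" = "$dir" ]
}
```

In a bash version of the script, is_mount_point "$backupVolume" || exit 0 would replace the -e test.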

if (! -e $backupTo) then
 mkdir -p $backupTo
endif

In general rsync will create the destination directory for us, but it doesn't hurt to make sure it's already in place, and to create it if it isn't.

rsync -aq --delete $folderToBackup $backupTo

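Finally, let's do the backup. The -a (archive) flag copies recursively and preserves permissions, timestamps, and symbolic links; -q suppresses non-error output; and --delete removes files from the backup that no longer exist in the source, so deletions are mirrored too. Because $folderToBackup has no trailing slash, rsync creates the Expenses folder itself inside the destination, so the files end up in /Volumes/BACKUP/backup/Expenses.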

One optional step is to unmount the disk when the process is complete. To do this, uncomment the following line near the end of the script:

hdiutil detach $backupVolume

Register the Script with Launchd

There are two ways to register the job with launchd: from the command line, or by simply logging out and back in. To save some effort, let's register it from the command line:

  launchctl load ~/Library/LaunchAgents

Now issue the command:

  launchctl list

You should see something like this:

[Voyager:~/Library/Scripts] gohara% launchctl list
com.macresearch.backup

OK, launchd is aware of the job and ready to go.
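If a script needs to check whether the job is loaded, it can look for the label in the output of launchctl list. This bash sketch (is_job_loaded is a hypothetical name) assumes the label is the last whitespace-separated field on each line, which holds for the one-label-per-line output shown above and for the PID/Status/Label columns printed by later versions of launchctl. It reads the listing from standard input, so it can be tried without launchd.

```shell
#!/bin/bash
# Sketch: succeed if LABEL appears as the last field of any input line.
# Usage on a Mac: launchctl list | is_job_loaded com.macresearch.backup
is_job_loaded() {
    awk -v label="$1" '$NF == label { found = 1 } END { exit !found }'
}
```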

Plug in the Drive

Once you plug in the drive (and wait ~30 seconds), you should notice that the folder you designated to be backed up (and its contents) begins appearing on the drive at the specified path. Pretty cool, huh?

Once the process is complete you can safely eject the disk.

Afterthoughts

The WatchPaths directive is very powerful. Imagine you have a folder into which you occasionally dump files; maybe those files are data being generated by some other program. You can tell launchd to watch that folder, and whenever data appears there (or any modification is made, really) launchd can run a command, script, or program to do something with that data. For example, you could have launchd run a script that converts the data, passes it to a plotting program to generate plots, and then emails the plots to you or a colleague. Pretty cool stuff!
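Because launchd only says that something changed, not what, a drop-folder job needs the same kind of smarts as the backup script. One common pattern, sketched here in bash with hypothetical names (process_inbox, a done folder, processed.log) and a placeholder in place of the real conversion step, is to handle every regular file in the watched folder and then move it out of the way so a later trigger doesn't process it twice.

```shell
#!/bin/bash
# Sketch: process each regular file in a watched folder once, then move
# it into a "done" folder so the next WatchPaths trigger skips it.
# Usage: process_inbox INBOX DONE_DIR
process_inbox() {
    local inbox=$1 done_dir=$2 f
    mkdir -p "$done_dir" || return 1
    for f in "$inbox"/*; do
        # Skips subfolders, and the unexpanded pattern if the folder is empty.
        # (Dotfiles such as .DS_Store don't match "*" at all.)
        [ -f "$f" ] || continue
        # Placeholder step: replace with conversion, plotting, emailing...
        echo "processed $(basename "$f")" >> "$done_dir/processed.log"
        mv "$f" "$done_dir/"
    done
}
```

Moving files out of the watched folder is itself a change, so expect one extra run that finds nothing to do. If the program producing the data writes files slowly, having it write under a temporary name elsewhere and then move the finished file into the folder avoids processing half-written data.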

Comments


Some feedback...

First of all, GREAT tutorial. I switched this year, and have been working ever since to port my Linux/Solaris automation experience and make my MacBook Pro more of an extension of myself. This tutorial filled a few holes, which is nice!

I have played with your setup a little, and have noticed a few things:
1- As you mentioned, the script is run with every disk mount/unmount, including iPods and .dmg files. If your "backup" drive also has a data partition on it like mine, you get many unnecessary rsyncs, every time you pop a CD in the drive, etc. I think the script needs a lock file on the removable drive to allow for a "minimum time between backups" setting.

2- I'm not a big tcsh person, so I modified the script and "bash"ed it. I'll paste the script below.

3- Multiple source dirs. In the script referenced in (2) I have added the capacity to back up more than one dir (I need ~/ and /Applications backed up).

4- I was getting bombed with messages like:
"Dec 30 00:18:32 laptop launchd[321]: com.laptop.backup: 6 more failures without living at least 60 seconds will cause job removal"

I have changed the sleep time to 60 seconds, to see if this makes the script a little more launchd friendly. Not sure if this is a solution though.

5- The next thing I need to work on is the network version of this, which will kick off an SSH-based rsync when it detects my home or work network... I'll let you know when I figure it out.

6- No logging makes troubleshooting hard. I have added in some system log output for the time being. Once I make the backup script compatible with my shell scripting functions library, I'll add debugging flags to change the verbosity level.

Anyway, the bash script follows:

#!/bin/bash

# Folders to back up: one array element per folder, so paths with spaces work
foldersToBackup=("/Users/homedir" "/Applications")
backupVolume="/Volumes/MyDrive"
backupTo="${backupVolume}/backup"
sleepFor=60

echo -n "[*]-- Autobackup invoked at $(date)" | logger
echo -n "[*]-- Sleeping for ${sleepFor} seconds..." | logger
sleep "${sleepFor}"

if [ ! -e "${backupVolume}" ]
then
    echo -n "[*]-- BackupVolume NOT connected - Exiting" | logger
    exit 0
else
    echo -n "[*]-- BackupVolume connected - Continuing" | logger
fi

# Create the folder to back up the data to (defined above)

if [ ! -e "${backupTo}" ]
then
    echo -n "[*]-- Backupdir does not exist... Creating..." | logger
    mkdir -p "${backupTo}"
else
    echo -n "[*]-- Backupdir exists! - Continuing" | logger
fi

# Copy the files over. Here we are using rsync.
for i in "${foldersToBackup[@]}"
do
    echo -n "[*]-- Starting rsync of ${i} to ${backupTo}" | logger
    if rsync -aqE --delete "${i}" "${backupTo}"
    then
        echo -n "[*]-- rsync of ${i} to ${backupTo} complete..." | logger
    else
        # $? here is rsync's exit status from the if condition
        echo -n "[*]-- rsync of ${i} to ${backupTo} FAILED (exit $?)" | logger
    fi
done

# Optionally, once the rsync is done, unmount the drive.

#hdiutil detach "$backupVolume"

exit 0
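The "minimum time between backups" idea from item 1 could look something like the sketch below. It is only a sketch: the stamp file (for example ${backupVolume}/.last_backup) and the function names are made up for illustration, and the time is stored as seconds since the epoch from date +%s.

```shell
#!/bin/bash
# Sketch: decide whether enough time has passed since the last backup.
# Usage: backup_is_due STAMP_FILE MIN_SECONDS [NOW]
backup_is_due() {
    local stamp=$1 min=$2 now=${3:-$(date +%s)} last
    # No stamp yet (or unreadable): a backup is due.
    [ -r "$stamp" ] || return 0
    last=$(cat "$stamp")
    case $last in
        ''|*[!0-9]*) return 0 ;;   # garbage in the stamp: back up anyway
    esac
    [ $((now - last)) -ge "$min" ]
}

# Record a successful backup. Usage: mark_backup_done STAMP_FILE [NOW]
mark_backup_done() {
    echo "${2:-$(date +%s)}" > "$1"
}
```

In the script above, that would mean calling backup_is_due "${backupVolume}/.last_backup" 3600 || exit 0 after the volume check, and mark_backup_done "${backupVolume}/.last_backup" after the rsync loop. Keeping the stamp on the backup drive itself means a second backup drive gets its own schedule.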

New Version

I have fixed items 1, 2, 3, 4 and 6 from the above list. Download available at http://www.users.on.net/~aidanandjen/BlurredVisions/files/edc01be85bcd4f5ae1789fd9c290a20e-2.html

rsync detect file changes?

Great post, the tutorial is very good, and the first comment had excellent ideas for the script.

My experience with rsync has been annoying; ideally, it would only copy files that have changed. When I test it, rsync always copies some files. It looks like most or all of these files are related to the Mac resource forks; rsync lists them as "._FILENAME". These files might be small individually, but for a typical home directory it can still take 30 minutes or more. It takes the fun out of a backup when only a few files have changed but rsync is still copying thousands of files.

Does anyone know what is going on with rsync and if it can be "fixed"?

Git-based versioned backup/history

Great stuff! I used the technique from this article to create a job that backs up a file (viz., ~/Library/Application\ Support/Camino/WindowState.plist, the file Camino uses to store its window/tab state) using Git so that I could restore any specific version from the history. It's like Time Machine for my one file that I actually care about having full history for, without needing an external drive or upgrading to Leopard.

Thanks so much for working out this technique! The two-line shell script (with no error handling or logging, and with the full path to Git particular to my machine) that the launchd job runs is:


#!/bin/bash

# Assumes that the WindowState.plist file has already been added to the
# Git repository in the following directory at least once.

cd /Users/me/Library/Application\ Support/Camino/

/opt/local/bin/git commit -a -m "u"

Thanks

Thanks, it helped.

Backup when Specific volume is mounted


# This check is added to test for cases when we are
# removing a drive from /Volumes or if the drive failed
# to mount in the first place
if (! -e $backupVolume ) then
exit 0
endif

Wouldn't this also check for the specific volume name when a volume is attached before running the backup? Sure every time launchd detects a change to /Volumes the script would run, but if the volume name does not match that specified by $backupVolume then the script would exit. Is this assumption correct?
I'm looking for a way to only have the backup execute when a volume of a specific name is attached. I don't really care if it executes again if I attach a different device while the backup device is plugged in already.

Thanks

P.S. I am extremely new to launchd so forgive my ignorance if I show any.

Continuous mirroring?

Greetings. Thanks for all the helpful information. From what I see here, it looks like launchd can be used to trigger rsync for scheduled incremental backups, but what if I want to mirror two drives in real-time? The use case is this: I have two iSCSI drives. Let's call one Primary and one Secondary. Primary is used to host home folders and other share points for OS X Server. Secondary is intended to be a warm-standby backup, to be used in case Primary fails or needs to be taken off line. I'd like to be able to switch over to Secondary with little or no data loss. Also note that both Primary and Secondary are configured with several volume partitions - all of which need to be mirrored in real time. I can have all volumes from both Primary and Secondary mounted to one machine, so I *think* I could use rsync to just backup each volume locally - no ssh required. I also believe I could use launchd to watch each volume for any changes, but wouldn't that cause rsync to run any time *any* change is made? That seems dangerous to me. Maybe what I'm really looking to do is set up the two iSCSI drives in some sort of RAID configuration?

Thoughts anyone?

New information


I just tried using Disk Utility to set up two companion partitions on the two iSCSI drives as a single mirrored RAID partition. This works fine as long as the iSCSI drives are only mounted to one server. However, when mounted to both of the servers in my array, changes made to the directory by server A are not recognized by server B and vice versa. Maybe a third party software RAID solution would work better? Or perhaps I need to run something on the MaxNas itself?

Re: RAID

It sounds to me like you really just want a mirrored RAID. You can do this in software using Disk Utility. That's probably the best bet for what you are looking to do. Then you don't even have to worry about syncing: the OS will write to both disks at the same time. If one disk fails, you simply replace it (the other disk will take over the work). Once the failed disk is replaced, the RAID will rebuild on it.

Hope that helps,

Dave

It is certainly possible to resolve this issue

Does anyone know what is going on with rsync and if it can be "fixed"?

The problem may be that you do not have permissions enabled on the destination volume. Visit the following URL for details on how to enable permissions on your volume: http://www.lbackup.org/permissions

FYI: LBackup automatically checks that permissions are enabled on the backup destination volume when running on Darwin (Mac OS X) systems before commencing with the backup.