• Automated Backup-Jobs on Mac OSX

    Posted on June 20th, 2010 Nattl 2 comments

    It is always a good idea to back up a system. In my case I was looking for a bulletproof way to back up the MySQL databases on my web server. In the past I used to run mysqldump locally on my Mac to back up those databases. When I lived in New Zealand and had only restricted access to the web (due to the restricted network of the University of Otago), this was no longer an option. So I ended up running mysqldump directly on my web server, with a cronjob that periodically backed up the databases. This, however, was not the most intelligent solution, as the backups now lived on the same system as the database: if the system went down, not just the database would be gone but all the backups too. So I ended up downloading the dumps manually using sftp, which was possible because I used Corkscrew to bypass the proxy. I’m pretty sure I could have found a way to automate the backup on my Mac, but it would definitely have been awkward and complicated.

    So now that I’m back home in Europe and have access to super fast internet, I thought about a more convenient way to back up the databases. I found that running mysqldump on the web server itself is super fast (just a second for a large 100 MB database), which was never the case when I dumped the database remotely. So I decided to leave the dump files on the server and create a job that periodically downloads them to my Mac.

    I looked at several ways to do this, using ftp/sftp scripts and expect, but none of them worked properly. Then I remembered something I had read a couple of months earlier about curl, a nice tool for grabbing data from the web that comes out of the box on the Mac. So here is how I do it:

    Assume I have a dump file (it could be any other file) on the web server in the directory dumps. The mysqldump cronjob creates a file every day, names it after the calendar date plus .sql, and then tar-gzips it. So the file for the 20th of June 2010 would be 200610.sql.tar.gz. I know I can expect a file named according to this rule every day after a certain time (the time when the job runs), in my case, let’s say, 2 AM. So every day after 2 AM there will be a file carrying that day’s date. To download it, all I have to do is create a job that connects to my web server and fetches this file.
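
    The server-side job itself is not shown in this post. A minimal sketch of what it might look like follows; the function names, the DUMP_DIR variable and the assumption that mysqldump reads its credentials from ~/.my.cnf are all mine, not taken from the actual setup.

```shell
#!/bin/sh
# Sketch of a server-side dump job. All names and paths here are
# assumptions made for illustration.
DUMP_DIR="${DUMP_DIR:-$HOME/dumps}"

# Name of today's dump without the archive suffix, e.g. 200610.sql
dump_name() {
    printf '%s.sql\n' "$(date +%d%m%y)"
}

# Pack one .sql file into <name>.tar.gz in the same directory and
# remove the uncompressed file once the archive has been written.
pack_dump() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    tar -czf "$dir/$base.tar.gz" -C "$dir" "$base" && rm -f "$1"
}

# Dump all databases and pack the result. mysqldump is assumed to
# find its credentials in ~/.my.cnf.
make_dump() {
    target="$DUMP_DIR/$(dump_name)"
    mkdir -p "$DUMP_DIR" &&
    mysqldump --all-databases > "$target" &&
    pack_dump "$target"
}
```

    With a call to make_dump added as the last line, such a script could be started by cron at 2 AM and would leave a file like 200610.sql.tar.gz in the dumps directory.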

    I want to download this file to a directory called bak, which is located in my home directory. It should have exactly the same name as the file on the web server. Here is what to do:


    # username of the ftp-user (should have access to the directory where the dumps are located)
    USER='myusername'
    # the password of the ftp-user
    PASSWORD='mysecretpassword'
    # the name of the file is created on the fly based on the current date (ddmmyy)
    FILE="$(date +%d%m%y).sql.tar.gz"
    # the domain name of the server from which the dump has to be downloaded
    SERVER='myserver.com/'
    # the remote path to the dump file; together with the trailing slash of SERVER
    # this gives "//" in the URL, which curl treats as an absolute path on the FTP server
    REMOTE_PATH='/home/username/dumps/'
    # the local path where I want the file to be stored
    LOCAL_PATH='/users/username/bak/'
    # -s: silent, -o: write to this local file, -u: ftp credentials
    curl -s -o "$LOCAL_PATH$FILE" -u "$USER:$PASSWORD" "ftp://$SERVER$REMOTE_PATH$FILE"

    I save this file as bak.sh in the scripts directory inside my home directory, which is the path the launchd job further down points to. Trying out this shell script on the command line, I see that it works (assuming that all the parameters are correct).
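
    Because the script is meant to run unattended, it may be worth checking that the download really produced a usable archive. The following is only a sketch of such a check; verify_dump is a hypothetical helper of mine and not part of the script above.

```shell
# Check that a downloaded dump exists, is not empty and is a valid
# gzip file. verify_dump is a hypothetical helper name.
verify_dump() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    if ! gzip -t "$1" 2>/dev/null; then
        echo "not a valid gzip file: $1" >&2
        return 1
    fi
    echo "ok: $1"
}
```

    In the script, the curl line could then be followed by verify_dump "$LOCAL_PATH$FILE". curl itself exits with a non-zero status when the transfer fails, so that exit code can be checked as well.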

    Well, ok. But I don’t want to start the job manually every day. It should run automatically without me doing anything about it. On a Linux box I would say, no problem, I put an entry in the crontab and let cron work it out. However, Apple deprecated cron in favor of launchd a couple of years ago (cron still works, but it is no longer the recommended way), so the Mac way to do it is launchd. In order to use launchd you have to define a .plist (which is IMO another horrible XML-hell stupidity, but who asks me). As my MacBook Pro is not always turned on, I want to run the job at a time when the machine is online. So let’s see, when would be a good time? I get up at 7 AM, and at 7:30 I’m usually sitting in front of my computer reading the news while drinking a cup of coffee to wake up. That would be a good time. Here is what the .plist looks like:


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.example.mybak</string>
        <key>UserName</key>
        <string>myUserName</string>
        <key>ProgramArguments</key>
        <array>
            <string>/bin/bash</string>
            <string>/users/myUserName/scripts/bak.sh</string>
        </array>
        <key>StartCalendarInterval</key>
        <dict>
            <key>Hour</key>
            <integer>7</integer>
            <key>Minute</key>
            <integer>30</integer>
        </dict>
        <key>Debug</key>
        <false/>
        <key>AbandonProcessGroup</key>
        <true/>
    </dict>
    </plist>

    Having defined this .plist, I have to save it in ~/Library/LaunchAgents. I name it com.example.mybak.plist; you can basically name it whatever you like, but naming the file after the job’s Label is the usual convention. All I have to do now is tell my Mac to load this job so that it is executed at the given time. To do this I go to the command line and type:


    $> launchctl load ~/Library/LaunchAgents/com.example.mybak.plist

    If the plist is ok (i.e. there are no errors in the XML structure and all paths are correct), launchctl should have loaded the job, and all I have to do is wait until 7:30 AM to see if it worked. That’s it.
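
    Instead of waiting until the next morning, the job can be checked right away. This is a sketch that assumes the file name and label used in this post; check_job is a hypothetical helper, while plutil and launchctl are the tools that ship with the Mac.

```shell
# Assumption: plist path and label as used in this post.
PLIST="$HOME/Library/LaunchAgents/com.example.mybak.plist"
LABEL='com.example.mybak'

# Syntax-check a plist and see whether a job with the given label is
# loaded. Usage: check_job <plist-file> <label>
check_job() {
    # plutil and launchctl only exist on the Mac, so bail out elsewhere
    if ! command -v plutil >/dev/null 2>&1; then
        echo "plutil not found (not a Mac?)" >&2
        return 2
    fi
    # plutil -lint reports XML and property list syntax errors
    plutil -lint "$1" || return 1
    # launchctl list prints one line per loaded job, including its label
    if launchctl list | grep -Fq "$2"; then
        echo "job $2 is loaded"
    else
        echo "job $2 is not loaded" >&2
        return 1
    fi
}
```

    Calling check_job "$PLIST" "$LABEL" after the launchctl load step shows whether the job was accepted, and launchctl start com.example.mybak should run it once immediately without waiting for 7:30 AM.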

     

    2 responses to “Automated Backup-Jobs on Mac OSX”

    • Interesting to hear how you went about it. Since I still have to do something very similar myself, I will take your approach as a template.

      I do have a provocative question, though: why not dump and zip the database and then let “Time Machine” do the backup of the machine? That way the changes would even be versioned, or am I overlooking a detail (or did I miss one of the reasons while reading)?

      Cheers,
      Martin

      • Well, the approach is easily explained: I have had my web server since 2001. Back then I still had Windows on my machines at home. The backup scripts largely date from that time. The current solution is platform-independent, i.e. should I switch to something else in the coming years (e.g. if there is ever a Linux desktop worth taking seriously), I can migrate the whole thing to Linux in no time without any major adjustments. But there is another reason why I do not dump directly and instead make the dump on the web host: the unzipped SQL dump file is 87 MB, as tar.gz it is 25 MB. If I dump directly from my laptop at home, then first I transfer the full 87 MB, and second the tables that are currently being copied are locked during that time. That is noticeable in the frontend. It may not matter for nattl.at, but there is also a fairly active forum running on this server… ;) And transferring 87 MB, even over super fast ADSL from Telekoma-Austria, simply takes a few minutes. A few minutes too many. The dump on the DB server itself takes a few seconds. Then I pack the whole thing and comfortably download 25 MB.

        But actually I wanted to highlight the merits of curl. It is one of the coolest command line tools I know. Then again, I keep learning to use new tools all the time… ;)

