I manage lots of domains, and offer my clients free backup and recovery service. Nice selling perk, but I best be damn sure I am backing things up regularly. Since there is no way my space-cadet brain would remember that, I rely on my ‘nix friends: bash, cron and tar to neatly package every sub-directory of my webroot into its own little tarball. The bash script included after the break reads all directories into a space-separated list that we can loop through and manipulate as needed.
My webroot is pretty straightforward. Each subdirectory is the full URL of a site. Example:
webroot
|-- baungenjar.com
|-- bythebootstrap.com
|-- cafe-espresso.com
|-- crm.webbmaster.org
|-- eaw-technologies.com
|-- edwardawebb.com
|-- entrepreneurmeetcapitalist.com
|-- kicksareforribs.com
|-- ollitech.com
|-- outlaw-photography.com
|-- piwik.baungenjar.com
|-- rendinaro.com
|-- riversidegrillnh.com
|-- sidi.webbmaster.org
|-- status.webbmaster.org
|-- steadfastsites.com
|-- stonybrookpottery.com
|-- survey.webbmaster.org
|-- top.webbmaster.org
|-- trippymedia.com
|-- webbmaster.org
`-- wishlist.webbmaster.org
So I want to take each directory and create a gzipped tarball using tar -zcf, without any manual intervention. Also, I want to exclude some directories.
This actually took me much longer than it should have. Too many articles insisted you should change the internal field separator (IFS) to a newline character and just use ls. Now, in my training I was learn-ed that messing with IFS can be dangerous and will disrupt many things, including any command that relies on options or arguments (i.e. ALL of them). So rather than make my entire script conform to the pesky newlines returned by ls, I did something simpler: eliminate the newlines in favor of a decent token separator, like a space! To exclude directories I just use a magic file ".DONT_BACKUP" that the script checks for.
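Flagging a site for exclusion is then just a matter of dropping that marker file into its directory. For example, this is how the mantis directory you will see skipped in the sample run below gets ignored (assuming your webroot lives under your home directory, as it does in my config):

touch ~/webroot/mantis.webbmaster.org/.DONT_BACKUP

With that out of the way, here is the script itself: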
#!/bin/bash
DEF_RETAIN=14
LOGFILE=/home/webb_e/site_backups/WebrootBackup.log
BU_FILE_COUNT=0
WEBDIR=webroot
DESDIR=site_backups
# alright, that's it for config, the rest is script ########################################
cd ${HOME}/${WEBDIR}/
TODAY=$(date)
suffix=$(date +%m-%d-%Y)
printf "\n\n********************************************\n\tSite Backup Log for:\n\t" | tee -a $LOGFILE
echo $TODAY | tee -a $LOGFILE
printf "********************************************\n" | tee -a $LOGFILE
echo "see ${LOGFILE} for details"
# loop over every directory name made up of lowercase letters and dots
for DIR in $(ls | grep "^[a-z.]*$")
do
    echo $DIR
    # skip this directory if it contains the ignore file, otherwise tar it up
    if [ -f $DIR/.DONT_BACKUP ]
    then
        printf "\tSKIPPING $DIR as it contains ignore file\n" | tee -a $LOGFILE
    else
        cpath=${HOME}/${DESDIR}/${DIR}
        #
        # check if we need to make the destination path
        #
        if [ ! -d $cpath ]
        then
            echo "Creating $cpath"
            mkdir -p $cpath
            echo $DEF_RETAIN > $cpath/.RETAIN_RULE
        fi
        #
        tar -zcf ${HOME}/${DESDIR}/${DIR}/${DIR}_$suffix.tar.gz ./$DIR
        BU_FILE_COUNT=$(( $BU_FILE_COUNT + 1 ))
    fi
done
printf "\n\n********************************************\n" | tee -a $LOGFILE
echo "$BU_FILE_COUNT sites were backed up"
printf "********************************************\n" | tee -a $LOGFILE
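Save the script and make it executable before you run or schedule it. Assuming you keep it at the same path I invoke in the sample run below, that is just:

chmod +x ~/scripts/file_backups/SiteBackup.sh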
$ ~/scripts/file_backups/SiteBackup.sh
see /home/myself/site_backups/WebrootBackup.log for details
Site Backup Log for:
Thu Apr 15 14:31:53 PDT 2010
albums.stonybrookpottery.com
baungenjar.com
bythebootstrap.com
crm.webbmaster.org
digbiz.webroot.com
dotproject.webbmaster.org
edwardawebb.com
kicksareforribs.com
mantis.webbmaster.org
    SKIPPING mantis.webbmaster.org as it contains ignore file
ollitech.com
openx.webbmaster.org
piwik.baungenjar.com
rendinaro.com
riversidegrillnh.com
sidi.webbmaster.org
status.webbmaster.org
steadfastsites.com
stonybrookpottery.com
survey.webbmaster.org
svn.webbmaster.org
taskfreak.webbmaster.org
test.kicksareforribs.com
texpat.webbmaster.org
top.webbmaster.org
Creating /home/myself/site_backups/top.webbmaster.org
trac.webbmaster.org
trippymedia.com
wave.webbmaster.org
webbmaster.org
wishlist.webbmaster.org
33 sites were backed up
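Recovery is the other half of that free backup and recovery perk, and with plain gzipped tarballs there is nothing exotic about it. Here is a quick sketch using standard tar flags; the filename is simply what the script above would have produced for edwardawebb.com on that date:

# peek inside a backup without extracting it
tar -ztvf ~/site_backups/edwardawebb.com/edwardawebb.com_04-15-2010.tar.gz

# restore it back into the webroot (the archive contains ./edwardawebb.com)
cd ~/webroot
tar -zxf ~/site_backups/edwardawebb.com/edwardawebb.com_04-15-2010.tar.gz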
But wait! We wanted to automate this whole thing right? And so we shall.
If you're using a webhost, they likely provide a GUI to add cron jobs. If that's the case, you can just point to the full path where you saved the above script, select the interval, and you're good to go. If you're using this on your own server, you'll need to get your hands dirty with a crontab. You can open the crontab file in your editor of choice, or call it from the command line. In this example we'll rely on vi, my system's default editor. The following command will create a crontab file if it does not already exist and open it for editing:
crontab -e
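One small aside: if crontab drops you into an editor you would rather not use, you can point it at vi (or anything else) through the EDITOR environment variable before calling it:

export EDITOR=vi
crontab -e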
You may see some existing lines or you may not. Just remember: one job per line. The layout may seem overwhelming at first, but it's quite simple, and breaks down like this:
min   hour   day   month   weekday   job_to_run
The values fall in the following respective ranges (for day of week, 0 is Sunday):
0-59  0-23   1-31  1-12    0-6       filename
To match every value for a field, replace it with an asterisk (*). Alternatively, you may use comma-separated lists. Although I believe cron will treat any whitespace as a delimiter, I use tabs to make the organization a little nicer. So let's suppose I want to run this job nightly; it is, after all, a daily backup :) I will add the following line to my crontab:
15 0 * * * $HOME/scripts/file_backup.sh > logfile.log
This means every day at 00:15, a.k.a. 15 minutes past midnight, cron will run the script and print any output into the specified logfile. If you leave off the redirect to a logfile, cron will email the results to the user instead. To discard the output entirely, use the handy standby:
> /dev/null 2>&1
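Putting it all together, here is a sketch of what the finished entry might look like with the output silenced, plus a variant that uses a comma-separated hour list to run twice a day (the script path is simply the one from the sample run above):

15 0 * * * $HOME/scripts/file_backups/SiteBackup.sh > /dev/null 2>&1
# or, at 00:15 and 12:15 every day
15 0,12 * * * $HOME/scripts/file_backups/SiteBackup.sh > /dev/null 2>&1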
If you are curious about the script that actually recycles old backups, I suggest the Log Recycler Script, which could easily be updated to handle .tar.gz files instead of .sql.gz files :) If you want to back up your MySQL databases, I have that too: Shell script to backup multiple databases.