Creating a local Ubuntu mirror using rsync
by Donald » Thu, 08 Nov 2007 @ 10:53am
The benefit of having a local mirror is that you can install any package without having to wait for long downloads. It is also helpful if you have to regularly maintain or install a lot of Ubuntu machines. This guide will show you how to create and maintain your own local Ubuntu mirror using rsync. Other options for package mirroring are apt-mirror, apt-proxy and debmirror.
Beware that an Ubuntu mirror holding only the i386 architecture needs over 120GB of disk space (Ubuntu supports the i386, amd64, powerpc and sparc architectures), and the initial sync to download all the packages can take days or weeks on a 512k Streamyx connection. To estimate the time, divide 120GB by your connection speed, taking care to convert between bytes and bits. For those who live around the Kuching area, you are welcome to come and copy my existing mirror if you bring a hard disk along.
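As a rough worked example of that division, here is a small sketch. The estimate_days helper is my own name for illustration, and it assumes decimal units (1GB = 1,000,000,000 bytes, 1 kbit = 1,000 bits) and a line that runs at full speed the whole time, which a real connection will not.

```shell
#!/bin/sh
# estimate_days SIZE_GB SPEED_KBIT
# Prints the approximate transfer time in days (hypothetical helper).
# Assumes decimal units and a fully saturated link.
estimate_days() {
    LC_ALL=C awk -v gb="$1" -v kbit="$2" 'BEGIN {
        bits = gb * 1000 * 1000 * 1000 * 8      # size in bits
        seconds = bits / (kbit * 1000)          # speed in bits per second
        printf "%.1f\n", seconds / 86400        # 86400 seconds per day
    }'
}

# 120GB over a 512 kbit/s line
estimate_days 120 512
```

With those assumptions the figure comes out at about three weeks, which matches the "days/weeks" warning above.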
1. Install rsync and download the anonftpsync script
sudo aptitude install rsync
wget -c http://www.debian.org/mirror/anonftpsync
mv anonftpsync anonftpsync-ubuntu
2. Configure the script. Here's my configuration. The script is well documented.
nano anonftpsync-ubuntu
#! /bin/sh
set -e
# This script originates from http://www.debian.org/mirror/anonftpsync
# CVS: cvs.debian.org:/cvs/webwml - webwml/english/mirror/anonftpsync
# Version: $Id: anonftpsync,v 1.33 2007/09/12 15:19:03 joy Exp $
# Note: You MUST have rsync 2.6.4 or newer, which is available in sarge
# and all newer Debian releases, or at http://rsync.samba.org/
# Don't forget:
# chmod u+x anonftpsync
# Set the variables below to fit your site. You can then use cron to have
# this script run daily to automatically update your copy of the archive.
# TO is the destination for the base of the Debian mirror directory
# (the dir that holds dists/ and ls-lR).
# (mandatory)
TO=/mnt/mirrorsite/ubuntu
# RSYNC_HOST is the site you have chosen from the mirrors file.
# (http://www.debian.org/mirror/list-full)
# (mandatory)
# (https://wiki.ubuntu.com/Mirrors)
RSYNC_HOST=us.archive.ubuntu.com
# RSYNC_DIR is the directory given in the "Packages over rsync:" line of
# the mirrors file for the site you have chosen to mirror.
# (mandatory)
RSYNC_DIR=ubuntu/
# LOGDIR is the directory where the logs will be written to
# (mandatory)
LOGDIR=/var/log/mirroring
# ARCH_EXCLUDE can be used to exclude a complete architecture from
# mirroring. Please use a space separated list.
# Possible values are:
# alpha, amd64, arm, hppa, hurd-i386, i386, ia64, m68k, mipsel, mips, powerpc, s390, sh and sparc
#
# There is one special value: source
# This is not an architecture but will exclude all source code in /pool
#
# eg.
# ARCH_EXCLUDE="alpha arm hppa hurd-i386 ia64 m68k mipsel mips s390 sparc"
#
# With a blank ARCH_EXCLUDE you will mirror all available architectures
# (optional)
ARCH_EXCLUDE="amd64 powerpc sparc"
# EXCLUDE is a list of parameters listing patterns that rsync will exclude, in
# addition to the architectures excluded by ARCH_EXCLUDE.
#
# Use ARCH_EXCLUDE to exclude specific architectures or all sources
#
# --exclude stable, testing, unstable options DON'T remove the packages of
# the given distribution. If you want do so, use debmirror instead.
#
# The following example would exclude mostly everything:
#EXCLUDE="
# --exclude stable/ --exclude testing/ --exclude unstable/
# --exclude source/
# --exclude *.orig.tar.gz --exclude *.diff.gz --exclude *.dsc
# --exclude /contrib/ --exclude /non-free/
# "
# With a blank EXCLUDE you will mirror the entire archive, except the
# architectures excluded by ARCH_EXCLUDE.
# (optional)
EXCLUDE=
#EXCLUDE="
# --exclude *.orig.tar.gz --exclude *.diff.gz
# "
# MAILTO is the address to send logfiles to;
# if it is not defined, no mail will be sent
# (optional)
MAILTO=
# LOCK_TIMEOUT is a timeout in minutes. Defaults to 360 (6 hours).
# This program creates a lock to ensure that only one copy
# of it is mirroring any one archive at any one time.
# Locks held for longer than the timeout are broken, unless
# a running rsync process appears to be connected to $RSYNC_HOST.
LOCK_TIMEOUT=360
# There should be no need to edit anything below this point, unless there
# are problems.
#-----------------------------------------------------------------------------#
# If you are accessing a rsync server/module which is password-protected,
# uncomment the following lines (and edit the other file).
# . ftpsync.conf
# export RSYNC_PASSWORD
# RSYNC_HOST=$RSYNC_USER@$RSYNC_HOST
#-----------------------------------------------------------------------------#
# Check for some environment variables
if [ -z "$TO" ] || [ -z "$RSYNC_HOST" ] || [ -z "$RSYNC_DIR" ] || [ -z "$LOGDIR" ]; then
echo "One of the following variables seems to be empty:"
echo "TO, RSYNC_HOST, RSYNC_DIR or LOGDIR"
exit 2
fi
if ! [ -d ${TO}/project/trace/ ]; then
# we are running mirror script for the first time
umask 002
mkdir -p ${TO}/project/trace
fi
# Note: on some non-Debian systems, hostname doesn't accept -f option.
# If that's the case on your system, make sure hostname prints the full
# hostname, and remove the -f option. If there's no hostname command,
# explicitly replace `hostname -f` with the hostname.
HOSTNAME=`hostname -f`
# The hostname must match the "Site" field written in the list of mirrors.
# If hostname doesn't return the correct value, fill in and uncomment the line below
# HOSTNAME=mirror.domain.tld
LOCK="${TO}/Archive-Update-in-Progress-${HOSTNAME}"
# The temp directory used by rsync --delay-updates is not
# world-readable remotely. It must be excluded to avoid errors.
TMP_EXCLUDE="--exclude .~tmp~/"
# Exclude architectures defined in $ARCH_EXCLUDE
for ARCH in $ARCH_EXCLUDE; do
EXCLUDE=$EXCLUDE"
--exclude binary-$ARCH/
--exclude disks-$ARCH/
--exclude installer-$ARCH/
--exclude Contents-$ARCH.gz
--exclude Contents-$ARCH.diff/
--exclude arch-$ARCH.files
--exclude arch-$ARCH.list.gz
--exclude *_$ARCH.deb
--exclude *_$ARCH.udeb "
if [ "$ARCH" = "source" ]; then
SOURCE_EXCLUDE="
--exclude source/
--exclude *.tar.gz
--exclude *.diff.gz
--exclude *.dsc "
fi
done
# Logfile
LOGFILE=$LOGDIR/ubuntu-mirror.log
# Get in the right directory and set the umask to be group writable
#
cd $HOME
umask 002
# Check to see if another sync is in progress
if [ -f "$LOCK" ]; then
# Note: this requires the findutils find; for other finds, adjust as necessary
if [ "`find $LOCK -maxdepth 1 -amin -$LOCK_TIMEOUT`" = "" ]; then
# Note: this requires the procps ps; for other ps', adjust as necessary
if ps ax | grep '[r]'sync | grep -q $RSYNC_HOST; then
echo "stale lock found, but a rsync is still running, aiee!"
exit 1
else
echo "stale lock found (not accessed in the last $LOCK_TIMEOUT minutes), forcing update!"
rm -f $LOCK
fi
else
echo "current lock file exists, unable to start rsync!"
exit 1
fi
fi
touch $LOCK
# Note: on some non-Debian systems, trap doesn't accept "exit" as signal
# specification. If that's the case on your system, try using "0".
trap "rm -f $LOCK" exit
set +e
# First sync /pool
rsync --recursive --links --hard-links --times --verbose \
     $TMP_EXCLUDE $EXCLUDE $SOURCE_EXCLUDE \
     $RSYNC_HOST::$RSYNC_DIR/pool/ $TO/pool/ >> $LOGFILE 2>&1
result=$?
if [ 0 = $result ]; then
# Now sync the remaining stuff
rsync --recursive --links --hard-links --times --verbose --delay-updates --delete-after \
     --exclude "Archive-Update-in-Progress-${HOSTNAME}" \
     --exclude "project/trace/${HOSTNAME}" \
     $TMP_EXCLUDE $EXCLUDE $SOURCE_EXCLUDE \
     $RSYNC_HOST::$RSYNC_DIR $TO >> $LOGFILE 2>&1
LANG=C date -u > "${TO}/project/trace/${HOSTNAME}"
else
echo "ERROR: Help, something weird happened" | tee -a $LOGFILE
echo "mirroring /pool exited with exitcode" $result | tee -a $LOGFILE
fi
if [ -n "$MAILTO" ]; then
mail -s "debian archive synced" "$MAILTO" < $LOGFILE
fi
savelog $LOGFILE >/dev/null
rm $LOCK
Do not modify the rest of the file. Save and quit.
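If you want to see what an ARCH_EXCLUDE value turns into before starting a long sync, the loop in the script can be tried on its own. This is only a cut-down sketch: build_excludes is a hypothetical helper that is not part of anonftpsync, and it keeps just two of the patterns the real script generates.

```shell
#!/bin/sh
# build_excludes "ARCH1 ARCH2 ..."
# Prints rsync --exclude options for each architecture, following the
# pattern of the loop in anonftpsync (only two patterns shown here).
build_excludes() {
    excludes=""
    for arch in $1; do
        excludes="$excludes --exclude binary-$arch/ --exclude *_$arch.deb"
    done
    # Drop the leading space; quoting keeps the * from being expanded
    printf '%s\n' "${excludes# }"
}

build_excludes "amd64 powerpc"
```

Each architecture in the list simply adds another group of --exclude options to the rsync command line, so a blank list excludes nothing.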
3. Make the script executable
chmod u+x anonftpsync-ubuntu
4. Create the necessary directories
sudo mkdir /mnt/mirrorsite
sudo mkdir /mnt/mirrorsite/ubuntu
sudo mkdir /var/log/mirroring
sudo chown my-username:my-username /mnt/mirrorsite/ubuntu
sudo chown my-username:my-username /var/log/mirroring
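Since the mirror needs over 120GB, it may be worth checking the free space on the destination before the first run. A minimal sketch, assuming a POSIX df; has_space is a hypothetical helper name and the threshold is whatever you pass in.

```shell
#!/bin/sh
# has_space DIR REQUIRED_GB
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB free
# (hypothetical helper; 1GB is counted as 1024*1024 KB here).
has_space() {
    # df -P gives one line per filesystem; column 4 is available space in KB
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example usage:
#   has_space /mnt/mirrorsite/ubuntu 120 || echo "not enough room for the mirror"
```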
5. Run the script and wait a long long time.
sh anonftpsync-ubuntu &
You can monitor the progress of the download by running
tail -f /var/log/mirroring/ubuntu-mirror.log
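The log gets long quickly, so a quick count of rsync's error lines can be handier than scrolling. This sketch assumes failures show up as lines containing "rsync error", which is how rsync normally reports them; count_rsync_errors is a hypothetical helper name.

```shell
#!/bin/sh
# count_rsync_errors LOGFILE
# Prints how many lines in LOGFILE contain "rsync error"
# (hypothetical helper; grep -c prints 0 when nothing matches).
count_rsync_errors() {
    grep -c 'rsync error' "$1"
}

# Example usage:
#   count_rsync_errors /var/log/mirroring/ubuntu-mirror.log
```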
6. Using the mirror.
My mirror is served over HTTP. Install Apache and create a link to /mnt/mirrorsite/ubuntu in its document root so that the mirror can be accessed at http://servername/ubuntu.
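The link step could look something like the sketch below. It assumes Apache's document root is /var/www, which is an assumption about your setup rather than something you must use, and link_mirror is a hypothetical helper name.

```shell
#!/bin/sh
# link_mirror MIRROR_DIR DOCROOT
# Creates DOCROOT/ubuntu as a symlink to MIRROR_DIR so the web server
# can serve the mirror under /ubuntu (hypothetical helper).
link_mirror() {
    ln -s "$1" "$2/ubuntu"
}

# Example usage, assuming the document root is /var/www:
#   sudo aptitude install apache2
#   sudo ln -s /mnt/mirrorsite/ubuntu /var/www/ubuntu
```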
To avoid having to edit the usual lines in sources.list, I created a DNS entry on my server that points my.archive.ubuntu.com to the mirror's local IP. This is particularly useful for laptop users who move around, because the same sources.list then updates from the public my.archive.ubuntu.com once outside the local network. The only lines that need to be changed in sources.list are those for the security repo. If you are extremely security conscious, you might skip this change. Here's part of my sources.list:
# deb http://security.ubuntu.com/ubuntu gutsy-security main restricted
# deb-src http://security.ubuntu.com/ubuntu gutsy-security main restricted
# deb http://security.ubuntu.com/ubuntu gutsy-security universe
# deb-src http://security.ubuntu.com/ubuntu gutsy-security universe
# deb http://security.ubuntu.com/ubuntu gutsy-security multiverse
# deb-src http://security.ubuntu.com/ubuntu gutsy-security multiverse
deb http://my.archive.ubuntu.com/ubuntu gutsy-security main restricted
deb-src http://my.archive.ubuntu.com/ubuntu gutsy-security main restricted
deb http://my.archive.ubuntu.com/ubuntu gutsy-security universe
deb-src http://my.archive.ubuntu.com/ubuntu gutsy-security universe
deb http://my.archive.ubuntu.com/ubuntu gutsy-security multiverse
deb-src http://my.archive.ubuntu.com/ubuntu gutsy-security multiverse
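If you have several machines to change, the security lines can be rewritten with sed instead of by hand. This is a sketch only: use_local_security is a hypothetical helper, it works on stdin so nothing is modified until you redirect the output yourself, and it deliberately leaves commented-out lines alone.

```shell
#!/bin/sh
# use_local_security
# Reads sources.list lines on stdin and points active security.ubuntu.com
# entries at my.archive.ubuntu.com. Lines starting with # are left as-is.
use_local_security() {
    sed '/^[[:space:]]*#/!s|http://security\.ubuntu\.com/ubuntu|http://my.archive.ubuntu.com/ubuntu|'
}

echo "deb http://security.ubuntu.com/ubuntu gutsy-security main restricted" | use_local_security
```

Run it against a copy of /etc/apt/sources.list first and compare the result before replacing the real file.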
7. Schedule daily updates of the mirror
crontab -e
# m h dom mon dow command
05 04 * * * /full/path/to/anonftpsync-ubuntu
This will run the rsync script every day at 4:05 am.
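To check that the nightly job is actually running, you can look at the trace file the script writes after each successful sync (the "LANG=C date -u" line near the end of the script). A minimal sketch, with last_sync as a hypothetical helper name:

```shell
#!/bin/sh
# last_sync MIRROR_DIR HOSTNAME
# Prints the timestamp the mirror script wrote to project/trace/HOSTNAME
# after its last successful run (hypothetical helper).
last_sync() {
    trace="$1/project/trace/$2"
    if [ -f "$trace" ]; then
        cat "$trace"
    else
        echo "no trace file at $trace" >&2
        return 1
    fi
}

# Example usage:
#   last_sync /mnt/mirrorsite/ubuntu "$(hostname -f)"
```

If the date shown is more than a day or so old, check the log in /var/log/mirroring for the reason.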
Comments welcome