Skip to content

mjuric/cya

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cya -- a multi-level backup driver for Duplicity
================================================

Mario Juric <mjuric@lsst.org>

cya (pronnounced see-ya) is a driver for the Duplicity backup tool
(http://duplicity.nongnu.org) that enables:

* multi-level incremental backups on varying time-scales (decade, year,
  month, week), with the ability to keep only the latest few backups on each
  time scale.

* handling of hard links, access control lists, and extended attributes. This
  allows whole-system backups to be performed with cya.

* unattended backups while maintaining security and space efficiency.


Quick Start
===========

How it works
------------

Every day each client adds an incremental backup to a client-specific
'incoming' directory on the archive server (typically,
~archive/$CLIENT/incoming). The backups are usually encrypted for added
security.

The archive server then moves the new increment to its correct place in the
multi-level hierarchy (see 'Multi-level Backups' below), and prepares the
'incoming' directory for the next (incremental) backup.

In time, at each level in the directory tree (typically,
~archive/$CLIENT/backups) a backup chain forms, with increments taken at
different scales.


Setting up unattended backups
-----------------------------

Preparing the archive server:

	* install cya in /opt/cya
	* install duplicity 0.6.20 or higher

	* create a user named 'archive'

	* copy templates/collect.sample to /etc/cya/collect. Customize if
	  necessary.

	* add /etc/cya/collect to crontab to be ran daily


Preparing a client to be backed up:

	* install cya to /opt/cya
	* install duplicity 0.6.20 or higher

	* create /etc/cya directory and chown it to 700
	* copy templates/backup.sample to /etc/cya/backup and edit it.
	* set the DESTHOST, DESTDIR, and SOURCEDIR variables. DESTHOST is the
	  address of the archive server to be used by SSH to log into it.
	  DESTDIR is the subdirectory within ~archive, where the backup sets
	  will be stored. By convention, this is usually the fully qualified
	  domain name of the client. SOURCEDIR is the directory on the
	  client you want to back up (usually '/').
	* set BACKUP_ENCRYPTION_KEY to a password unique to this client.
	  This password will be used to encrypt the backup set, and should
	  not be shared with anyone (including the archive server!). Keep it
	  in a safe place -- you will need it to restore the system from
	  backups.
	  Tip: A good way to generate a hard-to-guess password is by running
	       `openssl rand -base64 32`
	* create ssh keys in /etc/cya/keys (using ssh-keygen). Add the
	  public key to ~archive/.ssh/authorized_keys on the archive server.
	* add the archive server's host key to ~root/.ssh/known_hosts, if it
	  isn't there already.
	  Tip: just running 'ssh archive.server.addr' and answering 'yes'
	  when prompted should do it.

	* As root, run 'cya-collect --init ~archive/$DESTDIR' on the archive
	  server to initialize the backup set directory structure.  DESTDIR
	  must be the same as the one you set in /etc/cya/backup file on the
	  client.

	* add /etc/cya/backup to crontab on the client. For optimal
	  performance, have it run ~10-15 minutes after cya-collect runs on
	  the server.
	  WARNING: Some distros (notably, RHEL) set HOME=/ in /etc/crontab.
	  If your distro does this, make sure you run /etc/cya/backup with
	  HOME=/root envvar set. Otherwise, cache directories will be
	  created in /.


Multi-level backups
===================

cya enables multi-level incremental Duplicity backups on varying time-scales
(decade, year, month, week), by organizing Duplicity backups in a directory
tree, where each level in the tree corresponds to the timescale of a backup
stored at that level. Increments that are shared between different levels
are hardlinked to the correct places in the hierarchy (thus the space
efficiency).

For example, a backup made on 2012-12-04 would be placed in a subdirectory
of:

	2000/2010/2012/2012-12/2012-12-02

where the levels correspond to century, decade, year, month and week,
each holding backups for decades, years, months, weeks and days,
respectively.

If the 2012-12-04 backup was the first one ever made, the files in the leaf
directory 2000/2010/2012/2012-12/2012-12-02 would also get hardlinked to
2000/2010/2012/2012-12 (because it's the first weekly backup), and to
2000/2010/2012 (because it's the first montly backup), and so on.

When a backup is made on the next day, 2012-12-05, that day is still in the
week of 2012-12-02, so an incremental backup would be made in
2000/2010/2012/2012-12/2012-12-02. Equally so for all days through
2012-12-08.

On 2012-12-09, the destination directory will change to:

	2000/2010/2012/2012-12/2012-12-09

As it is empty, cya will look into one directory up the hierarchy to use as
a basis for this (incremental) backup.  It will first hardlink files from
2000/2010/2012/2012-12 to 2000/2010/2012/2012-12/2012-12-09, and make this
the 'incoming' directory for the next incremental backup.  The procedure is
repeated when the month/year/decade boundaries are crossed. In time, at each
level in the directory tree a backup chain forms, with increments taken at
different scales.

It's likely desirable to keep this tree pruned, deleting all but two newest
leaf directories at every level of the hierarchy. Right now, cya won't do it
for you (it has to be done manually).


Security model
==============

Definitions:

* archive: the host which holds the backups
* client: a host being backed up
* $ROOT: base directory of backup sets on archive
* Committed backup files: files residing in $ROOT/backups
* Uncommited backup files: files residing in $ROOT/incoming/finished


Design summary:

Client host attempts to back itself up daily to archive:$ROOT/incoming/next
directory, if that directory exists. When it succeeds, it renames that
directory to 'finished'.

The archive host periodically checks for existence of
archive:$ROOT/incoming/finished. If it exists, the increment is moved to the
correct level in the backup hierarchy, and a new $ROOT/incoming/next
directory is prepared.


Design consequences:

* If archive@archive is breached, no committed backup files can be accessed.
  Uncommitted backup files can be accessed, read, deleted, but not decrypted.

* If root@archive is breached all backup files can be accessed, read,
  deleted, but not decrypted.

* If root@client is breached, no committed backups on the archive can be
  accessed.  Uncommitted backup files can be accessed, read, deleted, and
  decrypted.

* If both root@archive and root@client are breached, the backups can be
  read, deleted and decrypted (full compromise).


Implementation details
======================

There are three parts to cya:

* A wrapper for duplicity, `duplicity-ex', that records in the backup the
  ACLs and extended attributes of the files being backed up, and efficiently
  backs up multiply hard-linked files.

* A scheme for organizing Duplicity incremental backups that results in
  multi-level backup chains while being space efficient.

* A scheme and a utility, `cya-collect`, for performing unattended,
  multi-level backups with Duplicity.


Clients periodically (typically, daily) create and upload backups to an
"incoming" directory at the archive server. The archive server collects
these, placing them in apropriate directories in the multi-level backup
hierarchy.

In more detail, on the archive server:

	1) For each client that is backed up, cya-collect is called daily
	   via cron.

	2) It checks if $ROOT/incoming/finished directory exists. If yes,
	   that means a new backup has been uploaded. If no, it exits.

	3) If 'finished' exists, cya-collect uses the information from
	   $ROOT/next_info to hardlink the backup files to apropriate backup
	   sets in $ROOT/backups.  Once done, it removes the 'finished'
	   directory, and creates $ROOT/incomin/next.  It hardlinks the
	   apropriate backup set files (based on current time) into this
	   directory.  It stores which backup set has been hardlinked to
	   $ROOT/next_info.

On the client:

	1) /etc/cya/backup script is run daily via cron. It checks if
	   $ROOT/incoming/next exists on the archive server. If not, it
	   exits, as this means that cya-collect hasn't ran yet and moved a
	   previously created backup to its right place.

	2) Otherwise, it runs duplicity, uploading the result to
	   $ROOT/incoming/next. How duplicity is run can be customized by
	   creating a shell function named 'duplicity' in /etc/cya/backup

	3) Once the backup has finished, $ROOT/incoming/next is moved to
	   $ROOT/incoming/finished. This signals cya-collect that a new
	   backup is ready.



Features to be documented
=========================

These are all in the code (duplicity-ex and duplicity-ex-snap utilities in
lib/), but so far undocumented beyond the actual code:

These do:
* extending duplicity to efficiently store hardlinks
* extending duplicity to store ACLs and xattrs
* self-consistent backups using LVM snapshots
* how to restore using the restore scripts

About

Multi-timescale backup driver for duplicity

Resources

Stars

Watchers

Forks

Packages

No packages published