This is my script for unattended host backups using Duplicity that others might find useful. It is configurable through a YAML file, but opinionated in some ways:
- Backups go to S3 using the Standard-Infrequent Access storage class to save money.
- Encryption and signing require a GnuPG keypair with no passphrase. The key should be protected by filesystem permissions anyway, so a passphrase just adds unnecessary complexity.
- The interval between full backups is time-based (e.g. a full backup every N days).
- Purging old backups happens automatically at the end of each run (unless overridden). The script keeps the last N full backups.
Run `duplicity-unattended --help` to see all options or just look at the code.
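In plain duplicity terms, the behavior above corresponds roughly to invocations like the sketch below. This is only an illustration of the idea, not the script's actual command lines; the real arguments are assembled from the YAML config, and the values and placeholders here are assumptions.

```bash
# Rough sketch of what the script automates (values and placeholders are illustrative only).
# Full backup if the last one is older than 30 days, otherwise incremental,
# encrypted and signed with your key, stored in S3 Standard-Infrequent Access.
duplicity --full-if-older-than 30D --s3-use-ia \
    --encrypt-sign-key <key_id> \
    /home s3+http://<bucket>/home

# Afterwards, keep only the most recent 2 full backup chains.
duplicity remove-all-but-n-full 2 --force s3+http://<bucket>/home
```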
- `duplicity-unattended`: Script that runs unattended backups and purges stale backups.
- `systemd/`: Directory containing sample systemd unit files you can customize to run the script periodically.
- `cfn/host-bucket.yaml`: CloudFormation template to set up an S3 bucket and IAM permissions for a new host.
- `cfn/backup-monitor`: CloudFormation (SAM) template and Lambda function to notify you if backups stop working.
- `terraform-gcp`: Terraform template to set up remote backups in Google Cloud Storage (sets up a GCS folder and Service Account).
You can use the script without systemd or CloudFormation if you prefer. They all work independently.
Here are the steps I generally follow to set up backups on a new host.
I use separate keys, buckets, and AWS credentials for each host so that the compromise of one host doesn't affect the others.
First, create an S3 bucket and an IAM user/group/policy with read-write access to it. The included `cfn/host-bucket.yaml` CloudFormation template can do this for you automatically. To apply it from the console (an AWS CLI sketch follows the list):
- Go to CloudFormation in the AWS console and click **Create Stack**.
- Select the option to upload a template to S3 and pick the `cfn/host-bucket.yaml` template.
- Fill in the stack name and bucket name. I suggest including the hostname in both for easy identification.
- Accept the remaining defaults and acknowledge the IAM resource creation warning.
- Wait for stack setup to complete. If it fails, it's likely the S3 bucket name isn't unique. Delete the stack and try again with a different name.
- Go to IAM in the AWS console and click on the new user. The user name is prefixed with the stack name, so you can identify it that way.
- Go to the **Security credentials** tab and click **Create access key**.
- Copy the generated access key ID and secret key. You'll need them later.
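If you prefer the AWS CLI over the console, creating the stack looks roughly like this. Treat it as a sketch: the parameter name below is a placeholder, so check the template's `Parameters` section for the real one.

```bash
# Stack and bucket names are examples; CAPABILITY_IAM acknowledges the IAM resources the stack creates.
aws cloudformation create-stack \
    --stack-name myhost-backups \
    --template-body file://cfn/host-bucket.yaml \
    --parameters ParameterKey=BucketName,ParameterValue=myhost-backups \
    --capabilities CAPABILITY_IAM
```

You still create the access key afterwards, either in the IAM console as described above or with `aws iam create-access-key`.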
Alternatively, you can create the S3 bucket and IAM resources manually. Here are the general steps; modify as you see fit. (Rough AWS CLI equivalents follow the list.)
- Create the S3 bucket. Default settings are fine.
- Create an IAM policy with the following permissions, replacing `<bucket>` with the bucket name:

  ```json
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": "s3:*",
              "Resource": [
                  "arn:aws:s3:::<bucket>/*",
                  "arn:aws:s3:::<bucket>"
              ]
          }
      ]
  }
  ```

- Create an IAM group with the same name as the policy and assign the policy to it.
- Create IAM user for programmatic access. Add the user to the group. Don't forget to copy the access key ID and secret access key at the end of the wizard.
- Create a file on the host containing the AWS credentials, replacing `<access_key_id>` and `<secret_key>` with the IAM user credentials:

  ```ini
  [Credentials]
  aws_access_key_id = <access_key_id>
  aws_secret_access_key = <secret_key>
  ```

  Put it in a location appropriate for the backup user, such as `/etc/duplicity-unattended/aws_credentials` or `~/.duplicity-unattended/aws_credentials`.
- Make sure only the backup user can access the credentials file. Change ownership if needed.

  ```bash
  chmod 600 aws_credentials
  ```
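If you'd rather script the manual route, the AWS CLI equivalents look roughly like this. All resource names are examples, and `policy.json` is assumed to contain the policy document shown above.

```bash
# Example names only; substitute your own and your AWS account ID.
aws s3 mb s3://myhost-backups
aws iam create-policy --policy-name myhost-backups --policy-document file://policy.json
aws iam create-group --group-name myhost-backups
aws iam attach-group-policy --group-name myhost-backups \
    --policy-arn arn:aws:iam::<account_id>:policy/myhost-backups
aws iam create-user --user-name myhost-backups
aws iam add-user-to-group --group-name myhost-backups --user-name myhost-backups
aws iam create-access-key --user-name myhost-backups   # note the returned key ID and secret
```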
If you want to back up to Google Cloud Storage instead of S3, the following setup applies. Much of it is based on https://systemoverlord.com/2019/09/23/backing-up-to-google-cloud-storage-with-duplicity-and-service-accounts.html
- Create a Google Cloud account at cloud.google.com
- Log into the web console
- Create a project that will house your backups, and make yourself a "Storage Admin" on that project.
The Terraform configuration included in this repository will create everything you need in your GCP project, including the Cloud Storage bucket and all the required permissions for your host machine. Modify the contents of `terraform.tfvars` to match your needs before running the following:
```bash
cd ./terraform
terraform init
terraform apply
```
This will output a message from terraform about success/failure, and the path to your service account credentials file. You'll need this path to finish your host setup later on.
- Install dependencies:
  - Duplicity
  - GnuPG
  - Python 3
  - PyYAML for Python 3
- Create a new RSA 4096 keypair as the user who will perform the backups. If you're backing up system directories, this probably needs to be root. Do NOT set a passphrase; leave it blank.

  ```bash
  gpg --full-generate-key --pinentry-mode loopback
  ```
- Make an off-host backup of the keypair in a secure location. I use my LastPass vault for this. Don't skip this step or you'll be very sad when you realize the keys perished alongside the rest of your data, rendering your backups useless.
  ```bash
  gpg --list-keys   # to get the key ID
  gpg --armor --output pubkey.gpg --export <key_id>
  gpg --armor --output privkey.gpg --export-secret-key <key_id>
  ```
- Delete the exported key files from the filesystem once they're secure.
- Install `gcs-oauth2-boto-plugin`.
- Install the gcloud SDK (see Google's installation instructions).
- Run `gcloud init`. You will be prompted to log in to your Google Cloud account, which authenticates your machine so that we can run the Terraform.
- Run `gsutil config -e` and enter the path to your service account credentials file when prompted. This creates a boto config file at `~/.boto`.
- Create the file `~/.config/boto/plugins/gcs.py` with the following contents:

  ```python
  import gcs_oauth2_boto_plugin
  ```

- Put the following at the bottom of your `~/.boto` file:

  ```ini
  [Plugin]
  plugin_directory = /home/{YOUR_USERNAME}/.config/boto/plugins
  ```
- Copy the `duplicity-unattended` script to a `bin` directory and make sure it's runnable:

  ```bash
  chmod +x duplicity-unattended
  ```

  I usually clone the repo to `/usr/local/share` and add a symlink in `/usr/local/bin`.
- Copy the sample `config.yaml` file to the same directory as the AWS credentials file. (Or you can put it somewhere else. Doesn't matter.)
- Customize the `config.yaml` file for the host. (A rough sketch of what a config captures follows this list.)
- Do a dry-run backup as the backup user to validate most of the configuration:

  ```bash
  duplicity-unattended --config <config_file> --dry-run
  ```

  Replace `<config_file>` with the path to the YAML config file. Among other things, this will tell you how much would be backed up.
- Do an initial backup as the backup user to make sure everything really works:

  ```bash
  duplicity-unattended --config <config_file>
  ```
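For orientation, a host config generally captures the GPG key, the destination, the credentials file, the directories to back up, the full-backup interval, and how many full backups to keep. The sketch below is hypothetical: every key name is invented for illustration, so start from the sample `config.yaml` in the repo rather than copying this.

```yaml
# HYPOTHETICAL sketch only -- key names are invented; use the repo's sample config.yaml.
gpg_key_id: "0123456789ABCDEF"                               # key used for encryption and signing
url: "s3+http://myhost-backups"                              # where backups go
aws_credentials: "/etc/duplicity-unattended/aws_credentials" # credentials file created earlier
backup_dirs:
  - /home
  - /etc
full_interval: 30D                                           # force a full backup after this long
keep_full: 2                                                 # full backups retained when purging
```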
How you schedule backups depends on your OS. I use systemd timers for this. See the `systemd` directory in this repository for sample unit files you can customize. You'll probably need to change `User`, `Group`, and `ExecStart` to match the user who performs the backups and the location of the `duplicity-unattended` script, respectively.
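The sample unit files in the repo are the authoritative starting point; as a rough illustration, the service/timer pair might look something like this (the paths, user, and schedule here are assumptions to adjust):

```ini
# duplicity-unattended.service -- sketch; adjust User, Group, and paths to your setup
[Unit]
Description=Unattended duplicity backup

[Service]
Type=oneshot
User=root
Group=root
ExecStart=/usr/local/bin/duplicity-unattended --config /etc/duplicity-unattended/config.yaml
```

```ini
# duplicity-unattended.timer -- sketch; triggers the service once a day
[Unit]
Description=Daily unattended duplicity backup

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```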
On Arch Linux and similar distros, drop these files into `/etc/systemd/system` and then enable and start the timer with:

```bash
sudo systemctl enable duplicity-unattended.timer
sudo systemctl start duplicity-unattended.timer
```

Make sure the timer is running:

```bash
sudo systemctl status duplicity-unattended.timer
```

And then run the backup once manually and check the output:

```bash
sudo systemctl start duplicity-unattended.service
sudo journalctl -u duplicity-unattended.service
```
You're done! Enjoy your backups.
How do you make sure backups keep working in the future? You can set up systemd to email you if something goes wrong, but I prefer an independent mechanism. The `cfn/backup-monitor` directory contains a CloudFormation template (SAM template, actually) with a Lambda function that monitors a bucket for new backups and emails you if no recent backups have occurred. To set it up for a new host/bucket, follow these steps:
- If you have not used AWS Simple Email Service (SES) before, follow the instructions to verify the sender and recipient email addresses. See the overview documentation for more information.
- Go to duplicity-unattended-monitor in the AWS Serverless Application Repository and click the **Deploy** button.
- Review the template. (You wouldn't deploy a CloudFormation template into your AWS account without knowing what it does first, would you?)
- Change the application/stack name. I suggest a name that includes the host or bucket for easy identification.
- Fill in the remaining function parameters. Make sure the email addresses exactly match the ones you verified in SES.
- Click **Deploy** and wait for AWS to finish creating all the resources.
Now let's test it.
- Click on the function link under **Resources**. This will take you to the Lambda console for the function.
- Click the **Test** button in the upper-right.
- Create a new test event with the following content:

  ```json
  {"testEmail": true}
  ```

  Give it a name like `BackupMonitorTest` and click **Create**.
- Now you should see the new named event next to the **Test** button. Click the **Test** button again.
If all goes well, you will get an email with a summary of the most recent backups found in the bucket.
From now on, the function will run once a day and email you only when there have been no recent backups for the number of days you specified. The function will look for recent backups in any S3 "folder" that contains at least one backup set from any time in the past. You can deploy additional stacks for each bucket you want to monitor.
If you prefer to deploy the CloudFormation template directly from source code instead of from the Serverless Application Repository, you can. The steps are roughly as follows:
- Install Pipenv for Python 3 if you don't already have it.
- From the source repo directory, install the AWS SAM CLI into a virtual environment:

  ```bash
  pipenv install --dev
  ```
- Change to the `cfn/backup-monitor` directory.
- Set up your AWS CLI credentials so SAM can read them (e.g. using the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables).
- Run the SAM command to package the CloudFormation template and upload the Lambda function to S3:

  ```bash
  pipenv run sam package --s3-bucket <code_bucket> --output-template-file packaged.yaml
  ```

  where `<code_bucket>` is an S3 bucket to which the AWS CLI user has write access.
- You can now use the CloudFormation AWS console or the AWS CLI to deploy the `packaged.yaml` stack template that SAM just created.
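For the CLI route, the deploy step might look roughly like the following. The names after `--parameter-overrides` are placeholders; check the template for the actual parameters it expects.

```bash
# Stack name, bucket, and email are examples; CAPABILITY_IAM acknowledges the IAM resources.
aws cloudformation deploy \
    --template-file packaged.yaml \
    --stack-name backup-monitor-myhost \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides BucketName=myhost-backups NotificationEmail=me@example.com
```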
Invoke `duplicity` directly to restore from a backup. The general procedure is as follows:
- If restoring on a new host, import the GPG keypair from its secure backup location:
  ```bash
  gpg --import privkey.gpg
  ```
- List the keys to get the key ID:

  ```bash
  gpg --list-keys
  ```

  Make a note of the ID (long hexadecimal number). You'll need it when you run the `duplicity` command later.
- If you don't have a copy of the original AWS credentials file (e.g. it perished along with your data), create a new one. You can create a new access key from the IAM console following the same procedure as described above for setting up a new host. Don't forget to deactivate the old access key in the IAM console if you no longer need it.
- Point Duplicity to the AWS credentials file by setting the `BOTO_CONFIG` environment variable. In `bash`, you'd run:

  ```bash
  export BOTO_CONFIG=<aws_credentials_file>
  ```

  Replace `<aws_credentials_file>` with the path to the file.
- Run `duplicity` from the command line to restore each source directory. You can browse the source directories by looking inside the S3 bucket in the AWS console. Here's a basic working restore command that restores a source directory to a new target directory called `restored`:

  ```bash
  mkdir restored
  duplicity --encrypt-sign-key <key_id> s3+http://<bucket>/<source_dir> restored
  ```

  Replace `<key_id>` with the GPG key ID, `<bucket>` with the S3 bucket name, and `<source_dir>` with the source directory name (S3 key prefix). You might be asked to provide a passphrase during the restore. Just hit ENTER.
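Duplicity can also restore a single file or directory rather than the whole tree, and can restore as of a particular point in time. For example, using the same placeholders as above:

```bash
# Restore one file from the backup of <source_dir> as it existed three days ago.
# The target path is where the restored file is written (the restored/ dir created above).
duplicity --encrypt-sign-key <key_id> \
    --file-to-restore path/inside/backup/file.txt \
    --time 3D \
    s3+http://<bucket>/<source_dir> restored/file.txt
```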