Easy-to-use backup and archive tool.
Arkiv is designed to back up local files and MySQL databases, and to archive them on Amazon S3 and Amazon Glacier.
Backup files are removed (locally and from Amazon S3) after defined delays.
Arkiv can back up your data on a daily or hourly basis (you choose on which days and/or at which hours it is launched).
It is written in pure shell, so it can be used on any Unix/Linux machine.
Arkiv was created by Amaury Bouchard and is open-source software.
- Generate backup data from local files and databases.
- Store data on the local drive for a few days/weeks, in order to be able to restore fresh data very quickly.
- Store data on Amazon S3 for a few weeks/months, if you need to restore them easily.
- Store data on Amazon Glacier forever. It is extremely cheap storage that should be used instead of Amazon S3 for long-term retention.
Data are deleted from the local drive and Amazon S3 when the configured delays are reached.
If your data are backed up multiple times per day (not just once a day), you can define a fine-grained purge of the files stored on the local drive and on Amazon S3.
For example, it's possible to:
- remove half the backups after two days
- keep only 2 backups per day after 2 weeks
- keep 1 backup per day after 3 weeks
- remove all files after 2 months
The same kind of configuration can be defined for Amazon S3 archives.
Starting
- Arkiv is launched every day (or every hour) by Crontab.
- It creates a directory dedicated to the backups of the day (or the backups of the hour).
Backup
- Each configured path is tar'ed and compressed, and the result is stored in the dedicated directory.
- If MySQL backups are configured, the needed databases are dumped and compressed in a subdirectory.
- If encryption is configured, the backup files are encrypted.
- Checksums are computed for all the generated files. These checksums are useful to verify that the files are not corrupted after being transferred over a network.
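The backup steps above can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not Arkiv's own code: the function name backup_path, the NAME.tar.gz naming and the checksums.sha256 file name are hypothetical, gzip stands in for whichever compressor is configured, and encryption is left out.

```shell
# Hypothetical sketch of the "Backup" steps (not Arkiv's actual code).
# backup_path SRC DEST_DIR
#   Archives SRC with tar, compresses it with gzip into DEST_DIR/NAME.tar.gz,
#   then appends its SHA-256 checksum to DEST_DIR/checksums.sha256.
backup_path() {
    local src=$1 dest=$2 name
    name=$(basename "$src")
    mkdir -p "$dest" || return 1
    # -C makes member paths relative to the parent of SRC (e.g. "etc/...").
    tar -C "$(dirname "$src")" -czf "$dest/$name.tar.gz" "$name" || return 1
    # Each line is "<hash>  <file>", the format read back by "sha256sum -c".
    (cd "$dest" && sha256sum "$name.tar.gz" >> checksums.sha256)
}
```

For example, backup_path /etc "/var/backups/arkiv/$(date +%Y-%m-%d)/$(date +%H):00" would produce a day/hour layout similar to the one Arkiv uses (the root path and date format are assumptions). After a transfer, running sha256sum -c checksums.sha256 inside the directory reports any corrupted file.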
Archiving
- If Amazon Glacier is configured, all the generated backup files (not the checksums file) are sent to Amazon Glacier. For each one of them, a JSON file is created with the response's content; these files are important, because they contain the archiveId needed to restore the file.
- If Amazon S3 is configured, the whole directory (backup files + checksums file + Amazon Glacier JSON files) is copied to Amazon S3.
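A rough sketch of these archiving steps with the AWS CLI follows. It assumes the aws glacier upload-archive and aws s3 cp commands, handles a single directory level only (the MySQL subdirectory is ignored), and uses a hypothetical checksums.sha256 file name; archive_dir and the AWS override variable are illustrative names, not part of Arkiv.

```shell
# Hypothetical sketch of the "Archiving" steps (not Arkiv's actual code).
# archive_dir DIR VAULT S3_URI
#   Uploads each regular file of DIR (except the checksums file and JSON
#   responses) to Glacier, saves each response as FILE.json, then copies DIR
#   to S3. Set AWS=echo to print the commands instead of running them.
archive_dir() {
    local dir=$1 vault=$2 s3_uri=$3 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                # skip subdirectories
        case $f in
            */checksums.sha256|*.json) continue ;;
        esac
        # The JSON response contains the archiveId needed for any restore.
        "${AWS:-aws}" glacier upload-archive --account-id - \
            --vault-name "$vault" --body "$f" > "$f.json" || return 1
    done
    # Backup files, checksums file and JSON responses all go to S3.
    "${AWS:-aws}" s3 cp "$dir" "$s3_uri" --recursive
}
```

Running it with AWS=echo prints the commands (and writes them into the .json files) instead of contacting AWS, which is handy for a dry run.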
Purge
- After a configured delay, backup files are removed from the local disk drive.
- If Amazon S3 is configured, all backup files are removed from Amazon S3 after a configured delay. The checksums file and the Amazon Glacier JSON files are not removed, because they are needed to restore data from Amazon Glacier and check their integrity.
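Arkiv's purge policies are set during configuration and can be finer than this, but the basic idea of an age-based local purge can be sketched with find. This assumes one directory per backup period directly under a hypothetical backup root, and it relies on modification times, which may differ from how Arkiv actually decides what to purge; purge_older_than is an illustrative name.

```shell
# Hypothetical sketch of an age-based local purge (not Arkiv's actual code).
# purge_older_than ROOT DAYS
#   Deletes every entry directly under ROOT (e.g. one directory per day)
#   whose modification time, counted in whole days, is greater than DAYS.
purge_older_than() {
    local root=$1 days=$2
    find "$root" -mindepth 1 -maxdepth 1 -mtime +"$days" -exec rm -rf {} +
}
```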
Arkiv needs several tools to work correctly. They are usually installed by default on most Unix/Linux distributions.
- A not-so-old bash shell interpreter, located at /bin/bash (mandatory)
- tar for file concatenation (mandatory)
- gzip, bzip2, xz or zstd for compression (at least one)
- openssl for encryption (optional)
- sha256sum for checksum computation (mandatory)
- tput for ANSI text formatting (optional: can be manually deactivated; automatically deactivated if not installed)
To install these tools on Ubuntu:
# apt-get install tar gzip bzip2 xz-utils zstd openssl coreutils ncurses-bin
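Before installing anything, it can help to check which of these tools are already present. The helper below is a hypothetical sketch (not part of Arkiv) built on the POSIX command -v builtin.

```shell
# Hypothetical helper (not part of Arkiv).
# missing_tools CMD...
#   Prints the name of each CMD not found in PATH, one per line.
#   Returns 0 if all are present, 1 otherwise.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_tools bash tar sha256sum lists the missing mandatory tools, while missing_tools gzip bzip2 xz zstd is fine as long as it prints fewer than four names, since one compressor is enough.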
If you want to encrypt the generated backup files (stored locally as well as the ones archived on Amazon S3 and Amazon Glacier), you need to create a symmetric encryption key.
Use this command to do it (you can adapt the destination path):
# openssl rand -out ~/.ssh/symkey.bin 32
If you want to back up MySQL databases, you have to install mysqldump or xtrabackup.
To install mysqldump on Ubuntu:
# apt-get install mysql-client
To install xtrabackup on Ubuntu (see documentation):
# wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
# dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
# apt-get update
# apt-get install percona-xtrabackup-24
If you want to archive the generated backup files on Amazon S3/Glacier, you have to do these things:
- Create a dedicated bucket on Amazon S3.
- If you want to archive on Amazon Glacier, create a dedicated vault in the same datacenter.
- Create an IAM user with read-write access to this bucket and this vault (if needed).
- Install the AWS-CLI program and configure it.
Install AWS-CLI on Ubuntu:
# apt-get install awscli
Configure the program (you will be asked for the AWS user's access key and secret key, and for the datacenter to use):
# aws configure
Get the latest version:
# wget https://github.com/Digicreon/Arkiv/archive/refs/tags/1.0.0.zip -O Arkiv-1.0.0.zip
# unzip Arkiv-1.0.0.zip
or
# wget https://github.com/Digicreon/Arkiv/archive/refs/tags/1.0.0.tar.gz -O Arkiv-1.0.0.tar.gz
# tar xzf Arkiv-1.0.0.tar.gz
# cd Arkiv-1.0.0
# ./arkiv config
Some questions will be asked about:
- Whether you want a simple installation (one backup per day, every day, at midnight).
- The local machine's name (used as a subdirectory of the S3 bucket).
- The compression type to use.
- Whether you want to encrypt the generated backup files.
- Which files must be backed up.
- Everything about MySQL backups (SQL or binary backup, which databases, host/login/password for the connection).
- Where to store the compressed files resulting from the backup.
- Where to archive data on Amazon S3 and Amazon Glacier (if you want to).
- When to purge files (locally and on Amazon S3).
Finally, the program will offer to add the Arkiv execution to the user's crontab.
Arkiv is licensed under the terms of the MIT License, which is a permissive open-source free software license.
More details in the COPYING file.
You can use the Amazon Web Services Calculator to estimate the cost depending on your usage.
You can use one of the four common compression tools (gzip, bzip2, xz, zstd).
Usually, you can follow these guidelines:
- Use zstd if you want the best compression and decompression speed.
- Use xz if you want the best compression ratio.
- Use gzip or bzip2 if you want the best portability (xz and zstd are younger and less widespread).
Here are some helpful links:
- Gzip vs Bzip2 vs XZ Performance Comparison
- Quick Benchmark: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO
- Zstandard presentation and benchmarks
The default choice is zstd, because it offers the best trade-off between compression ratio and speed.
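Benchmarks vary with the data, so a quick check on a representative file of your own can help you choose. The hypothetical helper below (not part of Arkiv) compresses a file with each installed tool at its default level and prints the resulting sizes.

```shell
# Hypothetical helper (not part of Arkiv).
# compare_compression FILE
#   For each installed tool among gzip, bzip2, xz and zstd, prints
#   "<tool> <compressed size in bytes>". FILE itself is left untouched (-c).
compare_compression() {
    local file=$1 tool size
    for tool in gzip bzip2 xz zstd; do
        command -v "$tool" >/dev/null 2>&1 || continue
        # Arithmetic expansion strips any padding that wc may add.
        size=$(( $("$tool" -c "$file" | wc -c) ))
        printf '%s %s\n' "$tool" "$size"
    done
}
```

It only measures size; prefixing a single command with time (for example, time zstd -c FILE > /dev/null) gives a rough idea of speed.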
I chose the simple mode configuration (one backup per day, every day). Why is there a directory called "00:00" in the backup directory of the day?
This directory means that your Arkiv backup process is launched at midnight.
You may think that the backed-up data should have been stored directly in the directory of the day, without a subdirectory for the hour (since there is only one backup per day). But if you later decided to change the configuration and run several backups per day, Arkiv would have trouble managing purges.
You can add the path to the configuration file as a parameter of the program on the command line.
To generate the configuration file:
# ./arkiv config --config=/path/to/config/file
or
# ./arkiv config -c /path/to/config/file
To launch Arkiv:
# ./arkiv exec --config=/path/to/config/file
or
# ./arkiv exec -c /path/to/config/file
You can modify the Crontab to add the path too.
It is not possible to encrypt data with a public key: OpenSSL's public-key encryption is not designed to encrypt large amounts of data. Encryption is done with 256-bit AES, a symmetric algorithm.
To ensure that only the owner of a private key can decrypt the data, without transferring that key, you have to encrypt the symmetric key with the public key, and then send the encrypted key to the private key's owner.
Here are the steps to do it (key files are usually located in ~/.ssh/).
Create the symmetric key:
# openssl rand -out symkey.bin 32
Convert the private and public keys to PEM format (people usually have RSA keys, which they use with SSH):
# openssl rsa -in id_rsa -outform pem -out id_rsa.pem
# openssl rsa -in id_rsa -pubout -outform pem -out id_rsa.pub.pem
Encrypt the symmetric key with the public key:
# openssl rsautl -encrypt -inkey id_rsa.pub.pem -pubin -in symkey.bin -out symkey.bin.encrypt
To decrypt the encrypted symmetric key using the private key:
# openssl rsautl -decrypt -inkey id_rsa.pem -in symkey.bin.encrypt -out symkey.bin
To decrypt the data file:
# openssl enc -d -aes-256-cbc -in data.tgz.encrypt -out data.tgz -pass file:symkey.bin
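The commands above only show decryption. Assuming the backup files are encrypted with the same cipher and key file (this README does not show Arkiv's encryption command, so that is an assumption), a round trip can be checked as follows; encrypt_file and decrypt_file are hypothetical names.

```shell
# Hypothetical helpers (not Arkiv's actual code), assuming encryption uses
# the same cipher and key file as the decryption command shown above.
# encrypt_file IN OUT KEYFILE
encrypt_file() {
    openssl enc -aes-256-cbc -in "$1" -out "$2" -pass "file:$3"
}
# decrypt_file IN OUT KEYFILE (same command as above, with -d)
decrypt_file() {
    openssl enc -d -aes-256-cbc -in "$1" -out "$2" -pass "file:$3"
}
```

For example: encrypt data.tgz into data.tgz.encrypt with symkey.bin, decrypt it into restored.tgz, then compare both files with cmp. Recent OpenSSL versions print a warning about the key derivation method; it does not prevent the round trip.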
When you send a file to Amazon Glacier, you get back an archiveId (the file's unique identifier). Arkiv takes this information and writes it to a file, which is then copied to Amazon S3. If the archiveId is lost, you will not be able to get the file back from Amazon Glacier, and an archived file that you can't restore is useless. Even though it's possible to get the list of archived files from Amazon Glacier, it's a slow process; it's more flexible to store archive identifiers in Amazon S3 (and the cost to store them is insignificant).
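Because the archiveId is what makes a restore possible, it helps to be able to pull it out of the saved JSON responses. The sketch below assumes the response format of aws glacier upload-archive (an object with location, checksum and archiveId keys) and uses sed so that no JSON tool is required; archive_id_from_json is a hypothetical name.

```shell
# Hypothetical helper (not part of Arkiv).
# archive_id_from_json FILE
#   Prints the value of the "archiveId" key found in FILE.
archive_id_from_json() {
    # Keep only the quoted value that follows "archiveId":
    sed -n 's/.*"archiveId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

The identifier can then be passed to aws glacier initiate-job, with a job of type archive-retrieval, to start a restore from the vault.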
Arkiv provides several ways to exclude content from archives.
First of all, it follows the CACHEDIR.TAG standard. If a directory contains a CACHEDIR.TAG file, the directory is added to the archive, along with the CACHEDIR.TAG file, but not its other files and subdirectories.
If you want to exclude the content of a directory in a similar way, but you don't want to create a CACHEDIR.TAG file (to avoid the directory being excluded by other programs), you can create an empty .arkiv-exclude file in the directory. The directory and the .arkiv-exclude file will be added to the archive (to keep track of the folder, along with the information that its content was excluded), but not the other files and subdirectories it contains.
If you want to exclude specific files of a directory, you can create a .arkiv-ignore file in the directory, and write a list of exclusion patterns into it. These patterns will be used to exclude files and subdirectories stored directly in the given directory.
If you create a .arkiv-ignore-recursive file in a directory, patterns will be read from this file to define recursive exclusions in the given directory and all its subdirectories.
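These rules map closely onto GNU tar's own exclusion options (--exclude-caches, --exclude-tag, --exclude-ignore and --exclude-ignore-recursive, available in GNU tar 1.28 and later). Assuming, without confirmation from this README, that Arkiv relies on them, the same behavior can be reproduced by hand; tar_with_exclusions is a hypothetical name.

```shell
# Hypothetical sketch (not Arkiv's actual code), requires GNU tar >= 1.28.
# tar_with_exclusions ARCHIVE DIR
tar_with_exclusions() {
    # --exclude-caches: keep CACHEDIR.TAG, skip the rest of its directory
    # --exclude-tag=F:  keep F, skip the rest of the directory containing it
    # --exclude-ignore=F: patterns in F apply to F's directory only
    # --exclude-ignore-recursive=F: patterns apply to the directory and below
    tar --exclude-caches \
        --exclude-tag=.arkiv-exclude \
        --exclude-ignore=.arkiv-ignore \
        --exclude-ignore-recursive=.arkiv-ignore-recursive \
        -czf "$1" "$2"
}
```

Note that tar only honors a CACHEDIR.TAG file that begins with the signature line defined by the standard.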
Yes, you just have to add some options on the command line:
- --no-stdout (or -o) to avoid output on STDOUT
- --no-stderr (or -e) to avoid output on STDERR
You can use these options separately or together.
You can use a dedicated parameter:
# ./arkiv exec --log=/path/to/log/file
or
# ./arkiv exec -l /path/to/log/file
It will not disable output on the terminal. You can use the options --no-stdout and --no-stderr for that (see previous answer).
Add the option --syslog (or -s) on the command line or in the Crontab command.
Add the option --no-ansi (or -n) on the command line or in the Crontab command. It will act on terminal output as well as on the log file (see the --log option above) and syslog (see the --syslog option above).
Unlike more and tail, less doesn't interpret ANSI text formatting commands (bold, color, etc.) by default. To enable it, you have to use the option -r or -R.
Arkiv can generate two kinds of database backups:
- SQL backups created using mysqldump.
- Binary backups created using xtrabackup.
There are two types of binary backups:
- Full backups; the server's files are entirely copied.
- Incremental backups; only the data modified since the last backup (full or incremental) are copied.
You must do a full backup before performing any incremental backup.
If you choose SQL backups (using mysqldump), Arkiv can manage any table engine supported by MySQL, MariaDB and Percona Server.
If you choose binary backups (using xtrabackup), Arkiv can handle:
- MySQL (5.1 and above) or MariaDB, with InnoDB, MyISAM and XtraDB tables.
- Percona Server with XtraDB tables.
Note that MyISAM tables can't be incrementally backed up. They are copied entirely each time an incremental backup is performed.
No. Binary backups are done using xtrabackup --backup. The xtrabackup --prepare step is skipped to save time and space. You will have to do it when you want to restore a database (see below).
You will have to create two different configuration files and add Arkiv to the Crontab twice: once for the full backups (every day at midnight, for example), and once for the incremental backups (every hour except midnight).
Both executions need to use the same LSN file. It is written by the full backup, and read and updated by each incremental backup.
The same process can be used with any other frequency (for example: full backups once a week and incremental backups on the other days).
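The two Crontab entries described above could look like the output of the hypothetical helper below (not part of Arkiv). The Arkiv path and configuration file names are placeholders, and both configuration files are assumed to point to the same LSN file.

```shell
# Hypothetical helper (not part of Arkiv): print two crontab entries,
# a full backup every day at midnight and an incremental backup at the
# top of every hour from 01:00 to 23:00.
# arkiv_cron_entries ARKIV_PATH FULL_CONFIG INCREMENTAL_CONFIG
arkiv_cron_entries() {
    printf '0 0 * * * %s exec -c %s\n' "$1" "$2"
    printf '0 1-23 * * * %s exec -c %s\n' "$1" "$3"
}
```

Appending them to the current table could be done with { crontab -l; arkiv_cron_entries /opt/Arkiv/arkiv /etc/arkiv-full.conf /etc/arkiv-inc.conf; } | crontab - (all paths hypothetical).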
Arkiv generates one SQL file per database. You have to extract the wanted file and process it in your database server:
# unxz /path/to/database_sql/database.sql.xz
# mysql -u username -p < /path/to/database_sql/database.sql
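The example above assumes xz compression. A small hypothetical helper (not part of Arkiv) can pick the decompressor from the file extension and stream the SQL straight into mysql, without writing the uncompressed file to disk; the .zst extension for zstd files is an assumption.

```shell
# Hypothetical helper (not part of Arkiv).
# sql_dump_cat FILE
#   Writes the decompressed content of FILE to stdout, choosing the tool
#   from the extension. Returns 1 for an unknown extension.
sql_dump_cat() {
    case $1 in
        *.xz)  xz -dc "$1" ;;
        *.bz2) bzip2 -dc "$1" ;;
        *.gz)  gzip -dc "$1" ;;
        *.zst) zstd -dc "$1" ;;
        *)     echo "unknown compression: $1" >&2; return 1 ;;
    esac
}
```

For example: sql_dump_cat /path/to/database_sql/database.sql.xz | mysql -u username -p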
To restore the database, you first need to extract the data:
# tar xJf /path/to/database_data.tar.xz
or
# tar xjf /path/to/database_data.tar.bz2
or
# tar xzf /path/to/database_data.tar.gz
Then you must prepare the backup:
# xtrabackup --prepare --target-dir=/path/to/database_data
Please note that the MySQL server must be shut down, and the 'datadir' directory (usually /var/lib/mysql) must be empty. On Ubuntu:
# service mysql stop
# rm -rf /var/lib/mysql/*
Then you can restore the data:
# xtrabackup --copy-back --target-dir=/path/to/database_data
Files' ownership must be given back to the MySQL user (usually mysql):
# chown -R mysql:mysql /var/lib/mysql
Finally you can restart the MySQL daemon:
# service mysql start
Let's say you have a full backup (located in /full/database_data) and three incremental backups (located in /inc1/database_data, /inc2/database_data and /inc3/database_data), and you have already extracted the backed-up files (see previous answer).
First, you must prepare the full backup with the additional --apply-log-only option:
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data
And then you prepare using all incremental backups in their creation order, except the last one:
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data --incremental-dir=/inc1/database_data
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data --incremental-dir=/inc2/database_data
Data preparation of the last incremental backup is done without the --apply-log-only option:
# xtrabackup --prepare --target-dir=/full/database_data --incremental-dir=/inc3/database_data
Once all backups have been merged, the process is the same as for a full backup:
# service mysql stop
# rm -rf /var/lib/mysql/*
# xtrabackup --copy-back --target-dir=/full/database_data
# chown -R mysql:mysql /var/lib/mysql
# service mysql start
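With many incremental backups, the prepare steps above can be scripted. The sketch below (a hypothetical helper, not part of Arkiv) takes the full backup directory followed by the incremental directories in creation order, and uses --apply-log-only on every step except the last; the XTRABACKUP variable is an illustrative override for dry runs.

```shell
# Hypothetical helper (not part of Arkiv).
# prepare_chain FULL_DIR [INC_DIR...]
#   Runs "xtrabackup --prepare" on FULL_DIR, merging each INC_DIR in order.
#   Set XTRABACKUP=echo to print the commands instead of running them.
prepare_chain() {
    local full=$1 xb=${XTRABACKUP:-xtrabackup}
    shift
    if [ "$#" -eq 0 ]; then
        # Full backup only: a plain prepare is enough.
        "$xb" --prepare --target-dir="$full"
        return
    fi
    "$xb" --prepare --apply-log-only --target-dir="$full" || return 1
    while [ "$#" -gt 1 ]; do
        "$xb" --prepare --apply-log-only --target-dir="$full" --incremental-dir="$1" || return 1
        shift
    done
    # Last incremental backup: without --apply-log-only.
    "$xb" --prepare --target-dir="$full" --incremental-dir="$1"
}
```

prepare_chain /full/database_data /inc1/database_data /inc2/database_data /inc3/database_data runs the same four commands as above; with XTRABACKUP=echo it only prints them.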
In simple mode (one backup per day, every day at midnight), how do I set up Arkiv to run at a time other than midnight?
You just have to edit the user's Cron table:
# crontab -e
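In the Cron table, only the first two fields of the Arkiv line (minute, then hour) need to change; for instance, "30 3 * * *" runs the command every day at 03:30. The same edit can be scripted with the hypothetical helper below (not part of Arkiv), which assumes the Arkiv entry contains the words arkiv exec.

```shell
# Hypothetical helper (not part of Arkiv).
# reschedule_arkiv MINUTE HOUR
#   Reads a crontab on stdin and sets the minute and hour fields of every
#   non-comment line containing "arkiv exec"; other lines are unchanged.
reschedule_arkiv() {
    awk -v m="$1" -v h="$2" '
        /arkiv exec/ && $0 !~ /^[ \t]*#/ { $1 = m; $2 = h }
        { print }
    '
}
```

For example, crontab -l | reschedule_arkiv 30 3 | crontab - moves the backup to 03:30. Note that awk rejoins the fields of the modified line with single spaces.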
See the previous answer. You just have to add these scripts before and/or after the Arkiv program in the Cron table.
No, it's not possible.
I want to have colors in the Arkiv log file when it's launched from Crontab, as well as when it's launched from the command line
The problem comes from the Crontab environment, which is very minimal.
You have to set the TERM environment variable in the Crontab. It is also a good idea to define the MAILTO and PATH variables.
Edit the Crontab:
# crontab -e
And add these three lines at its beginning:
TERM=xterm
MAILTO=your.email@domain.com
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Add a MAILTO environment variable at the beginning of your Crontab. See the previous answer.
Because the read builtin command has a -s parameter for silent input (used to type the encryption passphrase and the MySQL password without displaying them), which is unavailable in POSIX shells such as dash.
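A minimal illustration of that builtin follows (not Arkiv's code; prompt_secret is a hypothetical name). It runs in bash, whereas dash's read only accepts the -p and -r options.

```shell
# Hypothetical helper (not part of Arkiv), bash only.
# prompt_secret VAR PROMPT
#   Reads one line without echoing it and stores it in the variable VAR.
prompt_secret() {
    # -r: keep backslashes; -s: do not echo the typed characters;
    # -p: prompt, displayed only when input comes from a terminal.
    IFS= read -rs -p "$2" "$1"
    # The user's Enter key was not echoed, so move to a new line.
    echo >&2
}
```

For example, prompt_secret db_password 'MySQL password: ' reads the password without echoing it and stores it in db_password.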
Yes indeed. Both of them aim to help people back up files and databases, and archive data in a secure place.
But Arkiv is different in several ways:
- It can manage hourly backups.
- It can transfer data on Amazon Glacier for long-term archiving.
- It can manage complex purge policies.
- The configuration process is simpler (you just answer questions).
- Written in pure shell, it doesn't need a Perl interpreter.
On the other hand, Backup-Manager can transfer data to remote destinations by SCP or FTP, and burn data on CD/DVD.