- Audience
- Requirements
- Installation
- Upgrading to the latest version of Archivematica
- Web UIs
- Source code auto-reloading
- Logs
- Scaling
- Ports
- Tests
- Resetting the environment
- Cleaning up
- Percona tuning
- Instrumentation
- Troubleshooting
This Archivematica environment is based on Docker Compose and it is specifically designed for developers. Compose can be used in a production environment but that is beyond the scope of this recipe. Please read the documentation.
Artefactual developers use Docker Compose on Linux heavily so it's important that you're familiar with it, and some choices in the configuration of this environment break in other operative systems.
System requirements. The following is a sample of memory usage when the environment is initialized in a virtual machine with 8 GB of RAM:
docker stats --all --format "table {{.Name}}\t{{.MemUsage}}"
NAME MEM USAGE / LIMIT
am-archivematica-mcp-client-1 41.3MiB / 7.763GiB
am-archivematica-dashboard-1 145.1MiB / 7.763GiB
am-archivematica-mcp-server-1 39.43MiB / 7.763GiB
am-archivematica-storage-service-1 83.96MiB / 7.763GiB
am-nginx-1 2.715MiB / 7.763GiB
am-elasticsearch-1 900.2MiB / 7.763GiB
am-gearmand-1 3.395MiB / 7.763GiB
am-mysql-1 551.9MiB / 7.763GiB
am-clamavd-1 570MiB / 7.763GiB
Software dependencies: Docker Engine, Docker Compose, git and make. Please use a version of Docker Engine greater than 23.0 which includes Buildkit as the default builder with support for multi-stage builds and a version of Docker Compose greater than 2.17 which supports restarts of dependent services.
It is beyond the scope of this document to explain how these dependencies are installed in your computer.
Follow these instructions to install Docker
Engine in Ubuntu. Docker also provides instructions on how to use it as a
non-root user so you don't have to run the
following docker compose
commands with sudo
. Make sure to read about the
security implications of this change.
For the Elasticsearch container to run properly, you may need to increase the
maximum virtual memory address space vm.max_map_count
to at least 262144
.
This is a configuration setting on the host machine running Docker, not the
container itself.
To make this change run:
sudo sysctl -w vm.max_map_count=262144
To persist this setting, modify /etc/sysctl.conf
and add:
vm.max_map_count=262144
For more information, please consult the Elasticsearch 6.x
documentation.
First, clone this repository this way:
git clone https://github.com/artefactual/archivematica.git --branch qa/1.x --recurse-submodules
This will set up the submodules defined in
https://github.com/artefactual/archivematica/tree/qa/1.x/hack/submodules which
are from the qa/1.x
branch of Archivematica and the qa/0.x
branch of
Archivematica Storage Service. These two branches are the focus of
Archivematica development and pull requests are expected to target them.
Next, run the installation (and all Docker Compose) commands from within the
hack
directory:
cd ./archivematica/hack
Run the following command to create two Docker external volumes:
make create-volumes
These are heavily used in our containers but they are provided in the host machine:
$HOME/.am/am-pipeline-data
- Archivematica's shared directory.$HOME/.am/ss-location-data
- the transfer source location.
Next, build the Docker images:
make build
You may want to rebuild images with this command after updating the
Dockerfile
or the Python requirement files, but it's not necessary to rebuild
the images after changing Python code.
Start the services with:
docker compose up -d
On the first run, the Archivematica services will fail because the databases of the Dashboard and the Storage Service have not been created. To do so, run:
make bootstrap
Be aware that this command drops and recreates both databases, and then runs Django's migrations so you will lose any existing data if you run it again.
Now that the databases have been created, use the following command to restart only the Archivematica services:
make restart-am-services
You should now be able to access Archivematica services through the Web UIs.
Make commands above, and any subsequent calls to it below can be reviewed using the following command:
make help
To upgrade your installation to include the most recent changes in the submodules, use the following commands:
git pull --rebase
git submodule update --init --recursive
docker compose up -d --force-recreate --build
make bootstrap
make restart-am-services
The submodules are not always up to date, i.e. they may not be pointing to the
latest commits of their tracking branches. They can be updated manually using
git pull --rebase
:
cd ./submodules/archivematica-storage-service && git pull --rebase
cd ./submodules/archivematica-sampledata && git pull --rebase
cd ./submodules/archivematica-acceptance-tests && git pull --rebase
Once you're done, run:
docker compose up -d --force-recreate --build
make bootstrap
make restart-am-services
Working with submodules can be a little confusing. GitHub's Working with submodules blog post is a good introduction.
- Archivematica Dashboard: http://127.0.0.1:62080/
- Archivematica Storage Service: http://127.0.0.1:62081/
The default credentials for the Archivematica Dashboard and the Storage Service
are username: test
, password: test
.
Dashboard and Storage Service are both served by Gunicorn. We set up Gunicorn with the reload setting enabled meaning that the Gunicorn workers will be restarted as soon as code changes.
Other components in the stack like the MCPServer
don't offer this option and
they need to be restarted manually, e.g.:
docker compose up -d --force-recreate --no-deps archivematica-mcp-server
If you've added new dependencies or changes the Dockerfile
you should also
add the --build
argument to the previous command in order to ensure that the
container is using the newest image, e.g.:
docker compose up -d --force-recreate --build --no-deps archivematica-mcp-server
In recent versions of Archivematica we've changed the logging configuration so the log events are sent to the standard streams. This is a common practice because it makes much easier to aggregate the logs generated by all the replicas that we may be deploying of our services across the cluster.
Docker Compose aggregates the logs for us so you can see everything from one place. Some examples:
docker compose logs --follow
docker compose logs --follow archivematica-storage-service
docker compose logs --follow nginx archivematica-dashboard
Docker keeps the logs in files using the JSON File logging driver. If you want to clear them, we provide a simple script that can do it for us quickly but it needs root privileges, e.g.:
sudo make flush-logs
With Docker Compose we can run as many containers as we want for a service,
e.g. by default we only provision a single replica of the
archivematica-mcp-client
service but nothing stops you from running more:
docker compose up -d --scale archivematica-mcp-client=3
We still have one service but three containers. Let's verify that the workers are connected to Gearman:
echo workers | socat - tcp:127.0.0.1:62004,shut-none | grep "_v0.0" | awk '{print $2}' - | sort -u
172.19.0.15
172.19.0.16
172.19.0.17
Service | Container port | Host port |
---|---|---|
mysql | tcp/3306 |
tcp/62001 |
elasticsearch | tcp/9200 |
tcp/62002 |
gearman | tcp/4730 |
tcp/62004 |
clamavd | tcp/3310 |
tcp/62006 |
nginx » archivematica-dashboard | tcp/80 |
tcp/62080 |
nginx » archivematica-storage-service | tcp/8000 |
tcp/62081 |
The Makefile
includes many useful targets for testing. List them all with:
make help | grep test-
The following targets use tox
and
pytest
to run the tests using MySQL:
test-all Run all tests.
test-archivematica-common Run Archivematica Common tests.
test-dashboard Run Dashboard tests.
test-mcp-client Run MCPClient tests.
test-mcp-server Run MCPServer tests.
test-storage-service Run Storage Service tests.
tox
sets up separate virtual environments for each target and calls
pytest
to run the tests. Their configurations live in the pyproject.toml
file but you can set the TOXARGS
and PYTEST_ADDOPTS
environment variables to pass command line options to each.
For example you can run all the tests in tox
parallel
mode
and make it extra verbose like this:
env TOXARGS='-vv --parallel' make test-all
The MySQL databases created by pytest
are kept and reused after
each run, but you could force it to recreate them like
this:
env PYTEST_ADDOPTS='--create-db' make test-dashboard
Or you could run only a specific test module using its relative path
in the PYTHONPATH
of the tox
environment like this:
env PYTEST_ADDOPTS=tests/test_reingest_mets.py make test-mcp-client
The sources of the Archivematica Automated User Acceptance Tests
(AMAUATs)
are available inside Docker using volumes so you can edit
them and the changes will apply immediately. They can be executed with the
test-at-behave
Makefile target.
For example, once your Archivematica services start and you can reach the Web
UIs you can execute the black-box
tag of the
AMAUATs in Firefox like this:
make test-at-behave TAGS=black-box BROWSER=Firefox
In many cases, as a tester or a developer, you want to restart all the containers at once and make sure the latest version of the images are built. But also, you don't want to lose your data like the search index or the database. If this is case, run the following command:
docker compose up -d --force-recreate --build
Additionally you may want to delete all the data including the stuff in the external volumes:
make flush
Both snippets can be combined or used separately.
You may need to update the codebase, and for that you can run this command:
git submodule update --init --recursive
The most effective way is:
docker compose down --volumes
It doesn't delete the external volumes described in the Installation section of this document. You have to delete the volumes manually with:
docker volume rm am-pipeline-data
docker volume rm ss-location-data
Optionally you may also want to delete the directories:
rm -rf $HOME/.am/am-pipeline-data $HOME/.am/ss-location-data
To use different settings on the MySQL container, please edit the
etc/mysql/tunning.conf
file and rebuild the container with:
docker compose up -d --force-recreate mysql
Prometheus and Grafana can be used to monitor Archivematica processes.
To run them, reference the docker-compose.instrumentation.yml
file:
docker compose -f docker-compose.yml -f docker-compose.instrumentation.yml up -d
Prometheus will start on 127.0.0.1:9090; Grafana on 127.0.0.1:3000.
Extending the default environment, you can deploy an instance of
Percona Monitoring and Management configured by default to
collect metrics and query analytics data from the mysql
service. To set up the
PMM server and client services alongside all the others you'll need to indicate
two Docker Compose files:
docker compose -f docker-compose.yml -f docker-compose.pmm.yml up -d
To access the PMM server interface, visit http://127.0.0.1:62007:
- Username:
admin
- Password:
admin
We're using Nginx as a proxy. Likely the underlying issue is that
either the Dashboard or the Storage Service died. Run docker compose ps
to confirm the state of their services like this:
docker compose ps --all archivematica-dashboard archivematica-storage-service
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
am-archivematica-dashboard-1 am-archivematica-dashboard "/usr/local/bin/guni…" archivematica-dashboard 11 minutes ago Up 27 seconds 8000/tcp
am-archivematica-storage-service-1 am-archivematica-storage-service "/usr/local/bin/guni…" archivematica-storage-service 11 minutes ago Exited (3) 28 seconds ago
You want to see what's in the logs of the archivematica-storage-service
service, e.g.:
docker compose logs --no-log-prefix --tail 5 archivematica-storage-service
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'storage_service.wsgi'
[2023-06-02 03:53:00 +0000] [9] [INFO] Worker exiting (pid: 9)
[2023-06-02 03:53:00 +0000] [1] [INFO] Shutting down: Master
[2023-06-02 03:53:00 +0000] [1] [INFO] Reason: Worker failed to boot.
Now we know why - I had deleted the wsgi
module. The worker crashed and
Gunicorn gave up. This could happen for example when we're rebasing a branch
and git is not atomically moving things around. But it's fixed now and you want
to give it another shot so we run docker compose up -d
to ensure that all the
services are up again. Next run docker compose ps
to verify that it's all up.
If after running the bootstrap processes and docker compose ps
still shows
that the dashboard and elasticsearch are still down then check the
elasticsearch logs using:
docker compose logs --no-log-prefix --tail 8 elasticsearch
You may see entries as follows:
[2023-06-02T03:50:27,970][INFO ][o.e.b.BootstrapChecks ] [am-node] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2023-06-02T03:50:28,006][INFO ][o.e.n.Node ] [am-node] stopping ...
[2023-06-02T03:50:28,077][INFO ][o.e.n.Node ] [am-node] stopped
[2023-06-02T03:50:28,077][INFO ][o.e.n.Node ] [am-node] closing ...
[2023-06-02T03:50:28,088][INFO ][o.e.n.Node ] [am-node] closed
[2023-06-02T03:50:28,090][INFO ][o.e.x.m.j.p.NativeController] [am-node] Native controller process has stopped - no new native processes can be started
This indicates that you may need to increase the virtual memory available to Elasticsearch, as discussed in the section Elasticsearch container above.
In some cases the pmm_client
service fails to start reporting the following
error:
[main] app already is running, exiting
You'll need to fully recreate the container to make it work:
docker compose -f docker-compose.yml -f docker-compose.pmm.yml rm pmm_client
docker compose -f docker-compose.yml -f docker-compose.pmm.yml up -d
You've read this far but you haven't yet figured out why your development environment is not working? Here are some tips:
- Does your system meet the requirements? Some services like Elasticsearch or ClamAV need a lot of memory!
- Make sure that you've checked out the latest commit of this repository.
- Make sure that your repositories under
/hack/submodules
(submodules) are up to date. If you are working off your own branches, make sure they are not outdated. Rebase often! - Look for open/closed issues that may relate to your problem!
- Get support.