Add docker-compose build system #8709

Closed
wants to merge 29 commits into from
29 commits
2e4609b
initial take at docker-compose build
May 17, 2022
5b297f7
minor documentation
May 18, 2022
422bb7e
.m2 folder isn't setup by default so this would break
May 18, 2022
393fdf0
switch to Ubuntu 22.04 LTS
May 19, 2022
cb88c10
remove healthcheck from Docker which was breaking Rserve
May 19, 2022
73c0051
add architecture flag to swap between amd64 and arm64 CPU architectures
May 19, 2022
8497eb5
refactor code to root dir for build
May 19, 2022
c5edd00
disable cache testing again
May 19, 2022
61d7d9e
cleanup gitignore with refactor
May 19, 2022
8067fc9
multi-thread building of R libraries
May 23, 2022
324c13b
fix maven build
Jun 14, 2022
80771b8
rev upstream Solr version
Aug 29, 2022
3a399eb
rev traefik version
Aug 30, 2022
b4deddc
look for Debian package updates in Solr
Aug 30, 2022
0ecc128
rev Payara version
Aug 30, 2022
403c037
swap to latest Payara version
Aug 30, 2022
3605baf
delete cached Maven dependencies
Aug 30, 2022
6cc5adc
look for and apply Debian updates
Aug 30, 2022
436d7f9
Merge branch 'develop' into docker-compose
Jan 3, 2023
c22719f
add traefik container in maven
Jan 12, 2023
ed2d910
add solr container in maven
Jan 12, 2023
a5f3fcf
add seaweedfs container using maven
Jan 12, 2023
a87bfca
add rserve container using maven
Jan 12, 2023
169422e
add postgresql container using maven
Jan 12, 2023
6e5d3f4
update docker-compose with prebuilt images from maven
Jan 12, 2023
aeae7c6
update build script
Jan 12, 2023
ae526a9
add dataverse container using maven
Jan 12, 2023
1d88ab9
Merge branch 'develop' of https://github.com/IQSS/dataverse.git into …
Jan 12, 2023
134695d
delete some unnecessary code to simplify
Jan 19, 2023
7 changes: 7 additions & 0 deletions .dockerignore
@@ -0,0 +1,7 @@
/.git/
/.github/
/conf/docker-compose/postgres-bind/
/conf/docker-compose/solr-bind/
/conf/docker-compose/seaweedfs-bind/
/conf/docker-compose/dataverse-docroot-bind/
/conf/docker-compose/dataverse-logos-bind/
42 changes: 42 additions & 0 deletions .env
@@ -0,0 +1,42 @@
# timezone
# https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ="America/Denver"

# dataverse service
HOST_DNS_ADDRESS=dataverse
GLASSFISH_USER=dataverse
GLASSFISH_PASSWORD=secret
GLASSFISH_ADMIN_USER=admin
GLASSFISH_ADMIN_PASSWORD=secret
ADMIN_EMAIL=noreply@mydomain.com
MAIL_SERVER=localhost
POSTGRES_ADMIN_PASSWORD=secret
POSTGRES_SERVER=postgres
POSTGRES_PORT=5432
POSTGRES_DATABASE=dataverse
POSTGRES_PASSWORD=secret
POSTGRES_USER=dataverse
SOLR_LOCATION=solr:8983
RSERVE_HOST=rserve
RSERVE_PORT=6311
# the rserve credentials are hardcoded in the Dockerfile, edit both if you want to change them
RSERVE_USER=rserve
RSERVE_PASSWORD=rserve

# disable DOI validation checks, true or false, set this to true for your development environment
DISABLE_DOI=true

# exclude emails from exports
# https://guides.dataverse.org/en/latest/installation/config.html#excludeemailfromexport
EXCLUDE_EMAIL_EXPORTS=true

# s3 keys
S3_ACCESS_KEY=secret
S3_SECRET_KEY=secret

# fully qualified domain name (FQDN) and site URL
# recommend keeping this as dataverse because it's used internally for routing within Docker
# if you change this, s3 storage will break
DATAVERSE_FQDN=dataverse
# make sure to escape characters like :
DATAVERSE_SITE_URL=http\://localhost
89 changes: 88 additions & 1 deletion Dockerfile
@@ -1 +1,88 @@
# See http://guides.dataverse.org/en/latest/developers/containers.html
# https://hub.docker.com/_/ubuntu
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        gcc python3-dev tzdata nano dos2unix curl wget openjdk-11-jdk maven unzip jq imagemagick python3 python3-pip python3-psycopg2 wait-for-it ca-certificates && \
    apt-get -y upgrade && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /

RUN useradd --create-home --shell /bin/bash dataverse

# https://guides.dataverse.org/en/5.8/installation/prerequisites.html
RUN wget https://s3-eu-west-1.amazonaws.com/payara.fish/Payara+Downloads/5.2022.3/payara-5.2022.3.zip

RUN unzip payara-5.2022.3.zip && \
    mv payara5 /usr/local && \
    rm payara-5.2022.3.zip

RUN chown -R root:root /usr/local/payara5 && \
    chown dataverse /usr/local/payara5/glassfish/lib && \
    chown -R dataverse:dataverse /usr/local/payara5/glassfish/domains/domain1

# ENV JAVA_HOME "/usr/lib/jvm/java-11-openjdk-${ARCHITECTURE}"
# RUN export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))"

# install Counter Processor
# https://guides.dataverse.org/en/latest/installation/prerequisites.html#counter-processor
RUN cd /usr/local && \
    wget https://github.com/CDLUC3/counter-processor/archive/refs/tags/v0.1.04.tar.gz && \
    tar xvfz v0.1.04.tar.gz && \
    rm v0.1.04.tar.gz && \
    cd counter-processor-0.1.04 && \
    pip3 install -r requirements.txt

RUN useradd --create-home --shell /bin/bash counter && \
    chown -R counter:counter /usr/local/counter-processor-0.1.04

# install awscli
RUN pip3 install --no-cache-dir awscli

# switch to non-root user as this is more secure
USER dataverse
WORKDIR /

RUN mkdir -p /home/dataverse/.aws/
COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/config /home/dataverse/.aws/config
COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/credentials /home/dataverse/.aws/credentials

[Review comment, Contributor] Would you want to make the credentials MicroProfile config options, so they're not stored in plain text? This can be done in 5.10+.


RUN cp -R /home/dataverse/.aws/ /usr/local/payara5/glassfish/domains/domain1/

# if you want to speed up the Maven build you can copy over
# cached packages here
# COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/.m2/ /home/dataverse/.m2/

# copy over sourcecode and build files needed to compile the .war
# as well as installer files
COPY --chown=dataverse:dataverse pom.xml /dataverse/
COPY --chown=dataverse:dataverse src /dataverse/src/
COPY --chown=dataverse:dataverse modules /dataverse/modules/
COPY --chown=dataverse:dataverse scripts /dataverse/scripts/
COPY --chown=dataverse:dataverse conf/jhove/ /dataverse/conf/jhove/
COPY --chown=dataverse:dataverse local_lib /dataverse/local_lib/

# this likely isn't needed on Linux but was needed on a Windows build
RUN find /dataverse -type f -print0 | xargs -0 -n 1 -P 4 dos2unix

# this can take some time to download all the dependencies
RUN cd /dataverse/ && \
    export dpkgArch="$(dpkg --print-architecture)" && \
    export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" && \
    mvn package -DskipTests --no-transfer-progress

# delete the cached dependencies so we don't get any inaccurate "false flags" on container scanning for security issues
RUN rm -rf ~/.m2/

USER root
COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/startup.sh /startup.sh
RUN chmod +x /startup.sh && dos2unix /startup.sh

USER dataverse
CMD ["wait-for-it", "postgres:5432", "--", "/startup.sh"]

# helpful for debugging purposes to just start up the container
# CMD ["tail", "-f", "/dev/null"]
9 changes: 9 additions & 0 deletions conf/docker-compose/.gitignore
@@ -0,0 +1,9 @@
/solr/*.xml
/dataverse/.m2/
/postgres-bind/
/solr-bind/
/seaweedfs-bind/
/dataverse-docroot-bind/
/dataverse-logos-bind/
/traefik/traefik.key
/traefik/traefik.crt
98 changes: 98 additions & 0 deletions conf/docker-compose/README.md
@@ -0,0 +1,98 @@
# docker-compose version of Dataverse

## Requirements

* [docker-compose](https://docs.docker.com/compose/)
* [Docker](https://docker.com) (or some other supported container engine)
* [Maven](https://maven.apache.org/)

## Setup

Edit the `.env` file as needed. It holds passwords and keys, so restrict its file permissions so that only the owner can read it.
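
As a rough illustration, the sketch below lists any variables still set to the placeholder value `secret`, with the permission change shown as a usage comment. The helper name `list_placeholder_secrets` is hypothetical and not part of this PR. It assumes the simple `KEY=value` layout used by the `.env` file in this change.

```shell
# Hypothetical helper (not part of this PR): print the names of variables
# in an env file whose value is still exactly the placeholder "secret".
list_placeholder_secrets() {
  env_file="$1"
  # Match KEY=secret on uncommented lines only, then keep just the key name.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=secret$' "$env_file" | cut -d= -f1
}

# Example usage, run from the directory that contains .env:
#   chmod 600 .env                  # owner read/write only
#   list_placeholder_secrets .env   # names that still need real values
```

Any name this prints is a value that should be replaced before the stack is exposed beyond a development machine.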

Edit `./seaweedfs/config.json` and enter your [credential keys](https://github.com/chrislusf/seaweedfs/wiki/Amazon-S3-API#static-configuration) for S3 storage.

If you're running locally and don't have a TLS certificate and key for Traefik, you'll
need to generate a self-signed pair yourself, for example with OpenSSL in Git Bash. If you're on Windows,
set `MSYS_NO_PATHCONV=1` so that [POSIX-to-Windows path conversion](https://github.com/git-for-windows/git/issues/577#issuecomment-166118846)
doesn't rewrite the forward slashes in the `-subj` argument.

```shell
MSYS_NO_PATHCONV=1 openssl req -x509 -nodes -days 4096 -newkey rsa:4096 -out traefik.crt -keyout traefik.key -subj "/C=US/ST=New Mexico/L=ABQ/O=Local/CN=127.0.0.1" -addext "subjectAltName = IP:127.0.0.1"
```

Alternatively, get a certificate and private key from your sysadmin or provider and rename them to `traefik.crt` and `traefik.key`.

Then copy the `traefik.key` and `traefik.crt` files into the `traefik` folder.

## Building

Run `build-containers.sh`. This copies a few files and sets up the build environment before
running Maven builds for each of the container services.

Then pull and build the Docker containers:

```shell
# this uses Compose v2, if you're on an older version you may
# need to change this call to docker-compose
docker compose pull
docker compose build
```
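
The comment in the block above notes that older installations use `docker-compose` instead of `docker compose`. One way to support both is to probe for the Compose v2 plugin first and fall back to the standalone binary. This is a sketch only: the `compose_cmd` helper is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of this PR): print whichever Compose
# command is available on this machine, preferring the v2 plugin.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "neither 'docker compose' nor 'docker-compose' was found" >&2
    return 1
  fi
}

# Example usage (left unquoted on purpose so "docker compose" splits into two words):
#   $(compose_cmd) pull
#   $(compose_cmd) build
```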

## Deploying

```shell
docker-compose up -d
```

Note that this can take a couple of minutes to start up. Check the containers with `docker ps` and wait until the status shows `healthy`.

For the bind mounts (see `docker-compose.yml`) you may need to set permissions
on the `*-bind` folders so that the containers can write to them. Alternatively,
you can create local users or do UID/GID mappings.

```shell
docker ps
```
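
To script the wait instead of re-running `docker ps` by hand, a small retry loop is one option. The `wait_until` helper below is a hypothetical sketch and not part of this PR. The usage comment assumes Traefik answers on `https://localhost` with the self-signed certificate generated earlier, which is why `curl` is given `-k`.

```shell
# Hypothetical helper (not part of this PR): run a command repeatedly until
# it succeeds, giving up after a fixed number of attempts.
#   $1 = maximum attempts, $2 = seconds between attempts, rest = command
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage: poll the site every 10 seconds for up to 5 minutes.
#   wait_until 30 10 curl -ksf -o /dev/null https://localhost
```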

Then go to the following URL in your browser:

[https://localhost](https://localhost)

Default credentials for login are:

* username: `dataverseAdmin`
* password: `admin`

Make sure to change this password right away.

## How It Works

* Builds a copy of the `.war` deployable code from source
* Stands up various services and pieces needed:
  * seaweedfs - for S3 storage
  * traefik - reverse proxy, HTTP is re-routed automatically to HTTPS
  * postgres - database backend
  * solr - text indexing database
  * rserve - R server for running R commands
  * dataverse - the main Dataverse web application
* Sets up two storage options: the default `<id>=files` for local storage
  and `<id>=s3` for S3 storage

## Uninstall / Teardown

```shell
docker-compose down -v
```

## Development References

There are many community-led efforts that use containers, Kubernetes, and more to help automate
and set up Dataverse.

* [https://github.com/fzappa/rocky-dataverse/blob/main/rocky-dataverse.sh](https://github.com/fzappa/rocky-dataverse/blob/main/rocky-dataverse.sh)
* [https://github.com/IQSS/dataverse/tree/develop/conf/docker-aio](https://github.com/IQSS/dataverse/tree/develop/conf/docker-aio)
* [https://github.com/gdcc/dataverse-kubernetes/blob/develop/docker-compose.yaml](https://github.com/gdcc/dataverse-kubernetes/blob/develop/docker-compose.yaml)
* [https://github.com/gdcc/dataverse-kubernetes](https://github.com/gdcc/dataverse-kubernetes)
* [https://github.com/EOSC-synergy/dataverse-kubernetes](https://github.com/EOSC-synergy/dataverse-kubernetes)
* [https://github.com/IQSS/dataverse-docker](https://github.com/IQSS/dataverse-docker)
29 changes: 29 additions & 0 deletions conf/docker-compose/build-containers.sh
@@ -0,0 +1,29 @@
#!/bin/bash

# Solr XML schema files
cp ../solr/8.11.1/*.xml ../../modules/container-solr/src/main/docker/

# go back to git root directory
cd ../../

# prep Solr beforehand so it has the appropriate permissions
# 8983 is the UID hard-coded in the stock Solr Dockerfile
mkdir -p ./conf/docker-compose/solr-bind/
sudo chown 8983:8983 ./conf/docker-compose/solr-bind/

[Review comment, Member @pdurbin, Feb 12, 2023] Is this sudo necessary? Can we do the chown without it? It wigs me out when scripts ask for my sudo password.


# copy sourcecode and installer files over
cp pom.xml modules/container-dataverse/src/main/docker/
cp -R ./src/ modules/container-dataverse/src/main/docker/src/
cp -R ./modules/dataverse-parent/ modules/container-dataverse/src/main/docker/modules/dataverse-parent/
cp -R ./scripts/ modules/container-dataverse/src/main/docker/scripts/
cp -R ./conf/ modules/container-dataverse/src/main/docker/conf/
cp -R ./local_lib/ modules/container-dataverse/src/main/docker/local_lib/

# build out each of the images
mvn -Pct -f modules/container-base clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-postgresql clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-rserve clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-seaweedfs clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-solr clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-traefik clean install -Dmaven.test.skip -Ddocker.verbose=true
mvn -Pct -f modules/container-dataverse clean install -Dmaven.test.skip -Ddocker.verbose=true
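
The seven Maven invocations above differ only in the module name, so they could be collapsed into a loop that stops at the first failure. The following is a sketch rather than the PR's script: the `build_images` function and the `MVN` override variable are hypothetical, while the module names and Maven flags are copied from the lines above.

```shell
# Hypothetical refactor (not part of this PR): build each container module
# in the same order as the script, stopping at the first failed build.
# MVN can be overridden (for example MVN=echo) for a dry run.
build_images() {
  mvn_cmd="${MVN:-mvn}"
  for module in base postgresql rserve seaweedfs solr traefik dataverse; do
    "$mvn_cmd" -Pct -f "modules/container-${module}" clean install \
      -Dmaven.test.skip -Ddocker.verbose=true || return 1
  done
}

# Example usage, from the git root directory:
#   build_images
```

Returning on the first failure avoids building the later images on top of a broken base image, which the original sequence of independent commands would attempt.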
4 changes: 4 additions & 0 deletions conf/docker-compose/dataverse/config
@@ -0,0 +1,4 @@
[default]
region = us-east-1
s3 =
signature_version = s3v4
3 changes: 3 additions & 0 deletions conf/docker-compose/dataverse/credentials
@@ -0,0 +1,3 @@
[default]
aws_access_key_id = secret
aws_secret_access_key = secret
83 changes: 83 additions & 0 deletions conf/docker-compose/dataverse/startup.sh
@@ -0,0 +1,83 @@
#!/bin/bash

export dpkgArch="$(dpkg --print-architecture)"
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}"

# create the config file that has all our environment settings

echo -e "
[glassfish]
HOST_DNS_ADDRESS=${HOST_DNS_ADDRESS}
GLASSFISH_USER = ${GLASSFISH_USER}
GLASSFISH_DIRECTORY = /usr/local/payara5/
GLASSFISH_ADMIN_USER = ${GLASSFISH_ADMIN_USER}
GLASSFISH_ADMIN_PASSWORD = ${GLASSFISH_ADMIN_PASSWORD}
GLASSFISH_HEAP = 2048
GLASSFISH_REQUEST_TIMEOUT = 1800

[database]
POSTGRES_ADMIN_PASSWORD=${POSTGRES_ADMIN_PASSWORD}
POSTGRES_SERVER=${POSTGRES_SERVER}
POSTGRES_PORT=${POSTGRES_PORT}
POSTGRES_DATABASE=${POSTGRES_DATABASE}
POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
POSTGRES_USER=${POSTGRES_USER}

[system]
ADMIN_EMAIL=${ADMIN_EMAIL}
MAIL_SERVER=${MAIL_SERVER}
SOLR_LOCATION=${SOLR_LOCATION}

[rserve]
RSERVE_HOST=${RSERVE_HOST}
RSERVE_PORT=${RSERVE_PORT}
RSERVE_USER=${RSERVE_USER}
RSERVE_PASSWORD=${RSERVE_PASSWORD}

[doi]
DOI_USERNAME = dataciteuser
DOI_PASSWORD = datacitepassword
DOI_BASEURL = https://mds.test.datacite.org
DOI_DATACITERESTAPIURL = https://api.test.datacite.org
" > /dataverse/scripts/installer/default.config

# https://github.com/poikilotherm/dataverse/blob/ct-mvn-mod/modules/container-base/src/main/docker/Dockerfile
# https://guides.dataverse.org/en/latest/installation/config.html#amazon-s3-storage-or-compatible
# set s3 storage settings
if ! grep -q "Ddataverse.files.s3.type=s3" "/usr/local/payara5/glassfish/domains/domain1/config/domain.xml"; then
    # use : as delimiter
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.label=s3</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.access-key=${S3_ACCESS_KEY}</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.secret-key=${S3_SECRET_KEY}</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.custom-endpoint-url=http\:\/\/seaweedfs\:8333</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    # keep this as dataverse as it's hardcoded elsewhere
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.custom-endpoint-region=us-east-1</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
    # Use path style buckets instead of subdomains
    sed -i "s:</java-config>:<jvm-options>-Ddataverse.files.s3.path-style-access=true</jvm-options>\n</java-config>:" /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
fi

cd /dataverse/scripts/installer/

# the installer needs to run from within the directory, it cannot be run from / for example
# this can take some time to run, be patient
python3 install.py --noninteractive --force

# check if we should disable DOI validation
if [[ "${DISABLE_DOI}" == "true" ]]; then
    echo "Disabling DOI validation"
    curl -X PUT -d FAKE http://localhost:8080/api/admin/settings/:DoiProvider
fi

# check if we should exclude emails from exports
if [[ "${EXCLUDE_EMAIL_EXPORTS}" == "true" ]]; then
    echo "Excluding emails in exports"
    curl -X PUT -d true http://localhost:8080/api/admin/settings/:ExcludeEmailFromExport
fi

# create an empty s3 bucket in seaweedfs if it doesn't already exist
curl -X POST "http://seaweedfs:8888/buckets/"
curl -X POST "http://seaweedfs:8888/buckets/dataverse/"

wait-for-it localhost:8080 -- tail -f /usr/local/payara5/glassfish/domains/domain1/logs/server.log
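
The S3 block in this script repeats the same `sed` substitution for every JVM option. A small helper could express that once and make each insertion idempotent on its own. The sketch below is an assumption-laden alternative, not the PR's approach: `add_jvm_option` is a hypothetical name, and it assumes every `</java-config>` closing tag sits on its own line in `domain.xml`. It avoids `sed` so that characters such as `:` and `/` in the option need no escaping.

```shell
# Hypothetical helper (not part of this PR): add one <jvm-options> entry
# immediately before each </java-config> line, unless it is already present.
#   $1 = JVM option, $2 = path to domain.xml
add_jvm_option() {
  option="$1"
  domain_xml="$2"
  entry="<jvm-options>${option}</jvm-options>"

  # Nothing to do if this exact entry already exists.
  if grep -qF -- "$entry" "$domain_xml"; then
    return 0
  fi

  tmp="$(mktemp)"
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *"</java-config>"*) printf '%s\n' "$entry" ;;
    esac
    printf '%s\n' "$line"
  done < "$domain_xml" > "$tmp"

  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$domain_xml" && rm -f "$tmp"
}

# Example usage with the path from this script:
#   add_jvm_option "-Ddataverse.files.s3.path-style-access=true" \
#     /usr/local/payara5/glassfish/domains/domain1/config/domain.xml
```

Because each call checks for its own entry, a restart after a partial run would add only the options that are still missing, whereas the single `grep` guard in the script keys everything off the first option.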
6 changes: 6 additions & 0 deletions conf/docker-compose/postgres/Dockerfile
@@ -0,0 +1,6 @@
# https://hub.docker.com/_/postgres
FROM postgres:14

RUN apt-get update && \
    apt-get -y upgrade && \
    rm -rf /var/lib/apt/lists/*