Add docker-compose build system #8709

Closed
wants to merge 29 commits into from

carlsonp
Contributor

@carlsonp carlsonp commented May 17, 2022

What this PR does / why we need it:

There has been a wide variety of community efforts around deploying Dataverse in containers using tools such as Docker and Kubernetes. I am by no means a Docker expert, but separating the code into services for use with docker-compose made sense to me. There are obviously many different ways to set this up. I also wanted to be able to make changes and compile the Java code inside the container itself rather than relying on prebuilt downloads for deployment. This PR does the following:

  • Builds the deployable .war file from source
  • Stands up the required services:
    • seaweedfs - S3-compatible object storage
    • traefik - reverse proxy; HTTP is automatically redirected to HTTPS
    • postgres - database backend
    • solr - full-text search index
    • rserve - R server for running R commands
    • dataverse - the main Dataverse web application
  • Sets up two storage options: the default <id>=files for local storage and <id>=s3 for S3 storage

I would wholeheartedly appreciate suggestions and improvements from others who have much more experience than I do with Dataverse and container technologies. I do feel that the lack of an "officially" supported container option in the Dataverse repo makes it harder for new developers to get on board and contribute. I hope we can come up with a solution that serves both local developer needs and people who want to run Dataverse in development or production settings on a server. Thank you for your time.

Which issue(s) this PR closes:

Unknown if there are backlog issues related to this.

Special notes for your reviewer:

There are additional items we may wish to discuss or consider including as part of this PR:

  • Look at post-configuration steps and hardening; is there anything else we want to bring in here?
  • Skip install.py if Dataverse is already installed. The dataverse container takes a long time to start up; is there a better method here? Look at /conf/docker-compose/dataverse/startup.sh to see if there are improvements to be made.
  • Persist log files from Dataverse?
  • Test auth providers (I have no experience with these).
  • Look into docker-compose secrets instead of environment variables.
  • Add and test a Windows build script equivalent to prepbuild.sh.
  • Add GitHub Actions for building, Dependabot, etc.
  • Move docker-compose.yml out of this subdirectory to the Git repo root (done).
  • Check the recorded user IP address; does the reverse proxy need to be adjusted?
  • Set up Grafana and System Metrics for seaweedfs: https://github.com/chrislusf/seaweedfs/wiki/System-Metrics
  • Change the default storage provider from files to s3? Remove files as a provider?
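For the "skip install.py if already installed" item, one common pattern is a marker file that is written only after the first setup run succeeds. A minimal shell sketch, assuming the marker lives on a persistent volume so it survives container restarts; the function name run_once and the paths in the example are hypothetical and not taken from startup.sh:

```shell
#!/bin/sh
# Sketch: run a one-time setup command only if it has not completed before.
# ASSUMPTION: run_once and the example paths are hypothetical; the marker
# should sit on a persistent volume so it survives container restarts.

run_once() {
  marker="$1"   # file whose presence means "setup already completed"
  shift         # the remaining arguments are the setup command to run
  if [ -f "$marker" ]; then
    echo "skipping setup: $marker exists"
    return 0
  fi
  # The marker is created only if the command exits successfully,
  # so a failed setup is retried on the next container start.
  "$@" && touch "$marker"
}

# Example (hypothetical paths):
#   run_once /data/.dataverse-installed ./run-install.sh
```

Because the marker is written only on success, an interrupted or failed install is not silently skipped on the next start.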

Suggestions on how to test this:

Details on building are in /conf/docker-compose/README.md. All commands will be run from that directory.

Things I tested, but on a different branch, so we likely need additional testers here:

  • Email (I was able to hook it into our SMTP server)
  • The Site URL and FQDN settings
  • S3 storage
  • Persisting docroot files, such as a changed root dataverse logo
  • Persisting Solr and Postgres data between restarts

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No UI changes; this is entirely backend build-system work.

Is there a release notes update needed for this change?:

Likely yes.

Additional documentation:

None

@beepsoft
Contributor

I tried with all your defaults and it seems to work except rserve, which fails with this:

rserve       | Error in Rserve::run.Rserve(remote = TRUE, auth = TRUE, pwdfile = "/rserve.pwd",  : 
rserve       |   ignoring SIGPIPE signal
rserve       | Execution halted

I also had to comment out the

COPY --chown=dataverse:dataverse ./.m2/ /home/dataverse/.m2/

line, but I see you have already fixed that. When building on an M1 Mac, I also had to change JAVA_HOME to

ENV JAVA_HOME /usr/lib/jvm/java-1.11.0-openjdk-arm64
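Rather than hard-coding the arm64 path, JAVA_HOME can be derived from the build architecture; the later build log in this thread shows the Dockerfile doing this with dpkg --print-architecture. A minimal shell sketch of the same idea, assuming a Debian/Ubuntu base image that installs OpenJDK 11 under /usr/lib/jvm/java-11-openjdk-<arch>; the helper names java_home_for_arch and detect_arch are hypothetical:

```shell
#!/bin/sh
# Sketch: map a CPU architecture name to the OpenJDK 11 directory.
# ASSUMPTION: Debian/Ubuntu layout /usr/lib/jvm/java-11-openjdk-<arch>;
# java_home_for_arch and detect_arch are hypothetical helper names.

java_home_for_arch() {
  # Accepts both dpkg names (amd64, arm64) and uname -m names (x86_64, aarch64).
  case "$1" in
    x86_64|amd64)  echo "/usr/lib/jvm/java-11-openjdk-amd64" ;;
    aarch64|arm64) echo "/usr/lib/jvm/java-11-openjdk-arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

detect_arch() {
  # Prefer dpkg's architecture name; fall back to the kernel's name.
  dpkg --print-architecture 2>/dev/null || uname -m
}

# Usage in a build step (sketch):
#   JAVA_HOME="$(java_home_for_arch "$(detect_arch)")" && export JAVA_HOME \
#     && test -d "$JAVA_HOME" && mvn package -DskipTests
```

The test -d check in the usage line makes a wrong path fail immediately with a clear cause instead of surfacing later as a Maven error.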

@carlsonp
Contributor Author

@beepsoft Thank you. I refactored the code a fair bit. I was able to build and run this on my Raspberry Pi to test both arm64 and amd64 architectures. Are you able to try again to see if this resolves the issues you faced?

I also moved docker-compose.yml and the Dataverse-specific Dockerfile to the root of the repo. This means prepbuild.sh has only one step to perform, and it makes it much easier for developers to make changes in the real codebase and test them, compared to making a copy of the entire Dataverse folder.

@beepsoft
Contributor

beepsoft commented May 21, 2022

On M1 (arm64) it now fails for me with this:

Progress (3): 0.8/2.2 MB | 0.4/1.2 MB | 69/250 kB 
#25 674.9 [output clipped, log limit 1MiB reached]
#25 ERROR: executor failed running [/bin/sh -c cd /dataverse/ &&   export dpkgArch="$(dpkg --print-architecture)" &&   export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" &&   mvn package -DskipTests]: exit code: 1
------
 > [21/25] RUN cd /dataverse/ &&   export dpkgArch="$(dpkg --print-architecture)" &&   export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" &&   mvn package -DskipTests:
------
executor failed running [/bin/sh -c cd /dataverse/ &&   export dpkgArch="$(dpkg --print-architecture)" &&   export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" &&   mvn package -DskipTests]: exit code: 1
ERROR: Service 'dataverse' failed to build : Build failed

I checked that JAVA_HOME is calculated correctly.

I also tried running mvn package -DskipTests locally, and it builds fine.

I also tried running just mvn dependency:resolve instead of mvn package -DskipTests in the Dockerfile and this fails as well.

@carlsonp
Contributor Author

Hmm interesting, thanks. I'll see if I can get a colleague to try it on his M1.

@carlsonp
Contributor Author

I believe I've fixed the Maven build system. If others could test this again, that would be great. Thanks!

@beepsoft
Contributor

I can confirm it builds and runs all right on M1 now.

@scolapasta
Contributor

Hi @carlsonp, for this next sprint we are catching up on Community PRs. Would you mind updating/refreshing this PR from develop? Thanks!

@carlsonp
Contributor Author

carlsonp commented Aug 4, 2022

I've rebased onto develop.

@poikilotherm
Contributor

poikilotherm commented Aug 25, 2022

Just to let everybody reading this pull request in the future know: @carlsonp, @pdurbin and I talked today about this and related work like #8832, #8834 and #8320.

We're all good: we don't want to block each other, we seem to agree on long-term goals (which is what I am up for), and we want to coordinate and iterate.

After all, the work done here and elsewhere isn't as far apart as one might assume. And we are all on the same page that these kinds of things must go upstream to be F.A.I.R. (haha!) and easy to use.

@pdurbin pdurbin self-assigned this Aug 29, 2022
@carlsonp
Contributor Author

carlsonp commented Aug 31, 2022

TODO for me: split out the counter-processor library into its own service.
Have a /makedatacount directory that is a Docker volume or K8s volume to share those logs between Dataverse and the COUNTER processor.

@carlsonp
Contributor Author

TODO for me: add Trivy container scanning

@mreekie mreekie added the bk2211 label Nov 1, 2022
@pdurbin pdurbin removed their assignment Dec 20, 2022
@carlsonp carlsonp marked this pull request as draft January 3, 2023 15:39
@mreekie mreekie removed the bk2211 label Jan 11, 2023
@coveralls

coveralls commented Jan 12, 2023

Coverage: 20.013% (+0.02%) from 19.997% when pulling 134695d on carlsonp:docker-compose into f63f0e8 on IQSS:develop.

@carlsonp
Contributor Author

Started work on using the existing base container as the base for the Dataverse build. Moved the other containers into a similar Maven build system in the modules folder. I'm stuck on getting Dataverse to launch and will need help with the Payara pre- and post-boot scripts.

Member

@pdurbin pdurbin left a comment

I tried running the build containers script.

# prep Solr beforehand so it has the appropriate permissions
# 8983 is the UID hard-coded in the stock Solr Dockerfile
mkdir -p ./conf/docker-compose/solr-bind/
sudo chown 8983:8983 ./conf/docker-compose/solr-bind/
Member

@pdurbin pdurbin Feb 12, 2023

Is this sudo necessary? Can we do the chown without it?

It wigs me out when scripts ask for my sudo password.
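On the sudo question, one option is to escalate only when the ownership is actually wrong, so repeat runs never prompt for a password. A minimal shell sketch; the helper ensure_owner is hypothetical, and 8983 is the Solr UID/GID cited in the script comment above:

```shell
#!/bin/sh
# Sketch: create a bind-mount directory and chown it only when needed.
# ASSUMPTION: ensure_owner is a hypothetical helper, not part of the PR.

ensure_owner() {
  dir="$1"
  want="$2"   # numeric "uid:gid", e.g. 8983:8983 for the stock Solr image
  mkdir -p "$dir"
  # ls -nd prints the numeric owner and group in fields 3 and 4.
  have="$(ls -nd "$dir" | awk '{print $3 ":" $4}')"
  if [ "$have" = "$want" ]; then
    return 0                 # already correct: no chown, no password prompt
  fi
  if [ "$(id -u)" -eq 0 ]; then
    chown "$want" "$dir"     # already root (e.g. in CI): sudo not needed
  else
    sudo chown "$want" "$dir"
  fi
}

# Example:
#   ensure_owner ./conf/docker-compose/solr-bind 8983:8983
```

This does not remove the need for elevated rights on the first run, since giving a directory to another UID normally requires root; it only avoids prompting when nothing needs to change.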


RUN mkdir -p /home/dataverse/.aws/
COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/config /home/dataverse/.aws/config
COPY --chown=dataverse:dataverse ./conf/docker-compose/dataverse/credentials /home/dataverse/.aws/credentials
Contributor

Would you want to make the credentials MicroProfile Config options so they're not stored in plain text? This can be done in Dataverse 5.10+.

@pdurbin
Member

pdurbin commented Mar 16, 2023

I checked with ❤️@carlsonp❤️ and he's fine with us closing this in favor of this PR:

Here's the quickstart for devs from the PR, by the way: https://dataverse-guide--9439.org.readthedocs.build/en/9439/container/dev-usage.html

@carlsonp thanks for helping with the containerization effort! 🎉 🚀

@pdurbin pdurbin closed this Mar 16, 2023
@pdurbin
Member

pdurbin commented Mar 16, 2023

@carlsonp I just wanted to let you know that ideas in this PR are still being discussed: #9439 (comment) ❤️

8 participants