Docker for production #4665

Closed
omaralsoudanii opened this issue May 13, 2018 · 75 comments

@omaralsoudanii

Hello,
Is there an official production-ready Docker container for Dataverse? If so, where can I find the docs?
Thank you

@pdurbin
Member

pdurbin commented May 14, 2018

@omaralsoudanii thanks for opening this issue! It's been nice talking to you over at nds-org/ndslabs-dataverse#8 but this main issue tracker for Dataverse is a better place to discuss the state of Docker support. I can try to explain.

The short answer is that unfortunately, there is no production-ready Docker image for Dataverse as of this writing. I'm still learning Docker and its ecosystem so I'd like to know from you and others who are interested in Docker support how you expect it to work. Do you expect all of Dataverse to be running in a single container? Do you expect some of the components such as PostgreSQL and Solr to be running in different containers? Do you expect to use docker-compose? Do you expect to run Docker images in an orchestration platform such as Kubernetes?

If you look at the source tree, this is what you will find today:

I hope this helps. If you and others interested in Docker support can answer the questions above or other questions I don't even know to ask, it would be very helpful! Thanks!

@omaralsoudanii
Author

omaralsoudanii commented May 14, 2018

@pdurbin Thanks for this.
It would be nice to have 3 containers:
1- Dataverse repo container
2- Postgres container
3- Solr container

And a docker-compose.yml file that builds the 3 images / 3 containers for those automatically.
It would also be good to include an entrypoint to manage update scripts for Dataverse and PostgreSQL schema changes. See https://docs.docker.com/engine/reference/builder/#entrypoint .

The problem we're having now is that in order to update and maintain Dataverse we need to adjust the Dockerfile manually (change the Dataverse version) and do the update process manually (run each PostgreSQL schema change by hand), so an automated script that takes care of those would be nice.

Another suggestion would be that the Dataverse team adjusts the Dockerfile configuration each time an update to Dataverse is released, so that we can simply pull the Docker changes from the Dataverse repo and do:

docker-compose down

docker-compose up --build

We are currently on version 4.7 and are trying to update to the latest version (4.8.6) in order to use the license and restrict-files native API, but I need a fresh Dataverse PostgreSQL database. Is there a script or SQL file for that?

Thank you.
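
A minimal sketch of the kind of automated update step asked for above, which works out which schema scripts to run and in what order. It assumes, hypothetically, that upgrade scripts sit in one directory and are named upgrade_v<FROM>_to_v<TO>.sql; the function name upgrade_chain, the directory, and the database and user names in the usage comment are placeholders rather than anything confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: the file naming scheme upgrade_v<FROM>_to_v<TO>.sql is an assumption.

# Print, in order, the upgrade scripts needed to move on from version $1,
# looking in directory $2. Stops when no script starts at the current version.
upgrade_chain() {
  local current="$1" dir="$2" candidate script
  while :; do
    script=""
    for candidate in "$dir"/upgrade_v"${current}"_to_v*.sql; do
      if [ -e "$candidate" ]; then
        script="$candidate"
        break
      fi
    done
    [ -n "$script" ] || break
    printf '%s\n' "$script"
    # Extract <TO> from the file name and continue from there.
    current="${script##*_to_v}"
    current="${current%.sql}"
  done
}

# Hypothetical use (database and user names are placeholders):
#   upgrade_chain 4.7 ./upgrades | while read -r f; do
#     psql -U dvnapp -d dvndb -f "$f"
#   done
```

Following the chain from the installed version, rather than running every script in the directory, avoids re-applying changes that the database already has.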

@pameyer
Contributor

pameyer commented May 14, 2018

@omaralsoudanii Any thoughts on an additional container for Apache? My understanding is that it's preferable not to expose Glassfish/Dataverse directly to the outside world, so it might make sense to deal with it at this level of abstraction (but I'm not currently planning on using dockerized Dataverse in production, so input from folks who are would be interesting).

@vsoch

vsoch commented May 25, 2018

Hey everyone! I think I can definitely help with this, and I'd like to suggest starting with a simple docker-compose setup, and then using kompose to transition to Kubernetes when the time is right! I'll start poking around this weekend (I'm not familiar with the code base) and then we can loop back next week (when the Red Hat intern joins up?). How would you like to have our mentoring / discussion? GitHub? Slack? Other?

@pdurbin
Member

pdurbin commented May 25, 2018

@vsoch thanks for the offer to help with this effort! I haven't even met the intern yet but @danmcp mentioned internship goals the other day at http://irclog.iq.harvard.edu/dataverse/2018-05-22#i_67516 . I believe @djbrooke said this individual might be starting as soon as next week and I said I'd be happy to mentor him or her on all things related to Dataverse. I'm happy to help you and others dive in as well, of course!

Perhaps a timeline of events (from my perspective anyway) would help ground us:

I'm probably forgetting people and events but it's a start. In a future comment I'd like to write more about how I can imagine the work for this issue being broken up into smaller chunks.

@pameyer
Contributor

pameyer commented May 25, 2018

@vsoch Starting with docker-compose sounds reasonable. There's an example of docker-compose in this repository (although intended for a different purpose) in conf/docker-dcm.

One thing that may be a factor in making Docker/Dataverse production-ready (and that @pdurbin and I have talked about) is how to handle the intersection of the Docker entrypoint and the Dataverse installer. It's very likely that the approach I've been taking has been sub-optimal for production usage (and so wouldn't make a good prototype for production).

@4tikhonov
Contributor

Our docker-compose file from the DataverseEU project might be interesting to try as well: https://github.com/Dans-labs/dataverse-docker/blob/master/docker-compose.yml

@vsoch

vsoch commented May 25, 2018

@pameyer agreed, I would not have the container command run the entire install, so that the container can be restarted without triggering the install again!

@4tikhonov the work you've done for DataverseEU looks great! It's 99% there! Is it just a matter of combining the two repos into a consolidated thing with a good set of docs? With your blessings I can give a first shot at this.

@4tikhonov
Contributor

4tikhonov commented May 25, 2018

Thanks! We're already running it with Kubernetes on Google Cloud: http://dataverse-dev.cessda.eu
Some documentation for the Docker module is available here: https://github.com/Dans-labs/dataverse-docker

@vsoch

vsoch commented May 25, 2018

So what needs to be done then?

@4tikhonov
Contributor

4tikhonov commented May 25, 2018

We're still working on the proxy implementation for multilingual support, to serve different languages on selected paths (/fr for French, for example). First we tried Apache, but Google Cloud already has nginx as an integrated service running inside Kubernetes, which seems to be a better solution.

@pameyer
Contributor

pameyer commented May 25, 2018

@4tikhonov How did you end up handling the entrypoint / installer issue (run installer in entrypoint, something else)?

@4tikhonov
Contributor

@pameyer, I remember that you advised me to do a health check before running the installer a second time:
size=$(curl -sI http://localhost:8080/api/info/version | grep Content-Length|awk '{print $2}')

It is indeed in the entrypoint: if size > 0, Dataverse is already installed and further installation steps can be skipped.
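
The check described here can be sketched as two small shell functions. This is an illustration under stated assumptions, not the actual entrypoint from the Dans-labs repository: the function names content_length and already_installed are hypothetical, and the installer call in the usage comment is a placeholder. One detail worth handling is that HTTP header lines end in CR LF, so the raw value can carry a trailing carriage return that breaks a numeric comparison; the sketch strips it first.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from any repository.

# Print the Content-Length value from HTTP response headers read on stdin.
# Carriage returns are removed so the value is a clean integer string.
content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

# Succeed (exit status 0) if the headers on stdin report a non-empty body.
already_installed() {
  local size
  size="$(content_length)"
  [ -n "$size" ] && [ "$size" -gt 0 ]
}

# Hypothetical use inside an entrypoint:
#   if curl -sI http://localhost:8080/api/info/version | already_installed; then
#     echo "Dataverse already installed, skipping installer"
#   else
#     ./install   # placeholder for the real installer invocation
#   fi
```

Keeping the parsing separate from the curl call means the decision logic can be exercised with canned headers, without a running Dataverse.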

@pdurbin
Member

pdurbin commented May 25, 2018

@4tikhonov I know it's quittin' time for you but if you're up for a wild Friday night in Dataverse IRC we're chatting now: http://irclog.iq.harvard.edu/dataverse/2018-05-25#i_67861 😄

@vsoch

vsoch commented May 25, 2018 via email

@vsoch

vsoch commented May 26, 2018

Hey everyone, I just tested https://github.com/Dans-labs/dataverse-docker and it is a solid start. Could someone tell me again why this isn't what you want?

@pdurbin
Member

pdurbin commented May 26, 2018

@vsoch hi! Thanks for taking a look. Personally, I've been more focused on the OpenShift use case and documented what we've been up to at http://guides.dataverse.org/en/4.8.6/developers/containers.html

I'm sorry to say that I haven't looked closely at the @Dans-labs work by @4tikhonov but I believe @pameyer has so he might be in a better position than I am to comment.

Ideally we'll have a solution that works on:

  • docker-compose
  • Kubernetes
  • OpenShift
  • something that isn't even on my radar 😄

@vsoch

vsoch commented May 26, 2018

I think it would be worth having someone look at his good solution before asking someone else to start over :) My comments would be small tweaks to the containers (e.g., adding env DEBIAN_FRONTEND noninteractive and maybe putting the deps inside the containers to begin with), but I don't think it's a good use of time to start from scratch when this project is already going in what (I think) is the right direction.

@pdurbin
Member

pdurbin commented May 27, 2018

@vsoch that makes total sense. I did take a quick look just now.

@omaralsoudanii since you opened this issue, can you please try https://github.com/Dans-labs/dataverse-docker and provide feedback on whether it's going to work for you?

@4tikhonov are there any specific bugs or issues with that repo that should be fixed before anyone uses it in production?

@aculich do you want to give that repo a try as well?

@vsoch

vsoch commented May 27, 2018

@4tikhonov I think you've made a great start and I can offer to help, but only if you need/want it.

@4tikhonov
Contributor

@vsoch, thanks, we're indeed very interested in getting feedback. Our goal is to eventually get it integrated into the master branch.

@pdurbin
Member

pdurbin commented May 29, 2018

@4tikhonov is there anything holding you back from making a pull request?

@omaralsoudanii
Author

@pdurbin @4tikhonov
This seems to be working great so far on my local machine.

One question I have: what steps are required when a new Dataverse version is released in the repo? Will this Docker image automatically update the version when running docker-compose build (without deleting the existing data), or are further steps required?

I will provide more feedback after testing it with the native API.

Thank you

@pdurbin
Member

pdurbin commented Jun 4, 2018

@4tikhonov any thoughts on the question by @omaralsoudanii above?

There's a section of the Installation Guide I wrote called "Choose Your Own Installation Adventure" which is inspired by a series of books I enjoyed as a child. Under "Advanced Installation" I mention community-supported adventures such as https://github.com/IQSS/dataverse-ansible by @donsizemore and it would make sense to me to have the Docker/Kubernetes installation adventure by @4tikhonov fall into the community-supported category as well. That is, @4tikhonov and others in the community could publish Docker images as new releases of Dataverse come out. I'd be happy to give @4tikhonov and others "push" access to a new repo under the @IQSS GitHub organization if that makes sense. Please just let me know!

@4tikhonov
Contributor

@omaralsoudanii, it should work for other Dataverse versions as well, since the latest version should be downloaded into the Docker container from GitHub.

@pdurbin, I think it's a great idea and I would like to participate in this Docker/Kubernetes installation adventure.

@pdurbin
Member

pdurbin commented Jun 4, 2018

@4tikhonov great! What would you like the repo under @IQSS to be called?

  • dataverse-docker?
  • dataverse-kubernetes?
  • dataverse-containers?

I'll set up a team and give you push access. And whoever else you want.

@4tikhonov
Contributor

@pdurbin dataverse-docker is fine, thanks, Phil!

@pdurbin
Member

pdurbin commented Jun 4, 2018

@4tikhonov sure! I just created https://github.com/IQSS/dataverse-docker and added you to both of these teams:

The repo is empty right now but you should be able to push code there and add more collaborators. Please let me know if you have any trouble. Thanks!

@vsoch

vsoch commented Feb 28, 2019

Holy cow @pdurbin you just won the award for the highest density of references and links I've ever seen in a single comment, anywhere! 🎊 Thank you! This looks like an awesome resource and I'll check it out!

@pdurbin
Member

pdurbin commented Apr 25, 2019

@vsoch unfortunately at #4152 (comment) we learned that the instance of NDS Labs Workbench we've been talking about won't be supported any more. The code remains open source if someone else wants to run it.

Meanwhile, some excellent progress was made by @poikilotherm yesterday toward deploying his community-supported dataverse-kubernetes effort on AWS. Even though in gdcc/dataverse-kubernetes#12 I was asking for EKS support (also discussed at http://irclog.iq.harvard.edu/dataverse/2019-04-24#i_91578 ), I think all I really wanted was a way to deploy his solution on AWS somehow. In pull request gdcc/dataverse-kubernetes#45 he has created what looks like fantastic documentation on how to run dataverse-kubernetes on AWS using kops. I haven't tried it myself but from what I hear at http://irclog.iq.harvard.edu/dataverse/2019-04-25#i_91713 it sounds awesome. All you need is an AWS account, which we also document at http://guides.dataverse.org/en/4.13/developers/deployment.html . I'd like to encourage anyone reading this to try out the readme in that pull request and report any feedback here. From my perspective, this potentially unblocks this issue. I don't believe anyone is using dataverse-kubernetes in production yet, but @poikilotherm plans to.

Additionally, @4tikhonov recently tweeted at https://twitter.com/4tykhonov/status/1116229640232873984 some slides regarding his dev process which makes use of https://github.com/IQSS/dataverse-docker and a continuous integration pipeline. You can see some more discussion about this (and screenshots) at #5725 (comment) and #5751 (comment) . I don't believe anyone is using dataverse-docker in production yet. I think it's mostly used for (fantastic) demos. I'd be interested in knowing how easy it is to deploy dataverse-docker to AWS since that's the cloud resource that's available to me.

@omaralsoudanii what is your status, please? Have you been experimenting with running Dataverse in Docker?

@xibriz I recently learned that @poikilotherm is also using GitLab CI. You two might want to coordinate.

@xibriz
Contributor

xibriz commented Apr 25, 2019

@pdurbin Since you have an incredible overview of everything, I have to let you know that I will leave my job at UiT in about a month.

As far as I know, no one will be taking over the Kubernetes-trials at UiT :/

@pdurbin
Member

pdurbin commented Apr 25, 2019

@xibriz thanks for the update! Good luck on your future adventures! I'd love to see you in Cambridge again some day! 😄

@vsoch

vsoch commented Apr 25, 2019

@pdurbin thanks for letting me know - it's an incredible (and costly) effort to provide this kind of resource and I understand not being able to support it forever. It's really fantastic to see all the great work on Dataverse! Heads up @poikilotherm the link at the top of your repo here is 404.

I still have yet to figure out how to easily develop with an actual cluster (without the immense burden of costs) but I'm learning a lot of GoLang and working on supported tools to hopefully learn a bit regardless. Definitely ping me if I can be of any help, other than offering words of encouragement! :)

@poikilotherm
Contributor

@vsoch seems like the link I used was accidentally one you can only follow when logged in... Thx for pointing that out, will change it.

@vsoch

vsoch commented Apr 25, 2019

I'm not worthy!!!

Just kidding :)

@pdurbin
Member

pdurbin commented May 13, 2019

@omaralsoudanii are you still interested in this? There are some community efforts to run Dataverse on Docker in production but it is not supported by IQSS.

@pdurbin
Member

pdurbin commented Feb 28, 2020

The first installation of Dataverse to advertise itself as running on Kubernetes in production was just added to the map at https://dataverse.org/installations

[Image: FZJ-on-Dataverse]

@poikilotherm is my hero! 🎉 He recently gave a talk at http://talks.bertuch.name/dataverse-k8s-20200124/ about running Dataverse on Kubernetes:

[Image: screenshot]

Awesome.

@portante

> The first installation of Dataverse to advertise itself as running on Kubernetes in production was just added to the map at https://dataverse.org/installations

Great news!

@4tikhonov
Contributor

Great, congratulations, @poikilotherm! I'm also convinced that Kubernetes is the only way to go; all services should follow the same direction to increase their maturity, not only Dataverse.

@pdurbin
Member

pdurbin commented Feb 28, 2020

@portante thanks for the shout out at https://twitter.com/pportante/status/1233367941346885633 !

When should we circle back to #4040 ? 😄

@pdurbin
Member

pdurbin commented Feb 28, 2020

@4tikhonov I absolutely agree!

At @pidapalooza 2020 someone told me, "For anything new, it has to run on Kubernetes."

Hmm, come to think of it... what's the definition of done for this issue? 😄

@poikilotherm
Contributor

@pdurbin asked me to link to gdcc/dataverse-kubernetes#129 here because of the first bullet in #4040 being about running things on OpenShift. @portante is this still relevant for you?
Phil is really eager to chat with you on IRC and I'd be happy to share thoughts and ideas.

@4tikhonov
Contributor

> @4tikhonov I absolutely agree!
>
> At @pidapalooza 2020 someone told me, "For anything new, it has to run on Kubernetes."

Wait a little bit and I'm pretty sure we can bring really big fish to the Dataverse community.
It's also providing previewers and other services running on Kubernetes, as a part of this POC:
https://twitter.com/4tykhonov/status/1232276978021126144

@poikilotherm
Contributor

@pdurbin vote to close. With https://github.com/gdcc/dataverse-kubernetes and https://github.com/gdcc/dataverse/tree/develop+ct we should have sufficient things in place. Dunno if https://github.com/IQSS/dataverse-docker is made for production.

@djbrooke
Contributor

Thx @poikilotherm and others for all of the work on various solutions in the container space. I'll close this as we already have Docker and Kubernetes on https://guides.dataverse.org/en/latest/developers/containers.html and we can add more community solutions as they reach maturity.
