
Add Glassfish Statefulsets to OpenShift/Kubernetes #4617

Closed
MichaelClifford opened this issue Apr 25, 2018 · 25 comments

Comments

@MichaelClifford
Contributor

As students in Boston University's EC528 Cloud Computing Course, my team has been working with @pdurbin @danmcp & @DirectXMan12 to further work on #4040 & #4168.

I have been working on scaling the Glassfish pods and am ready to open a pull request. My solution entails creating a statefulset and providing updates to the Glassfish installation script to differentiate between the initial master pod, dataverse-glassfish-0, and any subsequently deployed pods.

@pdurbin
Member

pdurbin commented Apr 25, 2018

@MichaelClifford thanks for opening this issue and working on this project! You're editing openshift.json, right? I'm asking because pull request #4599 by @patrickdillon has edits to that file and I'm close to merging that pull request. I don't mean to complicate your life, but you might want to branch from his branch before you make your pull request. That way, we can test both changes together.

@MichaelClifford
Contributor Author

MichaelClifford commented Apr 25, 2018 via email

@pdurbin
Member

pdurbin commented Apr 25, 2018

@MichaelClifford right, it's the final checks that I don't want you to waste your time on if I'm going to ask you to retest with the combined solution. Does that make sense? I'm also leaving comments for you in #4598 just to make this extra hard. 😉

@pdurbin
Member

pdurbin commented Apr 25, 2018

@MichaelClifford ok I just merged #4599, so please make sure your (future) pull request has those changes in it before you spend a lot of time testing. @patrickdillon and I think his changes and your changes to openshift.json should be compatible. Please let me know if you have questions. Thanks!

@pdurbin
Member

pdurbin commented Apr 27, 2018

@MichaelClifford thanks for making pull request #4626! I requested a few minor changes and I'm going to ask @landreev to take a look at changes you made to glassfish-setup.sh.

If you have any questions, please let me know!

@pdurbin pdurbin self-assigned this Apr 27, 2018
@pdurbin
Member

pdurbin commented Apr 27, 2018

@MichaelClifford how much memory do you recommend for Minishift? I've been using 4GB like this:

minishift start --vm-driver=virtualbox --memory=4GB

But as of your latest fixes (ae115fd) I'm getting "insufficient memory" errors:

screen shot 2018-04-27 at 3 10 23 pm

@MichaelClifford
Contributor Author

@pdurbin I would recommend changing line 156 of the openshift.json file from "resources": {"limits": {"memory": "3072Mi"}} to "resources": {"limits": {"memory": "512Mi"}} when deploying locally.

This seems to work on Minishift, but when deployed to the MOC we needed additional resources, about 3GB, to successfully run the Glassfish pods. Let me know if that helps.
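For reference, here is the same stanza written out with balanced braces and the value suggested for local Minishift. This is a sketch of just the relevant fragment of the container spec, not the full file:

```json
"resources": {
  "limits": {
    "memory": "512Mi"
  }
}
```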

@pdurbin
Member

pdurbin commented Apr 27, 2018

@MichaelClifford Whoops! I wrote 8GB and 4GB above but I'm really using 4GB, I just edited my comment above to reflect reality.

Does that change anything? It seems a bit counterintuitive to give it less memory (512Mi instead of 3072Mi), but I'll give it a shot!

@pdurbin
Member

pdurbin commented Apr 27, 2018

Ok, I tried a few things. I switched to 8GB like this:

minishift start --vm-driver=virtualbox --memory=8GB

I'm not messing with memory in openshift.json. The only changes I made locally are to switch to my Docker Hub account ("pdurbin") rather than "iqss". I won't be pushing this change, of course:

murphy:dataverse pdurbin$ git diff conf/openshift/openshift.json
diff --git a/conf/openshift/openshift.json b/conf/openshift/openshift.json
index b49f7fe..684b1d1 100644
--- a/conf/openshift/openshift.json
+++ b/conf/openshift/openshift.json
@@ -100,7 +100,7 @@
         "name": "dataverse-plus-glassfish"
       },
       "spec": {
-        "dockerImageRepository": "iqss/dataverse-glassfish"
+        "dockerImageRepository": "pdurbin/dataverse-glassfish"
       }
     },
     {
@@ -120,7 +120,7 @@
         "name": "iqss-dataverse-solr"
       },
       "spec": {
-        "dockerImageRepository": "iqss/dataverse-solr"
+        "dockerImageRepository": "pdurbin/dataverse-solr"
       }
     },
     {
@@ -146,7 +146,7 @@
             "containers": [
               {
                 "name": "dataverse-plus-glassfish",
-                "image": "iqss/dataverse-glassfish:latest",
+                "image": "pdurbin/dataverse-glassfish:latest",
                 "ports": [
                   {
                     "containerPort": 8080,
murphy:dataverse pdurbin$ 

The good news is that oc new-app conf/openshift/openshift.json "just worked" in the sense that I got a working Dataverse installation, was able to log in, create a dataverse, see that it was indexed into Solr, etc. Then I thought I'd try a similar trick as at #4598 (comment) to scale the number of replicas to 3:

oc scale statefulset/dataverse-glassfish --replicas=3

For some reason, I'm getting "The pod has been stuck in the pending state for more than five minutes." Here are screenshots:

Before (1 replica)

screen shot 2018-04-27 at 4 05 05 pm

After (immediately)

screen shot 2018-04-27 at 4 08 12 pm

After (five minutes later)

screen shot 2018-04-27 at 4 16 39 pm

screen shot 2018-04-27 at 4 10 50 pm

I'm not really sure what's going on. Dataverse is still up, running off dataverse-glassfish-0, but I'm not sure how to scale up to additional replicas.

Any thoughts for me @MichaelClifford @danmcp @DirectXMan12 @patrickdillon @rockash? I tried looking at the log for dataverse-glassfish-1 but this pod seems not well:

screen shot 2018-04-27 at 4 19 57 pm

I'm on commit ae115fd

@danmcp
Contributor

danmcp commented Apr 27, 2018

Anything going on in events?
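One way to check: the oc CLI (which shares kubectl's verbs) can surface scheduling failures. A sketch, where the first two commands assume a live cluster and the grep works on any saved event output:

```shell
# Needs a live cluster:
#   oc get events --sort-by='.lastTimestamp'
#   oc describe pod dataverse-glassfish-1
# Offline, grep picks the telltale scheduler message out of saved event output:
echo '0/1 nodes are available: 1 Insufficient memory.' | grep -o 'Insufficient memory'
# -> Insufficient memory
```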

@MichaelClifford
Contributor Author

@pdurbin, not sure if you tried this yet or not, but did you use my Glassfish image jcliffbu/glassboat:test2, or did you rebuild the entire application and push it to your Docker Hub?

@pdurbin
Member

pdurbin commented Apr 27, 2018

@danmcp shoot, I forgot to check and I'm no longer at my desk. I'll take a look next week.

@MichaelClifford I rebuilt the entire application and pushed it to my docker hub. There's a new option I added in e7a56c7 so I can run ./build.sh huborg pdurbin or whatever to push to a non-iqss Docker Hub account.

@pdurbin
Member

pdurbin commented Apr 30, 2018

@danmcp @MichaelClifford ah, for dataverse-glassfish-1 the "Events" page says "Failed Scheduling... 0/1 nodes are available: 1 Insufficient memory. 450 times in the last 2 days".

Here are some screenshots:

screen shot 2018-04-30 at 9 22 25 am
screen shot 2018-04-30 at 9 22 45 am

Again, I'm using 8 GB like this:

minishift start --vm-driver=virtualbox --memory=8GB

@MichaelClifford you were mentioning before that I should lower the amount of memory from 3072Mi to 512Mi. I'll include the "diff" for openshift.json below, which also includes my changes to use the image I built and pushed to my "pdurbin" account on Docker Hub. Is lowering the amount of memory what you want me to try next? It seems counterintuitive to me.

Please advise. Thanks. I'm still on ae115fd, and at standup today I'll let @landreev know about your changes to the Dataverse installer.

murphy:dataverse pdurbin$ git diff conf/openshift/openshift.json 
diff --git a/conf/openshift/openshift.json b/conf/openshift/openshift.json
index b49f7fe..0eabbe7 100644
--- a/conf/openshift/openshift.json
+++ b/conf/openshift/openshift.json
@@ -100,7 +100,7 @@
         "name": "dataverse-plus-glassfish"
       },
       "spec": {
-        "dockerImageRepository": "iqss/dataverse-glassfish"
+        "dockerImageRepository": "pdurbin/dataverse-glassfish"
       }
     },
     {
@@ -120,7 +120,7 @@
         "name": "iqss-dataverse-solr"
       },
       "spec": {
-        "dockerImageRepository": "iqss/dataverse-solr"
+        "dockerImageRepository": "pdurbin/dataverse-solr"
       }
     },
     {
@@ -146,14 +146,14 @@
             "containers": [
               {
                 "name": "dataverse-plus-glassfish",
-                "image": "iqss/dataverse-glassfish:latest",
+                "image": "pdurbin/dataverse-glassfish:latest",
                 "ports": [
                   {
                     "containerPort": 8080,
                     "protocol": "TCP"
                   }
                 ],
-                "resources": {"limits": {"memory": "3072Mi"
+                "resources": {"limits": {"memory": "512Mi"
                   }
                   
                   },
murphy:dataverse pdurbin$ 

@pdurbin
Member

pdurbin commented Apr 30, 2018

I'm still on ae115fd, and at standup today I'll let @landreev know about your changes to the Dataverse installer.

I just ran vagrant up on the commit above, which is a way to test the installer outside the context of OpenShift (it runs in a CentOS 7 VM in VirtualBox). The installer worked fine and I got a running version of Dataverse on http://localhost:8888 as expected and as documented at http://guides.dataverse.org/en/4.8.6/developers/tools.html#vagrant

@patrickdillon
Contributor

The change you made to openshift.json to lower the memory looks good. I would try that. The 3GB limit is intended for running on the MOC. I could deploy the 512MB version on my VM which has 4GB.

@pdurbin
Member

pdurbin commented Apr 30, 2018

@patrickdillon ok, 512 MB for Glassfish seems like a very small amount of memory, but I'll try what you're suggesting, including 4GB for the VM. Thanks. Please stay tuned.

@MichaelClifford
Contributor Author

@pdurbin, I ran a similar version on my local Minishift this morning with 1024MB as the Glassfish resource limit and that seemed to work.

@MichaelClifford
Contributor Author

@pdurbin if you try to deploy three 3GB pods on a VM that only has 8GB of memory, I think that will necessarily run into memory issues, right? As @patrickdillon mentioned, the 3GB limit was needed when deploying on the MOC. Let me know if you continue to have memory issues after updating the memory resources.

@patrickdillon
Contributor

@MichaelClifford even when replicas is set to 1, I still get issues with the 3GB limit.

@MichaelClifford
Contributor Author

@patrickdillon, you need to account for the total memory on your VM, right? If GlassFish is 3GB, PostgreSQL is 1GB, and Solr is 1GB, that will exceed the memory allocated to your entire Minishift VM. And since Glassfish can only successfully deploy after Solr and PostgreSQL, I think it makes sense that it would not deploy correctly without updating the memory allocation.
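The arithmetic above can be sketched quickly (numbers taken from this thread as illustrations, not authoritative limits):

```shell
# Per-pod memory limits (MiB), as discussed in this thread:
glassfish=3072
postgresql=1024
solr=1024
vm=4096  # a 4 GB Minishift VM

total=$((glassfish + postgresql + solr))
echo "requested ${total} MiB of ${vm} MiB"
# -> requested 5120 MiB of 4096 MiB
if [ "$total" -le "$vm" ]; then echo "fits"; else echo "does not fit"; fi
# -> does not fit
```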

@pdurbin
Member

pdurbin commented Apr 30, 2018

I just tried a 4GB VM, 512 MB for Glassfish, and the default config of 1 replica, and I got the dreaded "Remote server does not listen for requests on [localhost:4848]. Is the server up?" error, which probably means that Glassfish wasn't able to start because it didn't have enough memory. Here's the log:

screen shot 2018-04-30 at 11 33 42 am

What you're saying makes sense: my VM needs to have enough memory for all the replicas. Maybe I'll switch to an 8 GB VM for further testing. And perhaps 1024 MB for Glassfish. I'm pretty much ok with merging the pull request as-is, especially since vagrant up worked, but it might be nice to explain in the dev guide what to expect with the current config of 1 replica (worked fine in my initial test: #4617 (comment)) and how settings will need to be adjusted if you want to play with replicas=3 or whatever. I'll keep playing with it. Thanks for the chatter.

@pdurbin
Member

pdurbin commented Apr 30, 2018

@MichaelClifford @patrickdillon et al. success! I'm going to push this change to openshift.json unless there are any objections:

murphy:dataverse pdurbin$ git diff conf/openshift/openshift.json
diff --git a/conf/openshift/openshift.json b/conf/openshift/openshift.json
index b49f7fe..4f43f5c 100644
--- a/conf/openshift/openshift.json
+++ b/conf/openshift/openshift.json
@@ -153,7 +153,7 @@
                     "protocol": "TCP"
                   }
                 ],
-                "resources": {"limits": {"memory": "3072Mi"
+                "resources": {"limits": {"memory": "1024Mi"
                   }
                   
                   },
murphy:dataverse pdurbin$ 

I'm also changing the part in the doc where I recommend a 4 GB Minishift VM. I'm going to make it 8 GB because I'm having much more success with three Glassfish servers.

I'm going to document some of the stuff I'm doing to test in the dev guide as well as some info on StatefulSets in general. This will help @landreev and others code review this pull request as well, I think.

pdurbin added a commit to EC528-Dataverse-Scaling/dataverse that referenced this issue Apr 30, 2018
@pdurbin
Member

pdurbin commented Apr 30, 2018

@MichaelClifford @patrickdillon I just pushed 1324e39 to the pull request and made the change above to Glassfish memory and documented some stuff about replicas. Please take a look and let me know if you spot a typo or anything weird. It's kind of a brain dump. Thanks!

@pdurbin
Member

pdurbin commented Apr 30, 2018

I have some screenshots I took while writing up my thoughts in 1324e39 and will put them below.

First, here's the out of the box config with a single replica for Glassfish:

screen shot 2018-04-30 at 11 56 49 am

Then you run the following command to add two more Glassfish servers for a total of three:

oc scale statefulset/dataverse-glassfish --replicas=3

screen shot 2018-04-30 at 11 57 37 am

Here's how a dataverse logo looks when it can be found on the filesystem:

screen shot 2018-04-30 at 1 23 35 pm

Here's a broken image of a dataverse logo that can't be found because you are on a different Glassfish server than the one you uploaded it on:

screen shot 2018-04-30 at 1 22 08 pm

As discussed in 1324e39 and IQSS/dataverse-aws#10 and http://irclog.iq.harvard.edu/dataverse/2016-10-21 and https://help.hmdc.harvard.edu/Ticket/Display.html?id=240995 I believe the current workaround is to put the "logos" directory on a shared file system. On a single host, the default directory is /usr/local/glassfish4/glassfish/domains/domain1/docroot/logos.
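A sketch of what that shared-filesystem workaround might look like as an object in openshift.json. The claim name logos-shared is hypothetical, ReadWriteMany support depends on the cluster's storage backend, and each Glassfish container would additionally need a volumeMount at the docroot path above:

```json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "logos-shared"
  },
  "spec": {
    "accessModes": ["ReadWriteMany"],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
```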

pdurbin added a commit that referenced this issue Apr 30, 2018
Adding Glassfish Statefulsets for OpenShift Deployment #4617
@pdurbin
Member

pdurbin commented Apr 30, 2018

@landreev and I looked over pull request #4626 and I just merged it. Thanks, all!

For more details, I highly recommend watching the 23 minute video at https://github.com/BU-NU-CLOUD-SP18/Dataverse-Scaling#our-project-video

Closing. For more on OpenShift, keep an eye on #4040.
