Add a section on how to use helm stable/postgresql for the hub #575
Conversation
Thanks for the PR, @gsemet! I appreciate the doc addition, although I think it should take a significantly different form. I've left inline comments, but my overall opinion is:
- We shouldn't recommend people run complex RDBMSes on Kubernetes at this time.
- Even when the time comes that running complex RDBMSes on Kubernetes works OK, we should still link people to instructions rather than provide them ourselves. This reduces the support burden on this project, prevents us from accidentally introducing security holes, and makes it less likely that we provide users with out-of-date information. I think @choldgraf has thoughts on this too, around 'when to put content here vs when to link to content'.
What I would like to have docs for is:
- How to connect to an external postgres database for the hub db (see the config sketch after this comment). Your PR has this info, and I think it will be very useful.
- When to use an external database? The only reasons I can think of are:
  - You want faster hub restarts.
  - You don't have a fast enough PVC provider.
  - You already have a postgresql / mysql database you want to keep using.
- How to determine when the db is corrupted / dropped, and what to do when this happens.

I appreciate your PRs and work in the jupyterhub/kubernetes community. Do you think we can steer this PR into one of the suggested doc areas above? Hopefully, this feedback is useful and not overly negative.
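For illustration, a minimal sketch of what the 'connect to an external postgres database' docs could show, using the chart's `hub.db` options; the host, database name, and credentials below are placeholders, not values from this PR:

```yaml
# Hypothetical values.yaml fragment: point the hub at an externally managed
# PostgreSQL server instead of the default sqlite-on-PVC database.
hub:
  db:
    type: postgres
    # user, host, port, and database name are placeholders
    url: postgresql://jupyterhub@db.example.com:5432/jupyterhub
    # in a real deployment, inject this via a Kubernetes secret rather than
    # writing it into values.yaml
    password: "<postgres-password>"
```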
    - ``mysql``
    - ``postgres``

    Using PostgreSql or MySQL provides the main advantage to allow seemless update
Can you explain a bit more what you mean by this paragraph? What differences are you seeing between sqlite / postgres? When the hub pod restarts, the following sequence of actions occurs:
1. The running hub pod dies.
2. A new hub pod is started and scheduled.
3. (if using sqlite on a PVC) If the new pod is on a different node than the old pod (which we've taken steps to avoid by using pod affinity rules; see the sketch after this comment), the disk must detach from the old node and attach to the new node.
4. The hub process starts and checks all user servers to see if they are running (this takes the longest time).
5. It resumes service.

Using an external database will make step (3) go faster, but (4) is the biggest cause for concern.
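The 'pod affinity rules' mentioned in step 3 look roughly like the following Kubernetes pod spec fragment; the label selector here is an assumption for illustration, not necessarily the chart's actual labels:

```yaml
# Illustrative only: prefer scheduling the new hub pod on the same node as an
# existing pod labelled component=hub, so a sqlite PVC does not have to detach
# from one node and attach to another.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              component: hub
          topologyKey: kubernetes.io/hostname
```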
1. With sqlite, "running hub pod dies": yes. With postgres, I clearly see the old pod wait for the new pod to be up before being terminated.
2. ok
3. ok (I guess the termination of the old hub occurs here)
4. ok (I guess a potential DB schema migration can also occur here, so the service might be unavailable if this step takes too long)
5. ok

I'll check more carefully tomorrow, but from my experiments and from reading the hub deployment, we can see that a rolling update strategy is applied. In any case, we clearly see a better "user experience" when using postgresql, even without a PVC (so a restart of the pod means losing the data), than with the sqlite backend.
Indeed, a rolling update strategy is applied when using postgres! I've reverted that in #576 - I thought I already had, but had forgotten! Apologies for the confusion!
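For readers following along, the difference under discussion is the Deployment update strategy. A minimal sketch of the two options, using standard Kubernetes fields (nothing chart-specific):

```yaml
# Deployment spec fragment. #576 moves the hub back to Recreate: the old hub
# is fully stopped before the new one starts, so two hub processes never talk
# to the same database at once.
spec:
  strategy:
    type: Recreate
  # The smoother behavior observed with postgres corresponds to:
  # strategy:
  #   type: RollingUpdate   # new hub starts while the old one still serves
```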
doc/source/advanced.md (outdated):

    SQLite and MySQL/PostgreSQL are for new user, in order to allow them to log in
    even when a new hub is starting).

    The easiest way of running PostgreSQL for the Hub is to use Helm
On more thought, I don't think we should be encouraging users to run their postgresql databases on Kubernetes. Doing so is hard (https://medium.com/kokster/postgresql-on-kubernetes-the-right-way-part-one-d174ee8a56e3 and elsewhere on the web), and when the postgresql pod dies your hub won't be accessible - even though the hub process itself might be up. This gives us two points of failure rather than the current one, and for not much benefit - there will only ever be one hub process talking to the database.
I think the 'easiest' way to use an external database is to use a hosted cloud database (like RDS / Cloud SQL), or to set one up on an external machine. Running databases on Kubernetes right now is something I'd consider 'super advanced', and it should probably also use something like https://github.com/CrunchyData/postgres-operator rather than the helm chart.
yes, storing critical data on Kubernetes is hard. The intent of this proposal is not to mislead people into believing a simple solution exists with this helm chart.
This is an option admins can have, and for a POC, for tests, or even for real usage at small scale, PostgreSQL on a PVC is great and makes the hub happy. At worst, in case of a database crash, we only lose the current connections. That is still better than the sqlite backend, even if that also works fine.
The idea is to give people options. I can add a link to a document, or similar, highlighting that HA is hard to achieve with PG as with any other database, and that the best option is to use the DB service provided by your cloud provider.
I work on bare metal, so we do not have any PG-as-a-service yet (it should come at some point). But we use PG intensively on Kubernetes, of course only for non-critical information (i.e., where restarting from an empty DB is not a big deal).
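For context, running the database this way boils down to chart values along these lines; the parameter names are from memory of the old stable/postgresql chart and may differ between versions, and the credentials are placeholders. The persistence toggle is the one discussed a few comments below:

```yaml
# Hypothetical pg-values.yaml for `helm install stable/postgresql -f pg-values.yaml`
postgresUser: jupyterhub
postgresPassword: "<postgres-password>"
postgresDatabase: jupyterhub
persistence:
  # true: the database survives pod restarts via a PVC
  # false: data lives inside the pod and is lost when the postgres pod restarts
  enabled: true
```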
doc/source/advanced.md (outdated):

    lost when the PostgreSQL server will restart. Do not use `--recreate-pods`
    helm option. It will not cause any data loss for user, just the proxy will
    not be able to route logged users.
    - even without persistance, your hub can still be upgraded smoothly for the
This relies on the postgres pod never dying, which is not true.
sure, but it's up to the admin to choose persistence or not. Pretty easy to do with the helm config.
doc/source/advanced.md (outdated):

    users, because a Hub restart will not cause the PostgreSQL pod to restart.
    - The Hub will use Rolling Update, so a new hub will be started while the
    other is still running and the transition will be transparent for the user.
    - if case of DB loss or reset, simply ensure all your server are shutdown
I think a separate section on 'how to recover from database corruption' might be useful outside of this.
I can remove this line; recovering from db corruption is a big problem on its own. It was just here to say "no big deal, there is no data loss, just restart everything and it will work again".
doc/source/advanced.md (outdated):

    not be able to route logged users.
    - even without persistance, your hub can still be upgraded smoothly for the
    users, because a Hub restart will not cause the PostgreSQL pod to restart.
    - The Hub will use Rolling Update, so a new hub will be started while the
This is not true I think, and might actually cause corruption because at times two hub processes might be talking to the same database. The hub is not designed to work like that.
I am referring to this section of the configuration. It does keep the old hub running and accepting new people while the new hub instance is initializing. Maybe there is a short outage, but at least it does not take very long.
Ah, good catch! @minrk pointed out the problems that could cause in #367 (review) but I forgot to make a revert. I've opened #576 now to revert that change properly.
I think #576 should help with the confusion a bit - I thought we had done that PR a long time ago, clearly I had forgotten! Hopefully my comments make a little bit more sense in that context!
    It is advised to use the Database as a Service solution that your Cloud
    provider may provide you.

    You have the option to run PostgreSQL for the Hub using Helm 'stable/postgresql'
Can we instead provide content like:

    You can run postgresql either via your cloud provider's hosted offering (links to Amazon RDS, Google Cloud SQL, Azure Database), or by setting it up yourself manually. There is an experimental method of running it directly on kubernetes (link to the helm chart), but we do not recommend it.

I'd prefer we have that rather than provide direct instructions on using the helm chart here.
doc/source/advanced.md (outdated):

    Note the following points when using PostgreSQL as database for the Hub:

    - If you do not use a persistant volume on PostgreSQL, your database will be
This only applies if the postgres database is set up with helm, which we should recommend against. Possibly remove this whole section?
I would not choose for the user; I prefer giving them the information and the limitations of each solution, so they can make their own choices depending on their own situation.
We are dealing with people who are deploying JupyterHub in a Kubernetes cluster; this population should know what they are doing. It is always better to give them hints and links to more information, and I learned a lot when reading the zero2jupyterhub documentation.
I guess you would like to prevent people from making mistakes, which is honorable, but if I were writing a scrum story for this feature I would formulate it as: "as a devops engineer deploying jupyterhub in my kubernetes cluster, I would like to be able to use a postgresql server started on my cluster when I do not have a DBaaS option".
Also, it may be an argument from authority, but there are so many helm charts in the 'stable' section of the official repository that already provide an optional postgresql dependency; they can't all be wrong, can they? It is always optional: if the admin has a better option, they should not enable this dependency and should use their DBaaS instead.
I may appear a bit pushy on this one, but switching to pg gave a better user experience on our bare-metal jupyterhub installation, and I would like to document it for others to try.
Reference jupyterhub#567 Signed-off-by: Gaetan Semet <gaetan@xeberon.net>
This is a lot of great conversation, thanks @gsemet and @yuvipanda! Just a quick note that is worth considering: I think that @gsemet's comments make sense if we can assume that our jupyterhub k8s deployers are relatively experienced ops people, but I don't think that we can assume this. In addition to more experienced people, Z2JH is also aimed towards relative kubernetes newbies with little experience in this kind of thing (e.g. a teacher that wants to use a jupyterhub for a 2-day workshop). We need to be careful not to overload these people with information, because many will simply start following whatever steps are there and dig themselves into a hole...
@choldgraf Regardless of deployer experience, I think the core problem is that the biggest advantage here (rolling updates) is actually a bug in z2jh, and will be 'fixed' when #576 gets merged. There are still great reasons to use postgres, but only in specific circumstances. I still do think that running postgres on kubernetes is a bad idea and not one we should recommend for anyone though.
@gsemet that's great to hear it's been smoother for you! Can you give me a few more details on how your setup looks, and what problems you ran into with the default sqlite setup? Thanks!
Hello. So, as background, I have some experience with sqlite and mysql; I worked with Buildbot to maintain a CI for 2000 Android developers. But I do not consider myself a DB expert at all. I did a lot of sqlite/sqlalchemy/alembic work, and it gave us a hard time maintaining it. That is to tell you that I can maybe help you on this part, and it is always good for my own personal skills to work on these hard subjects (DB, concurrency, migration) with more experienced people.

I set up a jupyterhub using the marvelous z2jh guide. I really appreciated the fact that it takes people by the hand through the deployment, so it helped me a lot in learning more about kube/helm stuff. We now use kubernetes daily at work for all our microservices, but in a very self-cooked way. Too bad for maintenance. Now jupyterhub is running on our bare-metal kubernetes cluster, behind a Traefik Ingress, without any problem.

First I started with sqlite on a PVC. It worked well indeed, but there is no "smooth upgrade", i.e., no rolling update. New users or admins going on… So I switched to PostgreSQL, using the helm chart because it is simpler. Also, we do not have a DBaaS yet, so we cannot ask for a safe postgres database with high availability and so on. It will be available at some point, and I understand your point that admins should always use the DBaaS provided by their cloud provider, like Google, AWS or OpenStack. But that is simply not an option yet. And our JupyterHub cluster is not intended for high availability; it just should work as well as possible.

We had a discontinuity of the hub today (the node died). The proxy also died, so users had to wait for it to come back. This is not a problem for us, but I guess high availability for the proxy is more of a concern (or switch to a traefik proxy? :) ). After switching to postgres, we have a rolling update for the hub, so new users can still log in. I did not experience any "corruption" like mentioned here, but I think our environment does not stress it enough. I think there is a small discontinuity (like I described before), but it is so short I cannot say how long it takes (I should measure it).

I would like to propose a values.yaml option to bring this behavior back, but under an opt-in trigger (False by default; users that want this rolling update can still enable it at their own risk). I am for proposing users (= people who know or are learning how to deploy in kubernetes) the best "good" choices by default, while still letting them enable some "experimental" features on their own. Power users can still fork the project and maintain their own modifications, but I really feel charts and helm are exactly made for feature gating (letting people make their own choices).

I agree and disagree with you, @choldgraf. Maybe we can move all this information to a "very experimental features" page or a wiki/blog section, but there is no such thing as "too much information". There is only relevant information and useless noise. In the end it is only a matter of organisation, and of properly choosing the defaults (Convention over Configuration; if a willing user wants to deviate from the convention, he still can... at a higher cost of maintainability, but that is his responsibility).
Thanks for the detailed information, @gsemet! I highly appreciate it! I agree that in your case using the postgres helm chart was a good option! I have two primary thoughts:
I think my answer to question 1 is: yes, but in very broad terms that allow overriding various kubernetes bits. For example, in the case of #576, we might allow users to override the entire 'updateStrategy' field. This requires that whoever wants to play with rolling updates does so with full knowledge of kubernetes norms, and can play with it with full power without having to rely on us to support various features. I'm very pro allowing overrides of all parts of the Kubernetes objects we create.

Question 2 is really the core of this issue, I think. As JupyterHub maintainers, we have limited time and resources to support people who might run into problems. We want to make sure we only provide info we can support in some form. This limits what we can have here, but it also allows us the time and space to build things, support the community as best we can, and keep the guide up to date and tested whenever we make releases, without burning the community out. However, the negative consequence of this is that we do not have a space for 'these things worked for people in specific circumstances, and you can try them too!'. Such a space would come with the clear expectation that it is not 'supported' in the same way the core z2jh guide is, and would rely on a much wider community of practice to keep it up to date. A recent example of this is running z2jh on OpenStack. Lots of people are trying to do this and it'd be great to have a space for them to collaborate - but most of the info is related to running k8s on OpenStack, and z2jh itself is not a good place for it. Any OpenStack info we have in the guide is going to be incomplete and out of date very soon, since none of us are OpenStack experts. I think the situation is similar for postgres. This community space could also be a space to try out different approaches before moving something into z2jh itself.

We currently don't have any guidelines about these 'levels of support', so writing this out is a very good idea. My preference would be for us to start another repo that acts as a wiki, with very liberal editing rights (pretty much anyone can write anything there) and a specific scope (to be discussed). This would allow space for advanced (and specific) use cases to be documented and organically adopted, without increasing the support burden on us. What do you think, @gsemet?
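A hypothetical shape for the override floated above; this key did not exist in the chart at the time, so the name and placement are assumptions for illustration only:

```yaml
# Hypothetical values.yaml fragment letting deployers opt back into rolling
# updates for the hub, at their own risk; the chart default would stay Recreate.
hub:
  deploymentStrategy:
    type: RollingUpdate
```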
I also filed #584 for the 'hub is down for a while when changing config' issue, which should help with at least some of the problems!
Regarding this part, is there a "helm" way of doing it? Because even with the… The Community section is perfect for me! Later, some parts could be promoted to the official jupyter support team. A wiki is fine; I do not see, sadly, this section in the sphinx doc, even if I am more comfortable with that than with wiki editing. Maybe you do not want to review these changes and prefer to let the "community" organize itself.
Hi @gsemet and @yuvipanda. Thank you @gsemet for your PR, and sorry you experienced it going stale. I want to reach a conclusion for this PR, so I've spent some time considering it with fresh eyes. I rebased this PR and had a look. I concluded that I'd like to close this PR for these reasons:
I'm going ahead and closing this issue without awaiting a response since the PR has been stale for a long time, but please feel free to discuss this further.
@yuvipanda tell me if a documentation patch like this is ok for you.
Reference #567