Staging Hub deployment for Pangeo #599

choldgraf · 2021-08-10T19:36:14Z

Description

We should deploy a staging hub for Pangeo that mirrors the production infrastructure setup but runs on less costly resources. Doing so may also surface other tasks that we need to complete in order to get the base infrastructure running.

Benefit

This will help us iterate more quickly and get feedback from the Pangeo team. It will also be a place where we can stage changes in the future without affecting prod, since Pangeo is a more complex and dynamic setup than most of our community hubs.

Tasks to complete

- Create initial staging deployment
  - Figure out NFS configuration: Staging Hub deployment for Pangeo #599 (comment)
- Merge in Add config and deploy Pangeo staging hub #597
- Use Google Filestore for home directories #651
- Iterate with feedback from @rabernat
  - Choices for session types: Staging Hub deployment for Pangeo #599 (comment)
- Confirm that major functionality works as expected

Updates

2021-08-24: We've got a hub deployed but the user servers aren't being created properly due to some NFS errors. We've agreed to try fixing this for two weeks (@yuvipanda will give this a shot). If we cannot fix it after that time, we will:
- Re-deploy the Pangeo hub using Google Filestore
- Track the deployment of in-cluster NFS as a separate enhancement here: Run NFS servers in-cluster #50
2021-08-31: In the sprint planning meeting today, we agreed that, now that NFS is ready to go (Run NFS servers in-cluster #50), we should review this PR and merge it, and then ask the Pangeo folks to take a look at the hub and make sure it looks good. In a future step, we will finish up Authenticate users with GitHub Teams membership in Pangeo Hub #598 and deploy it, but that is not necessary for the initial deployment.

sgibson91 · 2021-08-11T14:22:21Z

I've deployed a hub... sort of. k8s isn't able to mount the NFS server, and I'm not sure if it's because I missed a step or because of the private cluster: #597 (comment)

choldgraf · 2021-08-17T17:11:11Z

I believe this is no longer blocked. Now we need to have a team discussion about whether the NFS strategy used in the PR is the right strategy to use in general. I've updated this issue to mark it as such. Check out @sgibson91's main question here:

#597 (comment)

sgibson91 · 2021-08-23T15:09:40Z

A staging hub exists: https://staging.pangeo.2i2c.cloud/

But spawning the user server fails, which means the NFS setup still needs some tweaking. Not sure if that needs to happen in #597 or #613.

choldgraf · 2021-08-23T18:58:31Z

congrats @sgibson91 :-) 🚀

could we define a hand-off plan for this issue while you're away? I tried updating the top comment so it's clear what the next steps are...what's the information that could make it easiest for somebody else to finish up the NFS stuff?

sgibson91 · 2021-08-24T14:16:40Z

The first thing that needs to be done is fixing the spawn failure: #597 (comment)

There's some discussion going on here about behaviour, but I think that needs a decision before it can be implemented: #597 (comment)

We should also figure out if that work needs to happen in #597 or #613. If it can go in #597, then I think #613 could be merged. Or maybe at this point it's just better to open up a new PR and start afresh anyway.

choldgraf · 2021-08-31T19:55:33Z

Update: in the sprint planning meeting today, we agreed that, now that NFS is ready to go (#50), we should review this PR and merge it, and then ask the Pangeo folks to take a look at the hub and make sure it looks good.

In a future step, we will finish up #598 and deploy it, but that's not necessary for the initial deployment

yuvipanda · 2021-09-01T21:54:12Z

This actually fails (it didn't use to!)

2021-09-01T21:38:01Z [Warning] MountVolume.SetUp failed for volume "pvc-958d82c1-5383-45e1-85e8-011091b2ae0f" : mount failed: exit status 1
Mounting command: /home/kubernetes/containerized_mounter/mounter
Mounting arguments: mount -t nfs -o noatime,soft,vers=4.2 10.12.4.188:/export/pvc-958d82c1-5383-45e1-85e8-011091b2ae0f /var/lib/kubelet/pods/faeae25f-2156-4322-b6d4-cd00c438821b/volumes/kubernetes.io~nfs/pvc-958d82c1-5383-45e1-85e8-011091b2ae0f
Output: Mount failed: mount failed: exit status 32
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs -o noatime,soft,vers=4.2 10.12.4.188:/export/pvc-958d82c1-5383-45e1-85e8-011091b2ae0f /var/lib/kubelet/pods/faeae25f-2156-4322-b6d4-cd00c438821b/volumes/kubernetes.io~nfs/pvc-958d82c1-5383-45e1-85e8-011091b2ae0f]
Output: mount.nfs: mounting 10.12.4.188:/export/pvc-958d82c1-5383-45e1-85e8-011091b2ae0f failed, reason given by server: No such file or directory

Trying to deploy-support fails with:

Error: UPGRADE FAILED: cannot patch "support-nfs-server-provisioner" with kind StatefulSet: StatefulSet.apps "support-nfs-server-provisioner" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
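
For readers unfamiliar with this error: it says that, on an existing StatefulSet, only `replicas`, `template`, and `updateStrategy` may be patched. The sketch below illustrates that rule on two hypothetical spec dictionaries (they are not the real values from the support chart); the helper name `forbidden_changes` is made up for illustration.

```python
# Fields the error message above says may change on an existing StatefulSet.
MUTABLE_FIELDS = {"replicas", "template", "updateStrategy"}


def forbidden_changes(old_spec, new_spec):
    """Return the sorted top-level spec fields that differ and are not mutable."""
    keys = set(old_spec) | set(new_spec)
    return sorted(
        key
        for key in keys
        if old_spec.get(key) != new_spec.get(key) and key not in MUTABLE_FIELDS
    )


# Hypothetical specs for illustration only.
old = {"replicas": 1, "serviceName": "nfs", "volumeClaimTemplates": [{"storage": "100Gi"}]}
new = {"replicas": 2, "serviceName": "nfs", "volumeClaimTemplates": [{"storage": "200Gi"}]}

print(forbidden_changes(old, new))  # ['volumeClaimTemplates']
```

In this example the `replicas` change alone would be accepted, while the `volumeClaimTemplates` change is what would be rejected. Which field actually changed in the support chart is not stated in this thread.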

I think our timebox for fixing in-cluster NFS before falling back to Google Filestore has expired, so I'm going to abandon in-cluster NFS and go that way.

rabernat · 2021-09-02T01:16:18Z

Thanks so much for all the hard work here!

After Yuvi's ping on Slack, I just tried logging in. I clicked login and got redirected to authorize a new GitHub app (iam-login-something). Once redirected back to https://staging.pangeo.2i2c.cloud/hub/oauth_callback?code=..., I was met with

403 : Forbidden

If your email address has NOT been added to the list of allowed users for this hub, please contact the hub administrators.

Our previous cluster was configured to allow all users from the group https://github.com/orgs/pangeo-data/teams/us-central1-b-gcp to be able to log in. It would be great to use that same group here.

Let me know how I can help.

choldgraf · 2021-09-02T01:17:20Z

@rabernat just a note that we are tracking the GitHub teams auth here: #598

choldgraf · 2021-09-02T01:23:04Z

I think we need to add @rabernat here

https://github.com/2i2c-org/pilot-hubs/blob/ffceb3d397bdd76a3ae9b9fc8ecfd1811da71ef4/config/hubs/pangeo-hubs.cluster.yaml#L70

And then he can add other admins etc just until we get the GitHub teams auth working

rabernat · 2021-09-02T01:24:40Z

Ah ok, thanks for clarifying. No worries.

yuvipanda · 2021-09-02T01:43:47Z

@rabernat try now

rabernat · 2021-09-02T12:05:07Z

Ok so I will continue to post feedback on this issue, as suggested by Chris.

Item 1: There are no choices of machine type on startup. Compare this to the Profile List on https://us-central1-b.gcp.pangeo.io/. This is important because some users (like my class) just need a small machine while others (like researchers) need lots of memory.
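
As a rough sketch of what such a choice of machine types could look like, KubeSpawner accepts a `profile_list` of dictionaries with `display_name`, `default`, and `kubespawner_override` keys. The profile names and sizes below are hypothetical and are not the values used on either hub; `build_profile_list` is a made-up helper.

```python
def build_profile_list(sizes):
    """Build a KubeSpawner-style profile list from (name, cpu, memory) tuples."""
    profiles = []
    for index, (name, cpu, mem) in enumerate(sizes):
        profiles.append(
            {
                "display_name": f"{name}: {cpu} CPU, {mem} RAM",
                # The first entry is preselected on the spawn page.
                "default": index == 0,
                "kubespawner_override": {
                    "cpu_guarantee": cpu,
                    "cpu_limit": cpu,
                    "mem_guarantee": mem,
                    "mem_limit": mem,
                },
            }
        )
    return profiles


# Hypothetical sizes: a small machine for a class, a large one for researchers.
PROFILE_LIST = build_profile_list([("Small", 1, "4G"), ("Large", 8, "32G")])

# In a jupyterhub_config.py this would be assigned as:
#   c.KubeSpawner.profile_list = PROFILE_LIST
```

Generating the list from a table of sizes keeps guarantees and limits in step, which may make it easier to tune the sizes until every profile schedules.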

rabernat · 2021-09-02T12:09:23Z

Item 2: My home directory is not there. It would be great if we could migrate over the home directories from the old cluster. Since both clusters are using GC Filestore, perhaps this is trivial: just mount the same volume on the new cluster. But since it lives in a different project, maybe that doesn't work.

rabernat · 2021-09-02T12:23:19Z

Item 3: Hub is not configured for requester-pays access to cloud data.

I discovered this by running the first few cells of this notebook, specifically

from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")
ds = cat["sea_surface_height"].to_dask()

raises

OSError: Forbidden: https://storage.googleapis.com/download/storage/v1/b/pangeo-cmems-duacs/o/.zmetadata?alt=media
Caller does not have serviceusage.services.use access to the Google Cloud project.
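
For context, requests to a requester-pays bucket must name a project to bill, and the Cloud Storage JSON API takes it as the `userProject` query parameter. The sketch below only shows how that parameter attaches to the failing URL, using the standard library; `my-billing-project` is a hypothetical project ID, and the caller would still need the `serviceusage.services.use` permission on it, as the error says.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_user_project(url, project):
    """Return url with the userProject query parameter set to project."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["userProject"] = project
    return urlunsplit(parts._replace(query=urlencode(query)))


URL = (
    "https://storage.googleapis.com/download/storage/v1/b/"
    "pangeo-cmems-duacs/o/.zmetadata?alt=media"
)

# "my-billing-project" is a placeholder, not a real project from this thread.
BILLED_URL = with_user_project(URL, "my-billing-project")
print(BILLED_URL)
```

In practice the hub would set this through its storage client configuration rather than by editing URLs by hand; the sketch is only meant to show what "configured for requester-pays" changes in each request.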

choldgraf · 2021-09-02T18:33:16Z

I'll try to capture some of @rabernat's suggestions in subsequent issues so that we don't lose track of them.

Note that when I try to log in, I'm running into a "scale-up" error (I selected the smallest machine type).

choldgraf · 2021-09-02T19:03:01Z

Another note - if I go to Services -> Dask Gateway (https://staging.pangeo.2i2c.cloud/services/dask-gateway/) then I get a blank page with 404 Not Found.

yuvipanda · 2021-09-02T19:20:29Z

@choldgraf ah, the second smallest one works for me. Let's isolate and tweak the sizes until they all work. Can we use #652 to track and close this?

choldgraf · 2021-09-02T20:29:01Z

@yuvipanda sounds good - I think that once #651 is merged we can consider this one closed (actually it should close automatically), and can then focus on specific improvements to the staging hub in separate issues

choldgraf mentioned this issue Aug 10, 2021

New Hub: Pangeo JupyterHub (GCP) #482

Closed

9 tasks

choldgraf assigned sgibson91 Aug 10, 2021

This was referenced Aug 11, 2021

Support multiple backends for SOPS #575

Closed

Add config and deploy Pangeo staging hub #597

Merged

choldgraf added the blocked label Aug 16, 2021

choldgraf added need discussion and removed blocked labels Aug 17, 2021

choldgraf added the blocked label Aug 21, 2021

choldgraf changed the title ~~Initial staging Hub deployment for Pangeo~~ Staging Hub deployment for Pangeo Aug 23, 2021

choldgraf removed the blocked label Aug 23, 2021

choldgraf assigned yuvipanda Aug 24, 2021

choldgraf added the impact: high label Aug 26, 2021

choldgraf assigned yuvipanda and unassigned yuvipanda and sgibson91 Aug 31, 2021

choldgraf removed the need discussion label Sep 1, 2021

yuvipanda closed this as completed in #597 Sep 1, 2021

yuvipanda reopened this Sep 1, 2021

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Sep 1, 2021

Use Google Filestore for home directories

4d3b1e6

Ran into issues with in-cluster NFS, so 2i2c-org#599 (comment) Fixes 2i2c-org#599

yuvipanda mentioned this issue Sep 1, 2021

Use Google Filestore for home directories #651

Merged

rabernat mentioned this issue Sep 2, 2021

Prepare Pangeo Hub for Ryan's course #652

Closed

8 tasks

yuvipanda closed this as completed in #651 Sep 3, 2021

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Sep 3, 2021

Deploy pangeo hubs from CI

b64660c

Ref 2i2c-org#599

yuvipanda mentioned this issue Sep 3, 2021

Deploy pangeo hubs from CI #657

Merged
