New Hub: Pangeo JupyterHub (GCP) #482

Closed
9 tasks done
Tracked by #6
sgibson91 opened this issue Jun 23, 2021 · 52 comments
@sgibson91
Member

sgibson91 commented Jun 23, 2021

Background

The Pangeo Project currently runs a number of JupyterHubs and BinderHubs for its community, but does not have dedicated resources to operate and maintain this infrastructure. We plan to migrate these hubs to 2i2c infrastructure so that we can provide this service for them. This issue will track the migration of the JupyterHub currently hosted on GCP.

Current hub config location: https://github.com/pangeo-data/pangeo-cloud-federation

Sources of info:

Setup Information

Important Information

Deploy To Do

admin

Set up cluster

Set up hubs

@rabernat
Contributor

Ideally we can have the current name https://us-central1-b.gcp.pangeo.io/ alias / redirect to the new hub.

@sgibson91
Member Author

Progress Update

#489 looks good to merge to deploy a new cluster. Unfortunately, we're blocked by an organisational constraint that prevents external IP addresses from being assigned. This will need intervention by @rabernat

Error: Error waiting for creating GKE cluster: Not all instances running in IGM after 40.240072593s. Expected 1, running 0, transitioning 1. Current errors: [CONDITION_NOT_MET]: Instance 'gke-pangeo-hubs-cluster-default-pool-65fa3508-485z' creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project 291560455175. Add instance projects/pangeo-integration-te-3eea/zones/us-central1-b/instances/gke-pangeo-hubs-cluster-default-pool-65fa3508-485z to the constraint to use external IP with it.

In the meantime, I will test a full deploy on the pangeo-181919 project so at least we can be ready to deploy once the above is mitigated.

@yuvipanda changed the title from "[Hub] - Pangeo JupyterHub (GCP)" to "New Hub: Pangeo JupyterHub (GCP)" on Jul 13, 2021
@yuvipanda
Member

From talking to @rabernat at the Pangeo cloud meeting, we should make sure this has a support deployment (#456), with https://github.com/jupyterhub/grafana-dashboards set up. https://github.com/yuvipanda/python-popularity-contest should also be set up.

@sgibson91
Member Author

Progress update

We now have a cluster deployed into the new project! 🎉 I will begin working on deploying a staging hub

@sgibson91
Member Author

@rabernat @TomAugspurger I'd love to understand Pangeo's authentication system a little more for working on a hub config.

Authentication

Like the AWS deployment, we use auth0 to authenticate with the hubs after they fill out the form: https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/.github/workflows/UpdateMembers.yml, https://github.com/pangeo-data/pangeo-cloud-federation/actions?query=workflow%3AUpdateMembers.

(From pangeo-data/pangeo-cloud-federation#874)

2i2c hubs also use Auth0 but, as far as I understand, we still connect to a GitHub handle or Google Account. I'd love to know more about what's happening under the hood so I can translate the workflow.

@choldgraf
Member

@sgibson91 I've added a to-do item for that one, since it seems like a non-trivial thing to keep track of if their policy deviates from what we're used to doing.

@TomAugspurger

I might be wrong about this, but my understanding is that filling out that form dumps your info into a Google Sheet. Then https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/.github/workflows/UpdateMembers.yml runs daily and automatically approves people using something in https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/.github/scripts/update_membership.py (adding to GitHub org? I haven't looked closely).

I'm not sure if there's any additional configuration on the auth0 side.

@sgibson91
Member Author

So the info we're collecting is a GitHub handle, and we'll want an auth config like the one below, with a filter for the appropriate GitHub org (so it's not everyone on GitHub!). Does that sound right?

https://github.com/2i2c-org/pilot-hubs/blob/845c961b3ee95a9f919a7d4c2086fdc8901baa5a/config/hubs/2i2c.cluster.yaml#L102-L103

@sgibson91
Member Author

sgibson91 commented Aug 5, 2021

I had a quick chat with @consideRatio in JupyterHub's gitter about GitHub teams being accepted by the GitHubOAuthenticator, like organisations are. It is apparently a feature that has been requested via other pathways, and rather than reimplement everything in https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/auth0, we should just make an upstream contribution to the OAuthenticator repo. (For comparison, GitLabOAuthenticator already accepts teams so this isn't wildly out-of-scope for the project.) I think this will also make the code we need to maintain on 2i2c's end less complex.

Update: someone has already beaten me to it! jupyterhub/oauthenticator#449

@sgibson91
Member Author

sgibson91 commented Aug 5, 2021

Optionally restrict access to specific GitHub Orgs and Teams. If these values are not set, any GitHub user can log in.

  • Under 'Application Settings', 'Show Advanced Settings', 'Application Metadata': REQUIRED_GITHUB_ORGS=pangeo-data REQUIRED_GITHUB_TEAMS=pangeo-data/us-central1-b-gcp

Is the GitHub team/org here an "and" conditional or an "or" conditional?

Update: I think this is an "and" by virtue of how GitHub structures orgs and teams
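
The "and" reading can be sketched as follows. This is a hypothetical model, not the code that Auth0 or the Pangeo rules actually run: the function `is_allowed`, its parameters, and the example memberships are invented for illustration, and it assumes that belonging to at least one listed org and at least one listed team is enough. Teams are written as `org/team`, as in the setting above.

```python
def is_allowed(user_orgs, user_teams, required_orgs, required_teams):
    """Hypothetical "and" check: the org rule and the team rule must both pass.

    An empty requirement means "no restriction", mirroring the note that any
    GitHub user can log in when the values are not set.
    """
    org_ok = not required_orgs or bool(set(user_orgs) & set(required_orgs))
    team_ok = not required_teams or bool(set(user_teams) & set(required_teams))
    return org_ok and team_ok


required_orgs = {"pangeo-data"}
required_teams = {"pangeo-data/us-central1-b-gcp"}

# Org member who is not on the team: rejected under "and".
print(is_allowed({"pangeo-data"}, set(), required_orgs, required_teams))

# Team member (and therefore also an org member): accepted.
print(is_allowed({"pangeo-data"}, {"pangeo-data/us-central1-b-gcp"},
                 required_orgs, required_teams))
```

Because a GitHub team only exists inside its organisation, anyone who passes the team rule here also passes the org rule, which is why the two settings behave like a single "and" condition in practice.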

@rabernat
Contributor

rabernat commented Nov 8, 2021

Please note that the old URL of https://pangeo.2i2c.cloud will no longer work

Ok, I did not realize this. I am getting lots of panicked messages from students.

Can we put a redirect at https://pangeo.2i2c.cloud?

@sgibson91
Member Author

[Screenshot 2021-11-08 at 17 51 28]

I did the above in Namecheap following https://www.namecheap.com/support/knowledgebase/article.aspx/385/2237/how-to-redirect-a-url-for-a-domain/. Hopefully it works; the docs say it should take ~30 mins to take effect.

@rabernat
Contributor

rabernat commented Nov 8, 2021

Just noting that redirection is not yet working.

@sgibson91
Member Author

The redirect was also the cause of last night's certificate problem, so I have removed it. TL;DR: I don't think it's possible to have the redirect you requested.

Let me explain what happened:

We have two DNS zones: 2i2c.cloud managed by 2i2c through Namecheap, and pangeo.io managed by the Pangeo community through Hurricane Electric (though I have access).

pangeo.2i2c.cloud is an A record that points to the IP address of the cluster's LoadBalancer

us-central1-b.gcp.pangeo.io is a CNAME for pangeo.2i2c.cloud

We set it up this way such that if the IP address of our LoadBalancer changes for any reason, we only need to update the A record in Namecheap and Pangeo will inherit the changes automatically. You can see how this would be critical for cases where we do not have access to the desired domain.
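
As a rough illustration of why this indirection helps, here is a toy resolver over an in-memory record table. It is a sketch only: `resolve` and the `records` dict are hypothetical, and the IP addresses are placeholders from the documentation range, not the real LoadBalancer address.

```python
def resolve(name, records, max_hops=10):
    """Follow CNAME records until an A record is reached; return its IP."""
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("too many CNAME hops")


records = {
    # Managed by 2i2c; placeholder IP standing in for the LoadBalancer.
    "pangeo.2i2c.cloud": ("A", "203.0.113.10"),
    # Managed by Pangeo; never needs to change.
    "us-central1-b.gcp.pangeo.io": ("CNAME", "pangeo.2i2c.cloud"),
}
print(resolve("us-central1-b.gcp.pangeo.io", records))

# The LoadBalancer IP changes: only the A record is edited.
records["pangeo.2i2c.cloud"] = ("A", "203.0.113.20")
print(resolve("us-central1-b.gcp.pangeo.io", records))
```

The second lookup returns the new address without any change to the pangeo.io entry.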

So when we set up the redirect from pangeo.2i2c.cloud to us-central1-b.gcp.pangeo.io, the certificates failed to resolve: the check was looking for a valid response from us-central1-b.gcp.pangeo.io, which is a CNAME for pangeo.2i2c.cloud, which redirected back to ...pangeo.io... and so on. A vicious loop of non-resolving websites emerged.
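
The loop can be sketched with a small hop-following function. This is a deliberately simplified model: it treats a CNAME and a URL redirect as the same kind of "hop", which real resolvers and certificate issuers do not, and `follow` and the `hops` table are hypothetical names used only for illustration.

```python
def follow(name, hops):
    """Follow CNAME/redirect hops from `name` and return the visited path.

    Raises RuntimeError if a name is visited twice, i.e. there is a loop.
    """
    path = [name]
    while name in hops:
        name = hops[name]
        if name in path:
            raise RuntimeError("loop: " + " -> ".join(path + [name]))
        path.append(name)
    return path


hops = {
    "us-central1-b.gcp.pangeo.io": "pangeo.2i2c.cloud",  # CNAME
    "pangeo.2i2c.cloud": "us-central1-b.gcp.pangeo.io",  # URL redirect
}
try:
    follow("us-central1-b.gcp.pangeo.io", hops)
except RuntimeError as err:
    print(err)

# Removing the redirect breaks the loop.
del hops["pangeo.2i2c.cloud"]
print(follow("us-central1-b.gcp.pangeo.io", hops))
```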

I have now removed the redirect and our certificates have gone back to normal. I logged into the prod hub with no issue in a private browser.

@rabernat
Contributor

rabernat commented Nov 9, 2021

Thanks for the explanation.

As of right now, I am still getting NET::ERR_CERT_COMMON_NAME_INVALID on pangeo.2i2c.cloud. Can I assume that this is a DNS TTL issue which will resolve itself in time?

@sgibson91
Member Author

Perhaps? We should be using us-central1-b.gcp.pangeo.io anyway as the Load Balancer won't allow traffic from pangeo.2i2c.cloud now.
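
For context, NET::ERR_CERT_COMMON_NAME_INVALID is what Chrome reports when the certificate it is served does not list the hostname that was requested. A minimal sketch of that check is below. The certificate contents are an assumption made for illustration (the thread does not say which names the hub's certificate covers), and `cert_dns_names` and `covers` are hypothetical helpers; the dict is shaped like the result of Python's `ssl.SSLSocket.getpeercert()`.

```python
def cert_dns_names(cert):
    """Return the DNS names listed in a certificate's subjectAltName."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def covers(cert, hostname):
    """Exact, case-insensitive match only; wildcard names are ignored for brevity."""
    return hostname.lower() in (name.lower() for name in cert_dns_names(cert))


# Assumed certificate contents, for illustration only.
cert = {"subjectAltName": (("DNS", "us-central1-b.gcp.pangeo.io"),)}

print(covers(cert, "us-central1-b.gcp.pangeo.io"))  # True
print(covers(cert, "pangeo.2i2c.cloud"))            # False
```

Under that assumption, a browser visiting pangeo.2i2c.cloud would see a name mismatch regardless of DNS caching, while us-central1-b.gcp.pangeo.io would validate.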

@choldgraf choldgraf moved this from Needs Refinement to Ready to work in DEPRECATED Engineering and Product Backlog Nov 10, 2021
@rabernat
Contributor

The migration seems to have gone great! I am so incredibly pleased with the entire process. Thanks to everyone and especially @sgibson91 for leading this effort! 👏 🏆 🏅

Can we switch off the old cluster now?

@choldgraf
Member

choldgraf commented Nov 16, 2021

@rabernat we've got a blog post ready to go live that announces that the Pangeo cluster is now running on the 2i2c deployment: 2i2c-org/2i2c-org.github.io#85

In the post, it suggests that on November 22nd (next Monday), the old cluster will be shut down. We were hoping that would be enough time for one last migration push. Is that OK with you? I know that cost is a concern here, so if you think that the timeline needs to be accelerated let us know.

@rabernat
Contributor

Sure, sounds fine!

Can we cross-post this blog post on the Pangeo blog? (It's Medium unfortunately 🤮 )

@sgibson91
Member Author

Can we cross-post this blog post on the Pangeo blog?

I'm happy with that

@choldgraf
Member

@rabernat that sounds good to me, how do we do that?

@rabernat
Contributor

I think we just copy-paste the rich text into medium. It would be great for @sgibson91 to be the official author on medium--for that Sarah you will need to let me know your medium account name and I'll add you as an author. If you don't want to do that, I can just publish it under my name and acknowledge you as the author in the text. Either way we will link to the original 2i2c post.

I'm fine to have the 2i2c post be the "main" post; the Pangeo blog is just a nice continuous record of major milestones for the project, so it will be good to have it show up there.

@sgibson91
Member Author

Sarah you will need to let me know your medium account name and I'll add you as an author

drsarahlgibson :)

@rabernat
Contributor

Ok done! You should be able to submit a story to https://medium.com/pangeo/.

I would probably make the 2i2c post live first and then just copy-paste the text over to medium.

@sgibson91
Member Author

Thank you @rabernat!

@choldgraf
Member

choldgraf commented Nov 16, 2021

Update: Blog post is up!

We've posted a blog post about this one (2i2c-org/team-compass#272) and are now directing users to the Pangeo hub running on 2i2c's infrastructure!

What is remaining for us to consider this one complete? Is the task list above correct or are we missing something? I believe that as of right now, these tasks remain:

  • Confirm cloud access to other 2i2c engineers
  • Decommission the old Pangeo hub

@sgibson91
Member Author

@rabernat I submitted the blog to Pangeo's medium 🚀

@sgibson91
Member Author

I've just set off the command to delete the old pangeo-uscentral1b cluster

@rabernat
Contributor

This news makes me so happy. Thank you @sgibson91!

@choldgraf
Member

Just want to echo @rabernat - thanks @sgibson91 for all of your hard work in helping transition Pangeo to the next phase of its infrastructure 🚀

@sgibson91
Member Author

Thanks folks!

  • Confirm cloud access to other 2i2c engineers

@choldgraf shall we track this in 2i2c-org/team-compass#136 and close this one?

@choldgraf choldgraf moved this from Ready to work to In progress in DEPRECATED Engineering and Product Backlog Nov 23, 2021
@choldgraf
Member

@sgibson91 that works!

@choldgraf
Member

choldgraf commented Dec 8, 2021

Note - I closed up https://github.com/orgs/2i2c-org/projects/24/views/1 so that we can keep that one focused around the major JupyterHub migration for Pangeo. For subsequent projects with Pangeo, we can create focused project boards around those topics as well. There were two more items in there, but it wasn't clear to me whether @sgibson91 considered those as part of the migration itself. If so, I think that we should re-focus efforts around those issues now, but if the Pangeo hub as-it-stands works well, then I think we can tackle those in subsequent efforts around that hub

6 participants