[Solved] Pangeo Hub has several critical issues #815
Comments
That is not the correct grafana URL - this is the one listed in the config and is reachable https://pangeo-grafana.pangeo.2i2c.cloud |
I suspect the certificate issues are due to the redirect Ryan asked me to set up last night.
Setup
We have two DNS zones: 2i2c.cloud, managed by us through Namecheap, and pangeo.io, managed by the Pangeo community through Hurricane Electric (though I have access). In 2i2c.cloud, we have a pangeo.2i2c.cloud A record that points to our LoadBalancer IP address. In pangeo.io, we have us-central1-b.gcp.pangeo.io, which is a CNAME for pangeo.2i2c.cloud. It is set up this way so that if our LoadBalancer IP changes, we only need to edit the A record in 2i2c.cloud and pangeo.io will inherit the change through the CNAME.
The Redirect
We only assign one domain name to our hubs to avoid confusion. This means that once the CNAME for us-central1-b.gcp.pangeo.io was set up, pangeo.2i2c.cloud began returning a 404, since ingress-nginx now only accepts traffic from the pangeo.io domain. Hence Ryan asked me to set up a redirect from pangeo.2i2c.cloud to us-central1-b.gcp.pangeo.io, which I did here #482 (comment)
What I suspect is happening
I don't think the certificates are able to resolve properly because they're trying to get a response from ...pangeo.io, which is a CNAME for pangeo.2i2c.cloud, which is then redirecting back to ...pangeo.io --> vicious loop of nothing giving a correct response.
What I'm going to try
|
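To illustrate the DNS layout described in the comment above, here is a minimal sketch in Python. It is a toy lookup table, not a real DNS query: the `resolve` helper and the IP addresses are hypothetical placeholders (the addresses come from the reserved documentation range), and only the two hostnames are taken from the thread.

```python
def resolve(name, records, max_depth=10):
    """Follow CNAME records until an A record is found and return its IP."""
    for _ in range(max_depth):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: continue the lookup with the target name
    raise RuntimeError("CNAME chain too long")


# Hypothetical zone contents; 203.0.113.x is a placeholder, not the real IP.
records = {
    "pangeo.2i2c.cloud": ("A", "203.0.113.10"),
    "us-central1-b.gcp.pangeo.io": ("CNAME", "pangeo.2i2c.cloud"),
}

before = resolve("us-central1-b.gcp.pangeo.io", records)

# If the LoadBalancer IP changes, only the A record is edited ...
records["pangeo.2i2c.cloud"] = ("A", "203.0.113.20")

# ... and the pangeo.io name picks up the change through the CNAME.
after = resolve("us-central1-b.gcp.pangeo.io", records)
```

The point of the layout is that the pangeo.io zone never needs touching when the IP moves: only one record, in the zone 2i2c controls, has to change.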
I did the above and logged into the production hub in a private browser. All certificates were present and the connection was private. So the certificates issue is now resolved. |
I could not replicate this so I suspect it was all a certificates/traffic problem, but I'm happy to be proven wrong if someone can provide concrete steps to demonstrate the problem? |
I think there has been some confusion regarding the certificates on the pangeo.2i2c.cloud URL. We stopped supporting multiple domains for a single hub to reduce complexity. See these PRs: #460 and #496. Hence when #812 was merged, we stopped issuing certificates for pangeo.2i2c.cloud and the load balancer stopped accepting traffic from there. Instead, we issue certificates for us-central1-b.gcp.pangeo.io and accept traffic from there. As mentioned above, the pangeo.2i2c.cloud address is only used so we can update the IP address of the load balancer if required in the cases where we don't have access to the desired domain. There are no certificate issues if folks use the us-central1-b.gcp.pangeo.io address, which I mentioned here #482 (comment). But instead we got waylaid by redirects. I think the only reason we've had this confusion is because the hub had users throughout the setup process. Normally, we would not have users until after this point. |
Just a note that the following works as-expected for me:
Quick thoughts:
|
There's a bit of a name-clash for grafana atm since I wasn't very clever when setting up the COESSING hub. infrastructure/config/hubs/pangeo-hubs.cluster.yaml Lines 13 to 16 in 6cf0a3f
So my plan was:
I had no intentions to point grafana at If we move forward with #427 at some point, I had also considered making these URLs |
Just checked out Namecheap for this. We have an A record In which case, I wonder if I took the wrong approach by trying to setup the redirect from Namecheap instead of in Hurricane Electric? Update: Had a quick look through Hurricane Electric and it wasn't obvious to me how to do this. |
I'm simultaneously proud of and ashamed of how the utoronto redirect works - it works via JS in the homepage! https://github.com/utoronto-2i2c/homepage/blob/master/extra-assets/js/login.js. It only works if you land on the homepage - if you're already logged in it has no effect. Do not recommend. |
@yuvipanda LOL that is amazing |
I've tidied up the top comment. I don't think there's anything actionable left here so I'm going to close this. |
It is fun to realize I actually thought about something along these lines when I was thinking about possible workarounds 😜 |
Many thanks @sgibson91 for being awesome! |
Summary
There are a variety of critical issues that have been reported on the Pangeo JupyterHub.
Certificate errors
Some users reported a certificate error when connecting to the hub. Here's an example of the error message:
FreshDesk tickets:
JupyterLab and kernel usability errors
Many users on the Pangeo hub are reporting sluggish behavior. In particular, the following two actions take upwards of 30 seconds to complete, or do not complete at all:
FreshDesk tickets:
Grafana is not reachable
I tried to go to grafana.us-central1-b.gcp.pangeo.io but received a "your connection is not private" error. So I haven't been able to look at any dashboards to understand what might be going on.
After-action report
What went wrong
We set up a redirect between two URLs where one was a CNAME for the other. This turned out to be a Very Bad Idea ™️. In PR #812, we replaced the pangeo.2i2c.cloud address with us-central1-b.gcp.pangeo.io, meaning that cert-manager was no longer issuing certificates for pangeo.2i2c.cloud and our load balancer would no longer accept traffic from pangeo.2i2c.cloud. All issues were resolved by undoing the redirect and visiting the us-central1-b.gcp.pangeo.io address instead.
Pangeo has been special-cased in that it has had active users before setup development was complete, and I think the switch in URL is what confused people. Normally we would only invite users after the DNS has been set, so I don't see the issue arising again.
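The loop described above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not a reproduction of what cert-manager or the DNS providers actually did: the `canonical` and `fetch` helpers are hypothetical, and the model assumes the redirect is served by whatever answers for the canonical name, regardless of which hostname was requested.

```python
def canonical(host, cnames):
    """Follow CNAME records until reaching a name that has no CNAME."""
    seen = set()
    while host in cnames:
        if host in seen:
            raise ValueError("CNAME loop")
        seen.add(host)
        host = cnames[host]
    return host


def fetch(host, cnames, redirects, max_hops=5):
    """Return the hosts requested, or raise if the redirects never settle."""
    trail = []
    for _ in range(max_hops):
        trail.append(host)
        target = canonical(host, cnames)
        if target not in redirects:
            return trail  # something finally answers without redirecting
        host = redirects[target]  # assumption: redirect applies to the canonical name
    raise RuntimeError("redirect loop: " + " -> ".join(trail))


cnames = {"us-central1-b.gcp.pangeo.io": "pangeo.2i2c.cloud"}
redirects = {"pangeo.2i2c.cloud": "us-central1-b.gcp.pangeo.io"}

try:
    fetch("us-central1-b.gcp.pangeo.io", cnames, redirects)
    looped = False
except RuntimeError:
    looped = True  # the CNAME and the redirect point at each other

# Undoing the redirect lets the request settle on the first attempt.
settled = fetch("us-central1-b.gcp.pangeo.io", cnames, {})
```

In this model, removing the redirect entry is enough to break the cycle, which matches the fix reported in the thread.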
Action items
Documentation improvements
Actions