Turing deployment is broken #1870

Closed
manics opened this issue Mar 23, 2021 · 10 comments

Comments

@manics (Member) commented Mar 23, 2021

The Turing deployment currently fails, for example https://github.com/jupyterhub/mybinder.org-deploy/runs/2174637361?check_suite_focus=true

Starting helm upgrade for turing
Error: UPGRADE FAILED: cannot patch "jupyterhub" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded && cannot patch "binderhub" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded && cannot patch "turing-grafana" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: dial tcp 10.0.194.230:443: i/o timeout && cannot patch "turing-prometheus-server" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded && cannot patch "redirector" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded && cannot patch "static" with kind Ingress: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://hub23-ingress-nginx-controller-admission.hub23.svc:443/networking/v1beta1/ingresses?timeout=10s: context deadline exceeded
Traceback (most recent call last):
  File "./deploy.py", line 346, in <module>
    main()
  File "./deploy.py", line 342, in main
    deploy(args.release, args.name)
  File "./deploy.py", line 189, in deploy
    subprocess.check_call(helm)
  File "/opt/hostedtoolcache/Python/3.8.8/x64/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--install', '--namespace', 'turing', 'turing', 'mybinder', '--cleanup-on-fail', '--create-namespace', '-f', 'config/common/datacenter-aws.yaml', '-f', 'config/common/datacenter-gcp.yaml', '-f', 'secrets/config/common.yaml', '-f', 'config/turing.yaml', '-f', 'secrets/config/turing.yaml']' returned non-zero exit status 1.
Error: Process completed with exit code 1.

Google brought up kubernetes/ingress-nginx#5401, which has 110 comments (I didn't read them all). Some of the suggested fixes include:

@sgibson91 Have you previously tried any of these?

@sgibson91 (Member)

No I haven't, but it looks like a labelling issue maybe? In Helm 3, labels are immutable, which is why we see the "cannot patch" error. It might be easiest (though maybe not the most correct fix) to delete the webhook configuration and let it recreate itself with the correct label.
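
For reference, that would look roughly like the following; the webhook's exact name is a guess based on the hub23 release name in the error, so it's worth listing them first:

# Find the ingress-nginx admission webhook (the name below is assumed)
kubectl get validatingwebhookconfigurations
# Delete it and let the next helm upgrade recreate it
kubectl delete validatingwebhookconfiguration hub23-ingress-nginx-admission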

@manics (Member, Author) commented Mar 23, 2021

Give it a go?

@sgibson91 (Member)

Seeing the same issue with Hub23 now - I'll have to find some time to dig into this.

@sgibson91 (Member)

I wonder if this is because both the mybinder.org and Hub23 charts are dependent on the ingress-nginx chart, Hub23's dependencies get automatically upgraded in a henchbot-like fashion, and we have some namespacing issue that is causing the bug.

@minrk any thoughts?

@sgibson91 (Member)

> I wonder if this is because both the mybinder.org and Hub23 charts are dependent on the ingress-nginx chart, Hub23's dependencies get automatically upgraded in a henchbot-like fashion, and we have some namespacing issue that is causing the bug.

I think this may be the case given the first warning box here https://kubernetes.github.io/ingress-nginx/deploy/

@minrk (Member) commented Apr 15, 2021

Yeah, that sounds very reasonable. From the chart values.yaml, it looks like adding:

ingress-nginx:
  controller:
    scope:
      enabled: true

to both chart configs on turing should do that.
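
For context, a rough sketch of where that could go in config/turing.yaml (the scope.namespace key is optional and only shown here to make the intent explicit; if I remember the chart right, it falls back to the release namespace when unset):

# Sketch only, not the exact config that was applied
ingress-nginx:
  controller:
    scope:
      enabled: true
      namespace: turing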

@sgibson91 (Member)

I'm gonna close this for now, since hopefully scoping nginx to the namespaces avoids future cross-contamination.

@consideRatio (Member)

@sgibson91 @callummole I'm not sure what happened in https://github.com/jupyterhub/mybinder.org-deploy/runs/3111173370?check_suite_focus=true, but I think it is related to this issue.

/cc @yuvipanda, who pressed merge on #1991, the deployment that failed specifically for the Turing cluster.

@yuvipanda (Contributor)

@consideRatio that failure seems to have been a transient issue. I restarted the action and it succeeded (https://github.com/jupyterhub/mybinder.org-deploy/runs/3111222534?check_suite_focus=true) and a local deploy also succeeded. The error message suggested to me that a validating webhook was stuck somewhere, but fixed itself.
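
For anyone who hits this again, a quick way to check whether the admission webhook is reachable, sketched with kubectl (the service and namespace names are taken from the error in the original report, and the label selector is the one the ingress-nginx chart normally sets):

# Does the admission service exist and have endpoints behind it?
kubectl -n hub23 get svc hub23-ingress-nginx-controller-admission
kubectl -n hub23 get endpoints hub23-ingress-nginx-controller-admission
# Are the controller pods that serve the webhook running?
kubectl -n hub23 get pods -l app.kubernetes.io/name=ingress-nginx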

@sgibson91 (Member)

I actually pinned nginx to the same version mybinder.org is running and stopped the automatic updates for that sub-chart. So future issues shouldn't be related to this particular bug.
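
For the record, pinning a sub-chart like that happens in the chart's dependency list; a rough sketch of the kind of entry involved (the version below is a placeholder, not the one actually pinned):

# Illustrative Chart.yaml dependency entry; the pinned version is a placeholder
dependencies:
  - name: ingress-nginx
    version: 3.23.0
    repository: https://kubernetes.github.io/ingress-nginx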
