Unable to start up / Liveness probe failed #4898
Can confirm, seeing this exact issue. Nginx is restarting like crazy.
Running in EKS with Kubernetes version 1.14.8, nginx version 0.27.1 (latest stable at the time of writing). Edit: I've been able to narrow it down and correlate it nearly 100% to when a node spikes to 100% CPU. According to #4505 this should have been fixed by #4487 (I confirmed #4487 is included since 0.26.something), but I'm still seeing the exact same thing.
We are seeing exactly this as well.
Is the node where the ingress controller pod is running using 100% of the CPU? When CPU utilization on the node is 100%, the ingress controller starts failing its probes because of the lack of CPU time assigned to the pod.
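One common mitigation for this kind of starvation is to reserve CPU for the controller pod via resource requests, so the scheduler guarantees it time even on a busy node. A minimal sketch; the values below are illustrative, not taken from this thread:

```yaml
# Illustrative resource settings for the ingress-nginx controller container.
# Requests reserve scheduler capacity for the pod; omitting a CPU limit
# avoids CFS throttling of nginx under load. Example values only.
resources:
  requests:
    cpu: 500m
    memory: 512Mi
```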
Hey, our nodes were nowhere near 100% CPU usage, however!
I share the idea that "our logic is we never want nginx to be throttled," but the issue here is that any spike in CPU can lead to probe failures.
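If brief CPU spikes are expected, the probes themselves can also be made more tolerant so a transient slow response does not trigger a restart. A sketch, assuming the controller's standard `/healthz` endpoint on port 10254; the timing values are illustrative:

```yaml
# Illustrative probe tuning: tolerate slow responses and require several
# consecutive failures before the pod is restarted. Example values only.
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
  timeoutSeconds: 5      # allow slower responses when the node is under load
  periodSeconds: 10
  failureThreshold: 5    # several consecutive failures needed before a restart
```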
Interesting.
Closing; fixed in #4959. Please reopen if you can reproduce the issue with this new image.
Can confirm: I have tested with this and no longer get CPU spikes when shutting down.
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): No

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): liveness, readiness, store, event, ingress, startup

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

NGINX Ingress controller version: 0.26.2

Kubernetes version (use `kubectl version`): v1.14.8

Environment (`uname -a` output omitted):

What happened:

On random occasions, pods fail to start up the nginx controller. After 30 seconds (as configured via the liveness/readiness probe `initialDelaySeconds`), the status changes to:

`Readiness probe failed: Get http://10.244.1.29:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)`

(Screenshot omitted; domain name is blurred out.)
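The probe setup described above (an `initialDelaySeconds` of 30 against `/healthz` on port 10254) corresponds to something like the following sketch. This is a reconstruction from the error message, not the reporter's actual manifest:

```yaml
# Reconstructed readiness probe matching the failure above. The default
# timeoutSeconds of 1 would explain the "Client.Timeout exceeded" error
# if /healthz responds slowly during startup.
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 1
```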
What you expected to happen:

To start up successfully, like some pods do. The liveness/readiness state should be okay after the configured interval.

(Screenshot omitted; domain name is blurred out.)
How to reproduce it (as minimally and precisely as possible):
I'm unsure as to how to reproduce this issue.
One way is to kill the pod and wait for another one to pop back up and see if that one fails.
It might be worth noting that for pods that do start up successfully, the `event.go:255] Event(v1.ObjectReference{Kind:"ConfigMap",` log line takes over 30 seconds to appear, after which the controller starts up fine. The current setup does not inspire confidence in the stability of the controllers, as a new pod might not start up successfully.
Anything else we need to know:
The log lines about `Get http://127.0.0.1:10246/nginx_status: dial tcp 127.0.0.1:10246: connect: connection refused` can be ignored, as they are the result of the Prometheus ServiceMonitor scraping while the pod has not started up properly.