Warning Event: Readiness probe failed: HTTP probe failed with statuscode: 500 occurred during upgrade
#12401
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/remove-kind bug
You have only one pod of the controller, so yes, you will get a brief disruption during the upgrade. You can experiment with more than one replica and with the values for minAvailable etc.
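A sketch of the kind of Helm values this reply points at. The key names follow the ingress-nginx chart's conventions, but they should be verified against the values.yaml of the chart version actually in use:

```yaml
# values.yaml fragment for the ingress-nginx Helm chart (illustrative;
# confirm these keys against your chart version before applying).
controller:
  replicaCount: 2    # keep a second pod serving while the first is replaced
  minAvailable: 1    # feeds the chart's PodDisruptionBudget
```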
Thanks for your support! @longwuyuan Following your suggestions here, I tried changing my values:
+ replicaCount: 2
But the same error still occurred when the old pod switched to … I also tried adding …
Those are not the only values. Please explore others. If it's about graceful draining of established connections, then please look at other such config options for timers etc. There is no well-documented use case with the controller for this; each user finds their most suitable config by trial and error.
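One family of timer-related settings this reply may be alluding to, sketched as Helm values. The specific keys and numbers below are illustrative assumptions, not a recommendation from the thread:

```yaml
# Drain-timer sketch for the ingress-nginx chart (verify key names and
# sensible values for your environment).
controller:
  terminationGracePeriodSeconds: 300   # time the pod gets to drain on SIGTERM
  config:
    worker-shutdown-timeout: "240s"    # how long nginx workers keep draining
```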
I've tried a lot of ways:
None of them worked. However, I have found that all the errors come from the old pod while it is executing the "wait-shutdown" script. The old pod still receives requests after the controller starts shutting down and before nginx terminates, which is not what I expected. So I don't think it's a configuration issue, but rather a brief service interruption during graceful termination. In my opinion, the expected process would be:
But the current implementation can't guarantee that the second step happens before the third step. Could you double-check it? Thanks for your strong support again.
// net/http: Server.ListenAndServe (Go standard library). Once Shutdown or
// Close has been called, shuttingDown() reports true and any new listen
// attempt fails immediately with ErrServerClosed.
func (srv *Server) ListenAndServe() error {
	if srv.shuttingDown() {
		return ErrServerClosed // the fatal error
	}
	addr := srv.Addr
	if addr == "" {
		addr = ":http"
	}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	return srv.Serve(ln)
}
More information updated: once a pod transitions from …
What happened:

I was investigating zero-downtime upgrades of the ingress-nginx-controller, using the helm upgrade --reuse-values command to perform the upgrade. The system operates smoothly if no requests are sent during the upgrade period. However, when using Grafana K6 to monitor the frequency of HTTPS requests, an error occurs as the new controller pod becomes fully initialized and the old pod begins to terminate. The issue only lasts a brief moment, yet it can be consistently reproduced.

Here is the warning event:

And here is the K6 test log:

During this period I encounter numerous empty responses, and there are no error logs in the ingress-nginx-controller pod. However, if a TCP connection was established before this window, it remains uninterrupted (tested with the telnet ${my-tcp-service} ${port} command). So I want to confirm whether the upgrade causes a short-lived service interruption of the ingress-nginx-controller.

What you expected to happen:

No warnings should occur throughout the upgrade process, and every request should be handled, whether or not the returned status code is 200.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): v1.11.2 & v1.11.3

Kubernetes version (use kubectl version):
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5
Environment:
Cloud provider or hardware configuration: I used Gardener to control all clusters, so I have no permissions to check it.
OS (e.g. from /etc/os-release): linux-amd64
Kernel (e.g. uname -a):
Install tools:
Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
Basic cluster related info:
kubectl get nodes -o wide
How was the ingress-nginx-controller installed:
helm ls -A | grep -i ingress
$ helm ls -A | grep -i ingress
ingress-nginx   ingress-nginx   28   2024-11-18 16:34:27.1373854 +0800 CST   deployed   ingress-nginx-4.11.3   1.11.3
helm -n <ingresscontrollernamespace> get values <helmreleasename>
Current State of the controller:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
kubectl describe ...
of any custom configmap(s) created and in use

How to reproduce this issue:
To reproduce it, you just need one web service (any pod that can receive HTTP requests is fine). Then you can use this K6 script:
Anything else we need to know:
You can use my test image implemented by Go:
image: doublebiao/web-service-gin:v1.0-beta