Graceful shutdown is not working as expected with default setup. #4002
Comments
@davem-git which L3/L4 load balancer are you using? Have you set up health checks?
I'm hosted in Azure and GCP. Those tests ended with similar results on both clouds.
You'll need to set up health checks on the load balancer so it can stop routing to the Envoy that is shutting down (draining). The first step when shutting down is failing health checks, so the LB can route newer connections to a newer Envoy pod.
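To illustrate the idea, here is a minimal sketch of a readiness probe pointed at Envoy's readiness endpoint, so that a draining proxy starts failing checks and the load balancer stops sending it new connections. The port 19001 and /ready path are assumptions based on the default Envoy Gateway proxy setup; verify them against your generated proxy Deployment.

```yaml
# Illustrative only: readiness probe fragment for the envoy container.
# Port 19001 and the /ready path are assumptions; check the Deployment
# that Envoy Gateway generates in your cluster.
readinessProbe:
  httpGet:
    path: /ready
    port: 19001
  periodSeconds: 5
  failureThreshold: 1
```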
How are these set?
Any documentation? Envoy Gateway stands up these load balancers. I see I can set annotations, but I don't see any annotations for health checks for GCP.
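For reference, annotations on the generated Service can be set through the EnvoyProxy resource. The sketch below uses an Azure health-probe annotation as the example and placeholder resource names; it is not a GCP answer, since GCP's L4 health checks are generally not configured via Service annotations.

```yaml
# Sketch: setting annotations on the Envoy Service via the EnvoyProxy resource.
# The Azure health-probe annotation is only an example; the resource and
# namespace names are placeholders.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /ready
```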
Hey @davem-git, we just merged #4021, which should help you with graceful shutdown on GKE with the default settings (no settings). You can try it out by using the
Hey @davem-git, did you get a chance to try this out?
I've looked around and haven't found those settings for all of our cloud providers. I will test it out on Azure today. So far, though, adding the settings I listed above seems to have helped in my testing.
@davem-git with #4021 (which is now available with
Hey @arkodg, just a basic doubt: is there no need for adding pod readiness gates on the namespace where the Envoy proxies are running (proxies receiving traffic via NLB or ALB)? For example, on EKS with aws-load-balancer-controller running. Ref: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/pod_readiness_gate/
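For anyone following along, the readiness gate injection from the linked docs is enabled per namespace with a label. A minimal sketch, assuming the proxies run in envoy-gateway-system (adjust the namespace name to your setup):

```yaml
# Sketch: opt the proxy namespace into aws-load-balancer-controller
# pod readiness gate injection, per the linked docs.
apiVersion: v1
kind: Namespace
metadata:
  name: envoy-gateway-system
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
```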
I tried testing this; the health checks failed to work and envoy-proxy didn't function, so I reverted.
@ncsham AFAIK the controller is reading the endpoint slices of the service from the API server and will detect any endpoints that are down (whose
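For context on what the controller sees: per-endpoint readiness lives in the EndpointSlice conditions, and a draining pod typically shows up roughly like the fragment below. Values and names here are illustrative placeholders, not taken from this issue.

```yaml
# Illustrative EndpointSlice fragment for a pod that is terminating/draining.
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: envoy-default-abc123                    # placeholder
  labels:
    kubernetes.io/service-name: envoy-default   # placeholder
addressType: IPv4
endpoints:
  - addresses: ["10.0.1.23"]
    conditions:
      ready: false        # controllers stop routing new traffic here
      serving: true       # in-flight connections may still be drained
      terminating: true
ports:
  - name: http
    port: 10080           # placeholder container port
    protocol: TCP
```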
Description:
I'm working on implementing envoy-gateway as a replacement for our nginx controller. I have some basic tests: a pod that returns a JSON block when an endpoint is hit, using k6 as the testing suite. I set up the following test.
When I run this test and start a rollout restart of the envoy pods, I get the following errors.
When I do this on nginx, I do not get these errors.
I added these settings to my custom proxy config and it seemed to fix the issue:
```yaml
shutdown:
  drainTimeout: 600s
  minDrainDuration: 60s
```
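For completeness, a sketch of where those shutdown settings live: in an EnvoyProxy resource that the GatewayClass references via parametersRef. The resource and namespace names below are placeholders, not taken from the issue.

```yaml
# Sketch: shutdown/drain settings in an EnvoyProxy resource,
# referenced from the GatewayClass. Names are placeholders.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  shutdown:
    drainTimeout: 600s
    minDrainDuration: 60s
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: envoy-gateway-system
```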