Overload manager doesn't exclude probes from stop_accepting_* actions #23843
Comments
cc @KBaichoo
@caoyukun0430 I think this is an Istio ingress gateway issue rather than an Envoy problem? From my view, when the overload manager's stop_accepting_* actions are triggered, the Istio ingress gateway's readinessProbe would also fail, but the livenessProbe should still work so that the pod is not deleted by k8s.
@hobbytp it is a bit vague and could also be related to Istio, so an issue has been raised there too. It would be nice to have opinions from both sides.
This is reasonable. We could augment the action more broadly to let a debug IP or headers decide whether the traffic should be rejected, provided the probe traffic has some mechanism to identify itself. This would make sense in fully controlled environments or if there's a scrubbing proxy beforehand.
Hi @KBaichoo, thanks for the reply! As we understand from Istio (istio/istio#41859 (comment)), the probe health check on port 15021 first reaches Envoy and is then forwarded to pilot-agent on port 15020, which gets stats from Envoy to see whether it has started. Under overload, however, the first step, the request to port 15021, is rejected by the overload action. Furthermore, we think it would be better if Envoy provided a configurable port value to be excluded from the overload stop actions. The use case is an overload manager configured on an application pod's sidecar: the application could have its own health-check port or mechanism other than port 15021, and that traffic would also be rejected when the action is triggered. With a configurable excludePort for the overload action, the issue could be avoided.
So far it looks like the use case has been about letting liveness probes in to Envoy during overload.
@caoyukun0430 at the moment no one is working on this issue. I may pick it up in a couple of weeks unless someone else does. You are also welcome to submit a patch :)
Hi @nezdolik, is there any update? Did you find some time to work on this issue? Thanks
I'd like to pick this issue up. cc @kyessenov
Go for it! I agree we want to keep an overloaded Envoy running as long as it can, so it must pass the probes. If we crash or delete it too early, k8s cannot scale up fast enough.
Title: Overload manager should exclude probes from stop_accepting_* actions
Description:
We have recently been testing the overload manager with Istio. Our scenario is to push the ingressgateway past its configured heap size limit by sending requests over a large number of concurrent connections. We then see the stop_accepting_connections and stop_accepting_requests actions triggered, and new requests fail.
However, we found that stop_accepting_connections (and also stop_accepting_requests) blocks internal probes such as the liveness and readiness probes, which leads to a restart of the ingressgateway pod and causes issues for us.
We think it is reasonable to exclude probes such as the liveness and readiness probes from the overload manager's stop_accepting_* actions: new traffic will still fail and users will receive a 503 error indicating the overload, so there seems to be no need to also stop internal probes.
Otherwise, we would like to know whether probes are intentionally included in the stop_accepting_* actions and, if so, why.
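For context, a setup like the one described above could look roughly like the bootstrap fragment below. This is only a sketch, not the reporter's actual configuration: the heap limit, refresh interval, and thresholds are illustrative assumptions, and how the fragment is injected into the Istio ingressgateway is not shown.

```yaml
# Sketch of an Envoy overload manager configuration (values are assumptions).
overload_manager:
  refresh_interval: 0.25s
  resource_monitors:
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        # Assumed heap limit of 1 GiB; the issue does not state the real value.
        max_heap_size_bytes: 1073741824
  actions:
    # Reject new requests once heap usage crosses the (assumed) 95% threshold.
    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
    # Stop accepting new connections at a higher (assumed) 98% threshold.
    - name: "envoy.overload_actions.stop_accepting_connections"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98
```

With a configuration of this shape, probe traffic that passes through Envoy's listeners is subject to the same actions as user traffic once a threshold is crossed, which is the behavior this issue is about.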
Related topic: #21923