-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support exclusion of listeners from overload manager #5260
Comments
+1 to defaulting exclusion of the stats and readiness listener |
I want to give this a shot, quick question though:
Maybe I don't have the full picture, but was wondering how this can be achieved, given we are confirmed by the Gateway API spec. Would an annotation in the Gateway object do the job? For example, an annotation |
Hi @rudrakhp. ClientTrafficPolicy seems to support attachment to a specific GW-API listener within a GW-API Gateway. So, we can make that a feature of CTP. The challenge here might be handling contradicting definitions when the same listener is used in multiple GWs which are merged. It's a common problem though, for all settings that map to the Envoy Listener resource and not the Envoy Filter Chain (e.g. like downstream buffersize). Another option, similar to what you proposed, is to use an explicit list of listener references (Incl. GW name, namespace and section (==listener) name) that should be excluded. That's something that could reside in EnvoyProxy CR and would be scoped to all GWs in the relevant GWC. |
can we scope this down to disabling overload manager for the 2 static listeners (prometheus and health check) ? |
We can break it up, the first bit would be great for 1.4. The rest can come at a later phase. I think that we do support the healthcheck filter and customizing the k8s probes, so the second bit is still relevant for some users, but I imagine that patch policy can be used as mitigation until this is explicitly supported. |
Description:
Currently, EG generates overload manager settings that stop accepting requests when heap utilization is high:
gateway/internal/xds/bootstrap/bootstrap.yaml.tpl
Line 265 in c492089
This may have an unexpected side effect on the Envoy proxy pod's readiness probes, leading to probe failure, e.g. as seen here: envoyproxy/envoy#23843. Since liveness probes are not used by EG, the pod would not be restarted.
However, even a readiness probe failure may in some conditions lead to wide service disruption. For example, if using
ExternalTrafficPolicy: Local
and a single envoy pod in every node, an unready pod would trigger a failure of the NLB Health Check (https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/). Some NLBs, e.g. AWS NLB, are documented to reset all connections to that node in such cases:EG Should:
Envoy supports excluding listeners from overload manager: https://www.envoyproxy.io/docs/envoy/v1.31.5/api-v3/config/listener/v3/listener.proto.html#envoy-v3-api-field-config-listener-v3-listener-bypass-overload-manager
[optional Relevant Links:]
The text was updated successfully, but these errors were encountered: