Prevent xDS tight loop on cfg errors #12195
Previously, the delta xDS protocol could get into a tight loop where:

1. Consul sends xDS resources it believes are valid.
2. Envoy rejects them with a NACK.
3. Consul regenerates and re-sends the same resources.
4. Loop back to step 2.

This tight loop produces a large volume of error logs in both Envoy and Consul.

This commit updates the delta xDS loop to wait for new requests or snapshots after a NACK is received from Envoy. By skipping to the top of the for loop, Consul avoids re-sending resources that Envoy has already rejected.
Thanks for the early feedback! Need to see what's going on with these tests as well.
NACK handling is now tested in TestServer_DeltaAggregatedResources_v3_NackLoop.
One comment about exported consts, but LGTM!
Transparent proxies can set up filter chains that allow direct connections to upstream service instances. Services that can be dialed directly are stored in the PassthroughUpstreams map of the proxycfg snapshot. Previously, these addresses were not cleaned up based on new service health data, so the list of addresses associated with an upstream service could only ever grow. As services scale up and down, an IP that was previously assigned to an instance of one service will eventually be assigned to an instance of a different service. When an IP address is duplicated across filter chain match rules, Envoy rejects the listener config. This commit updates the proxycfg snapshot management so that passthrough addresses are cleaned up once they are no longer associated with a given upstream. A race condition is still possible where, due to timing, an address is shared between multiple passthrough upstreams. That concern is mitigated by #12195 and will be further addressed in a follow-up.
LGTM, but I share @kisunji's comment about exported vs. unexported symbol names.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/580062.