Prevent xDS tight loop on cfg errors #12195

Merged 7 commits on Feb 10, 2022


freddygv
Contributor

Previously the Delta xDS protocol could get into a tight loop where:

  1. Consul sends xDS resources it believes are valid
  2. Envoy rejects them with a NACK
  3. Consul re-generates and sends the resources
  4. Loop back to step 2

This tight loop floods both the Envoy and Consul logs with error
messages.

This commit updates the delta xDS loop to wait for new requests or
snapshots after a NACK is received from Envoy. By skipping to the top of
the for loop we avoid re-sending the resources that Envoy has already
rejected.
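
The loop behavior described above can be sketched as a small simulation. This is not the code in agent/xds/delta.go: the `event` type, the `countSends` function, and the always-rejecting Envoy are hypothetical stand-ins, used only to show why continuing to the top of the loop after a NACK stops the resend cycle.

```go
package main

import "fmt"

// event is a hypothetical stand-in for what the delta xDS loop waits on.
type event int

const (
	newSnapshot event = iota // a new proxy config snapshot arrived
	ack                      // Envoy accepted the last response
	nack                     // Envoy rejected the last response
)

// countSends simulates the loop against an Envoy that rejects every
// response, and returns how many times resources were sent.
// maxEvents bounds the simulation so the old behavior terminates.
func countSends(resendOnNack bool, maxEvents int) int {
	events := []event{newSnapshot} // queue of events the loop observes
	sends := 0
	for i := 0; i < len(events) && i < maxEvents; i++ {
		switch events[i] {
		case nack:
			if !resendOnNack {
				// The fix: return to the top of the loop and wait for
				// a new request or snapshot instead of resending.
				continue
			}
		case ack:
			continue
		}
		// Generate and send resources; the simulated Envoy NACKs them.
		sends++
		events = append(events, nack)
	}
	return sends
}

func main() {
	fmt.Println("old behavior, sends:", countSends(true, 5))
	fmt.Println("with fix, sends:", countSends(false, 5))
}
```

In the simulation the old behavior sends on every processed event until the bound is hit, while the fixed loop sends once and then idles until something new arrives.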

@freddygv freddygv requested review from rboyer and a team January 26, 2022 17:02
@kisunji kisunji requested a review from a team January 26, 2022 18:08
@freddygv
Contributor Author

Thanks for the early feedback! Need to see what's going on with these tests as well.

The NACK case is now tested in TestServer_DeltaAggregatedResources_v3_NackLoop.
Contributor

@kisunji kisunji left a comment


Comment about exported consts but LGTM!

freddygv added a commit that referenced this pull request Jan 28, 2022
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.

Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.

As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.

This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.

There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
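
The cleanup described in this commit message can be illustrated with a minimal sketch. The `passthroughAddrs` type and its `replace` and `shared` methods are hypothetical and do not reflect the actual layout of the PassthroughUpstreams map in proxycfg; the sketch only assumes that each upstream maps to a set of addresses, and that each health update replaces that set rather than appending to it.

```go
package main

import (
	"fmt"
	"sort"
)

// passthroughAddrs is a hypothetical model: upstream name -> set of
// instance addresses that can be dialed directly.
type passthroughAddrs map[string]map[string]struct{}

// replace overwrites the address set for an upstream with the addresses
// from the latest health data, so stale addresses are dropped.
func (p passthroughAddrs) replace(upstream string, addrs []string) {
	if len(addrs) == 0 {
		delete(p, upstream)
		return
	}
	set := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		set[a] = struct{}{}
	}
	p[upstream] = set
}

// shared returns, in sorted order, the addresses listed under more than
// one upstream. These would produce duplicate filter chain match rules.
func (p passthroughAddrs) shared() []string {
	owners := make(map[string]int)
	for _, set := range p {
		for a := range set {
			owners[a]++
		}
	}
	var out []string
	for a, n := range owners {
		if n > 1 {
			out = append(out, a)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	p := passthroughAddrs{}
	p.replace("api", []string{"10.0.0.1", "10.0.0.2"})
	p.replace("api", []string{"10.0.0.2"}) // api scales down
	p.replace("web", []string{"10.0.0.1"}) // the IP is reused by web
	fmt.Println("shared addresses:", p.shared())
}
```

If the first update for `api` were kept instead of replaced, `10.0.0.1` would appear under both upstreams. The race mentioned above corresponds to the window in which `web` reports the address before `api` has dropped it.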
Member

@rboyer rboyer left a comment


LGTM, but I share @kisunji's concern about exported vs. unexported symbol names.

@freddygv freddygv merged commit 378a725 into main Feb 10, 2022
@freddygv freddygv deleted the nack-attack branch February 10, 2022 22:37
@hc-github-team-consul-core
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/580062.

@hc-github-team-consul-core
Contributor

🍒❌ Cherry pick of commit 378a725 onto release/1.11.x failed! Build Log

@hc-github-team-consul-core
Contributor

🍒❌ Cherry pick of commit 378a725 onto release/1.10.x failed! Build Log

freddygv added a commit that referenced this pull request Feb 10, 2022
freddygv added a commit that referenced this pull request Feb 10, 2022
freddygv added a commit that referenced this pull request Feb 11, 2022