Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid repeated push of rejected configuration #46

Closed
kyessenov opened this issue Feb 28, 2018 · 12 comments
Closed

Avoid repeated push of rejected configuration #46

kyessenov opened this issue Feb 28, 2018 · 12 comments

Comments

@kyessenov
Copy link
Contributor

If envoy rejects a config, the snapshot cache will attempt to push the same config immediately after envoy requests it again. If envoy does not limit its requests, the server will drive envoy into a loop of request-receive-reject, potentially causing unnecessary CPU load.

@kyessenov
Copy link
Contributor Author

Tracking issue in envoy: envoyproxy/envoy#2169

@gsagula
Copy link
Member

gsagula commented Mar 2, 2018

@kyessenov It saw that you are tracking this 2169 ^^. I'll be pushing the PR over the weekend. Quick question: Do you know by any chance what would be a good baseline to detect continuous bursts (requests/sec). We are not making it configurable for now. Thanks!

@kyessenov
Copy link
Contributor Author

I don't think I know the number that was validated in the real world. I think something <100qps is a realistic upper-bound for config updates per proxy. It might be worth trying to push invalid configs for each xDS at certain qps, and see CPU impact. Currently, envoy just consumes all available CPU while trying to invalidate the config.

@gsagula
Copy link
Member

gsagula commented Mar 3, 2018

I will try that and if I can get a more accurate number, but something like <100qps seems realistic for me too. Thanks!

@mihaitodor
Copy link

I think this has been fixed here: envoyproxy/envoy#4787 It exposes rate_limit_settings in the ads_config of the Envoy config.

@kyessenov
Copy link
Contributor Author

Correct. It would be nice to wire the feedback from the server. Some messages are useful (invalid config, missing reference, etc).

@mihaitodor
Copy link

mihaitodor commented Aug 5, 2020

Hmmm... I'm using the Callbacks to implement the OnStreamResponse method and I get this message when adding a duplicate listener: Error adding/updating listener(s) whoami-http-one:10003: cannot bind '192.168.168.168:10003': Address already in use. It also provides error code 13. The issue was that it was flooding my service by calling OnStreamResponse like crazy if this misconfiguration happened, until I configured the rate_limit_settings in Envoy to make it repeat the request less often.

Not sure if this is what you're referring to, though.

@kyessenov
Copy link
Contributor Author

Yeah, it is just there is no way to plug in server side rate limiter ATM. Some of these errors are ephemeral so not sure what action to take anyways until it settles (e.g. the older listener stops listening in your case)

@mihaitodor
Copy link

Ah, I see what you mean now. Thanks for the clarification! I'm not sure either how the server should behave, but the default rate_limit_settings in Envoy seem excessive to me. Retrying so fast can easily DDoS the go-control-plane server and I'm curious why anyone would need these retries to happen faster than once / second or so...

@github-actions
Copy link

github-actions bot commented Apr 6, 2021

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 6, 2021
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@akshaysngupta
Copy link

akshaysngupta commented Oct 11, 2023

Another option could be for control plane to allow implementors to choose if they want to retry on a specific error.

In following code, we can introduce a callback for OnStreamResponseNacked and OnStreamDeltaResponseNacked which can evaluate the error and decide to retry.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants