Contour seems to stop sending updates when processing a large number of services #424
Thanks for reporting this issue. Can you please test with 0.6.0.alpha.2 (I'm going to cut it soon)? My hope is that the bug fixed in #423 may solve the lack of updates. @alexbrand, I'm going to ask you to raise a separate issue for the memory usage. That is not expected, but I don't want to conflate these two issues.
This message is, to be honest, misleading. The "channel is full" log is just informational: it's basically saying the k8s watcher ran ahead of processing by 128 items, and now processing items is blocking more being read from the watcher. It says nothing other than that Contour is busy, which might be indicative of a problem, or not; I expect, and have seen, this message in my scale testing. However, in alpha.2, processing of endpoints no longer goes through the buffer (see #404), so this message should be less prevalent. I'm not sure what the right resolution to this is. I'm loath to remove it; even though it's noisy, it's a good way of finding out when the translator is busy.
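For anyone unfamiliar with the pattern being described, here is a minimal Go sketch of that hand-off. This is not Contour's actual source; the buffer size, type names, and log text are illustrative only.

```go
// Minimal sketch of a watcher feeding a bounded channel while a slower
// translator drains it. Not Contour's actual code: names and sizes are made up.
package main

import (
	"fmt"
	"time"
)

type event struct{ name string }

func main() {
	buf := make(chan event, 128) // bounded buffer between watcher and translator

	// Watcher: produces events faster than the translator consumes them.
	go func() {
		defer close(buf)
		for i := 0; i < 1000; i++ {
			ev := event{name: fmt.Sprintf("svc-%d", i)}
			select {
			case buf <- ev: // room in the buffer, carry on
			default:
				// Buffer is full: informational only. The watcher now blocks
				// until the translator catches up.
				fmt.Println("channel full; translator is busy")
				buf <- ev
			}
		}
	}()

	// Translator: deliberately slow consumer standing in for translation work.
	for ev := range buf {
		_ = ev
		time.Sleep(time.Millisecond)
	}
}
```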
@davecheney would the max size of the channel & current size of the channel be useful metrics to expose? As an operator responsible for managing Contour, is this something I might need to have visibility into?
No thank you. It's an implementation detail that might be removed in the future. I don't want people to get addicted to that number, as it is difficult to infer if a high reading on that channel is a good or bad thing.
@alexbrand any update after trying 0.6-alpha.1?
Closing this issue as it did not recur while testing with 0.6-alpha.1.
While doing performance testing, I ran into an issue where a subset of the envoy pods were not getting configured, which in turn resulted in bad benchmark results.
Environment setup:
The test involved creating 5,000 services in a backend Kubernetes cluster, waiting until they are all discovered by Gimbal, and finally running wrk2 against the Gimbal cluster.
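For context on how such a setup can be scripted, here is a hypothetical helper that bulk-creates backend Services with client-go. It is not part of Gimbal or the actual test harness; the namespace, names, selector, and the context-aware Create signature (recent client-go releases) are all assumptions.

```go
// Hypothetical reproduction helper: create N ClusterIP Services so the
// discoverer has a large number of services to mirror. Illustrative only.
package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config and build a clientset for the backend cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	const count = 5000 // matches the scale used in the test
	for i := 0; i < count; i++ {
		svc := &corev1.Service{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("bench-svc-%d", i)},
			Spec: corev1.ServiceSpec{
				Selector: map[string]string{"app": "bench-backend"},
				Ports: []corev1.ServicePort{{
					Port:       80,
					TargetPort: intstr.FromInt(8080),
				}},
			},
		}
		if _, err := client.CoreV1().Services("default").Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
			log.Printf("create %s: %v", svc.Name, err)
		}
	}
}
```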
The following chart seems to indicate that only one of the envoys was getting CDS updates, whereas the rest were not:
I then noticed that one Contour instance's memory consumption was different from the other's. It seems like we might have a memory leak:
The only interesting bits from the logs were a bunch of these:
The rest of the logs were stream_wait and skipping update messages. I haven't had a chance to try reproducing the issue, but will most likely try this week. Happy to provide any other information that might be useful.