Finish the watch stream connection when exiting #278
Conversation
```diff
@@ -108,7 +108,13 @@ def ensure_watch_threads
   def stop_watch_threads
     safe_log("#{log_header} Stopping watch threads...")

     finish.value = true

     # First call WatchStream#finish to forcibly terminate the loop, this
     # closes the HTTP connection and will cause the #each method to raise an
```
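The shutdown sequence being reviewed can be sketched as follows. This is a minimal, self-contained illustration only: `FakeWatchStream` is a stand-in for kubeclient's `WatchStream`, and the entity names and thread bookkeeping are invented for the sketch, not the mixin's actual code.

```ruby
# Stand-in for kubeclient's WatchStream (hypothetical, for illustration):
# #each blocks reading notices until #finish "closes the connection".
class FakeWatchStream
  def initialize
    @queue = Queue.new
  end

  # Blocks the way WatchStream#each blocks on the HTTP response body
  def each
    while (notice = @queue.pop)
      yield notice
    end
  end

  # Simulates closing the HTTP connection, which unblocks #each
  def finish
    @queue.push(nil)
  end
end

watch_streams = {}
watch_threads = {}

%w[pods services].each do |entity|
  stream = FakeWatchStream.new
  watch_streams[entity] = stream
  watch_threads[entity] = Thread.new do
    stream.each { |_notice| } # real code would queue the notice for refresh
  end
end

# Shutdown: finish each stream first so the blocked #each returns,
# then the joins can complete within their timeout.
watch_streams.each_value(&:finish)
watch_threads.each_value { |t| t.join(5) }
```

The ordering is the point of the diff above: without calling `finish` first, every `join(timeout)` would time out because the watch thread is still blocked on the open HTTP connection.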
To be precise, the #each method crashes and rescues internally. It's an awkward implementation, but it should be invisible. If you see any exception leaking outside it, that's a kubeclient bug (we already escalated the rescue a couple of times: ManageIQ/kubeclient#280 and ManageIQ/kubeclient#315).
Oh, I see: I think ManageIQ is still using the old kubeclient 2.5.2, where some are uncaught :-(
That includes 280 but not 315.
Bumping kubeclient to 3.x / 4.x is still blocked on several things that are basically ready; I just need to test image scanning.
I could also release 2.5.3 with 315 backported.
OK to catch HTTP::ConnectionError for now, but let's have a more precise comment, and do nag me to bump or backport.
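The interim fix being approved here, rescuing the error that old kubeclient leaks when the connection is closed mid-read, might look roughly like this. `HTTP::ConnectionError` is stubbed below so the sketch runs standalone, and `consume` is an invented helper name, not the mixin's real method.

```ruby
# Stub of the http gem's error class so this sketch is self-contained.
module HTTP
  class ConnectionError < StandardError; end
end

def consume(stream)
  stream.each { |_notice| } # real code would process each notice
  :drained
rescue HTTP::ConnectionError
  # kubeclient 2.5.2 can leak this when #finish closes the connection
  # mid-read (see ManageIQ/kubeclient#315); during shutdown we treat it
  # as a normal end-of-stream rather than a failure.
  :closed
end

# A stream whose #each raises as the leaky kubeclient versions do.
broken = Object.new
def broken.each
  raise HTTP::ConnectionError, "connection reset"
end

result = consume(broken)
```

Once kubeclient is bumped (or 2.5.3 with the backport is released), the rescue becomes dead code and can be dropped.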
👍
Okay, added that PR to the comment re: the exception.
```ruby
watch_streams[entity_type] = watch_stream

begin
  loop do
```
Is the loop for restarting if it disconnects (ManageIQ/kubeclient#275)? Have you actually observed that, and does this work? I think you need a fresh start_watch() to reconnect (with a fresh resource_version, of course).
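The reconnect pattern being suggested, a fresh watch call seeded with the last resourceVersion processed, can be sketched as below. `start_watch`, the `Notice` struct, and the two-connection shape are all stand-ins for illustration, not kubeclient's API.

```ruby
# Hypothetical notice shape: real watch notices carry the object's
# metadata.resourceVersion.
Notice = Struct.new(:resource_version)

# Stand-in for a watch call: each "connection" yields three notices
# starting after the given resourceVersion, then disconnects.
def start_watch(from_version)
  (from_version + 1..from_version + 3).map { |v| Notice.new(v) }
end

last_version = 0
seen = []

2.times do # initial watch plus one reconnect after a disconnect
  start_watch(last_version).each do |notice|
    # Remember where we are, so a reconnect resumes instead of restarting
    last_version = notice.resource_version
    seen << notice.resource_version
  end
end
```

The key detail is that each reconnect passes the last seen resourceVersion, so no notices are replayed or dropped across the disconnect.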
I haven't seen it yet, but yes, that's exactly why I had this loop. If this doesn't work, I'll drop the loop and just let the thread restart.
👍 Thread restarting looks good; fewer code paths is good.
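The approach agreed on here, letting a supervising pass restart any watch thread that has died, could be sketched as follows. The supervisor shape and entity names are invented for illustration; the real mixin's `ensure_watch_threads` may differ.

```ruby
# Hypothetical supervisor in the spirit of ensure_watch_threads: each
# pass restarts any watch thread that has died (e.g. after a disconnect).
watch_threads = {}
restarts = Hash.new(0)

start_thread = lambda do |_entity|
  Thread.new { } # real code: open a fresh watch stream and consume notices
end

2.times do # two supervision passes
  %w[pods services].each do |entity|
    thread = watch_threads[entity]
    next if thread&.alive?

    restarts[entity] += 1
    watch_threads[entity] = start_thread.call(entity)
  end
  watch_threads.each_value(&:join) # let the short-lived stub threads exit
end
```

Because the stub threads exit immediately, each supervision pass restarts both of them; in the real provider a thread only dies when its watch disconnects, so the supervisor replaces the explicit `loop` without adding a second code path.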
Force-pushed from a55057d to a72b2d8: "Prevent thread joins always timing out due to the watch stream blocking for far longer than the join timeout."
Force-pushed from a72b2d8 to c216551.
LGTM 👍
merging
```diff
   self.initial = true
   self.queue = Queue.new
   self.resource_versions = {}
-  self.watch_threads = {}
+  self.watch_streams = Concurrent::Map.new
+  self.watch_threads = Concurrent::Map.new
```
Reminder, not necessarily in this PR: you also planned for resource_versions to be a Concurrent::Map.
Is this https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Hash.html ? It "locks against the object itself for every method call, ensuring only one thread can be reading or writing at a time"; will lock contention be a problem?
I'd guess this is minor compared to the overhead of reading notices from the network. Anyway, I'm cool with erring on the side of safety and profiling later.
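For context on the contention question: the guarantee under discussion (one writer at a time per map) can be approximated in the stdlib with a Mutex-guarded Hash, which is the "one big lock" model the Concurrent::Hash docs describe; Concurrent::Map avoids that single lock per call. This sketch uses only the stdlib and invented key names.

```ruby
# Several threads recording resource versions into a shared hash.
# A single Mutex serializes every write, which is safe but is exactly
# the per-call locking that raises the contention question above.
versions = {}
lock = Mutex.new

threads = 4.times.map do |i|
  Thread.new do
    250.times do |n|
      lock.synchronize { versions["entity-#{i}"] = n }
    end
  end
end
threads.each(&:join)
```

With a few watch threads each handling network-paced notices, the time spent inside the lock is tiny compared to reading from the connection, which supports the "err on safety, profile later" call.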
Checked commit agrare@c216551 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0:
app/models/manageiq/providers/kubernetes/container_manager/streaming_refresh_mixin.rb
Prevent thread joins always timing out due to the watch stream blocking
for far longer than the join timeout.
Ref: #271 (comment)