Finish the watch stream connection when exiting #278

Merged
merged 1 commit into from
Aug 23, 2018

Conversation

agrare
Member

@agrare agrare commented Aug 23, 2018

Prevent thread joins always timing out due to the watch stream blocking
for far longer than the join timeout.

Ref: #271 (comment)
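The problem described above can be reproduced in miniature with the standard library alone. This is only a sketch under assumptions: `blocked_reader_demo` is a hypothetical name, and a pipe stands in for the watch stream's HTTP connection. It shows that a join with a timeout returns `nil` while the thread is stuck in a blocking read, and succeeds once the stream is closed from outside.

```ruby
# Hypothetical stand-in for a watch thread: it blocks reading from a pipe
# much as a watch thread blocks waiting for the next notice.
def blocked_reader_demo(join_timeout: 0.2)
  reader, writer = IO.pipe

  thread = Thread.new do
    begin
      reader.each_line { |line| line } # blocks until data arrives or the stream closes
    rescue IOError
      # Raised in this thread when the stream is closed from another thread.
    end
  end

  sleep 0.05 # give the thread time to block in the read

  # A "finish" flag alone cannot help here: the blocked thread never gets a
  # chance to check it, so the join times out and returns nil.
  joined_before_close = !thread.join(join_timeout).nil?

  # Closing the stream interrupts the blocked read, so the join now succeeds.
  reader.close
  joined_after_close = !thread.join(join_timeout).nil?

  writer.close
  [joined_before_close, joined_after_close]
end
```

Calling `blocked_reader_demo` returns one boolean per join attempt, before and after the close.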

@agrare
Member Author

agrare commented Aug 23, 2018

cc @Ladas @cben

@Ladas Ladas self-assigned this Aug 23, 2018
@@ -108,7 +108,13 @@ def ensure_watch_threads
def stop_watch_threads
safe_log("#{log_header} Stopping watch threads...")

finish.value = true
# First call WatchStream#finish to forcibly terminate the loop, this
# closes the HTTP connection and will cause the #each method to raise an
Contributor

@cben cben Aug 23, 2018

to be precise, the #each method crashes and rescues internally.
it's an awkward implementation, but it should be invisible.
if you see any exception leaking outside it, it's a kubeclient bug (we already escalated the rescue a couple of times, ManageIQ/kubeclient#280 and ManageIQ/kubeclient#315)

oh, I see, I think manageiq is still using the old kubeclient 2.5.2, where some are uncaught :-(
it includes 280 but not 315.
bumping kubeclient to 3.x / 4.x is still blocked on several things that are basically ready, I just need to test image scanning.
I could also release 2.5.3 with 315 backported.

ok to catch HTTP::ConnectionError for now, but let's have a more precise comment, and do nag me to bump or backport.
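A rough sketch of the "catch it for now" approach, assuming a client whose `#each` lets the connection error escape when the stream is finished. Everything here is a stand-in invented for illustration: `FakeConnectionError` plays the role of HTTP::ConnectionError, and `FakeWatchStream`, `consume`, and `finish_demo` are hypothetical, not kubeclient code.

```ruby
# Stand-in for HTTP::ConnectionError so the sketch runs without the http gem.
class FakeConnectionError < StandardError; end

# Hypothetical stream: #each blocks on a queue and #finish makes it raise,
# imitating an older client that lets the connection error leak out.
class FakeWatchStream
  def initialize
    @notices = Queue.new
  end

  def push(notice)
    @notices << notice
  end

  def finish
    @notices << :closed
  end

  def each
    loop do
      notice = @notices.pop # blocks until something arrives
      raise FakeConnectionError, "connection closed" if notice == :closed

      yield notice
    end
  end
end

# Rescue only the error expected from a deliberate finish; anything else
# still propagates and kills the thread.
def consume(stream, received)
  stream.each { |notice| received << notice }
rescue FakeConnectionError
  :finished
end

def finish_demo
  stream = FakeWatchStream.new
  received = []
  thread = Thread.new { consume(stream, received) }
  stream.push("pod added")
  stream.finish
  [thread.value, received]
end
```

Keeping the rescue this narrow makes it easy to delete once the client no longer leaks the exception.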

Member Author

👍

Member Author

Okay, added that PR to the comment re: the exception.

watch_streams[entity_type] = watch_stream

begin
loop do
Contributor

is the loop for restarting if it disconnects (ManageIQ/kubeclient#275)? have you actually observed that, and does this work?
I think you need a fresh start_watch() to reconnect (with fresh resource_version of course)

Member Author

I haven't seen it yet, but yes, that's exactly why I had this loop. If this won't work I'll drop the loop and just let the thread restart.

Contributor

👍 thread restarting looks good, fewer code paths is good
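The "let the thread restart" option could look roughly like the sketch below. It is not the mixin's code: `WatchSupervisor`, its `starts` counter, and the convention that the watch block returns the last resource version it saw are all assumptions made for illustration. The point is that a dead thread is replaced by one that opens a fresh watch from the latest known resource version, rather than looping over the old stream.

```ruby
# Hypothetical supervisor; the names echo the discussion but the bodies are
# invented for illustration.
class WatchSupervisor
  attr_reader :starts

  # start_watch is called with (entity_type, resource_version) and is assumed
  # to return the last resource version it saw once the stream ends.
  def initialize(entity_types, &start_watch)
    @entity_types = entity_types
    @start_watch = start_watch
    @resource_versions = {}
    @watch_threads = {}
    @starts = Hash.new(0)
  end

  def ensure_watch_threads
    @entity_types.each do |entity_type|
      thread = @watch_threads[entity_type]
      next if thread&.alive?

      # Thread#value re-raises if the thread died with an exception; a real
      # implementation would need to handle that case.
      @resource_versions[entity_type] = thread.value if thread
      version = @resource_versions[entity_type]

      @starts[entity_type] += 1
      # A fresh watch per thread, resuming from the last version seen.
      @watch_threads[entity_type] = Thread.new { @start_watch.call(entity_type, version) }
    end
  end

  def join_all
    @watch_threads.each_value(&:join)
  end
end
```

With only one restart path, a disconnect and a crash are handled the same way: the next `ensure_watch_threads` call notices the dead thread and replaces it.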

@agrare agrare force-pushed the terminate_watch_stream_connection branch 2 times, most recently from a55057d to a72b2d8 Compare August 23, 2018 15:04
Prevent thread joins always timing out due to the watch stream blocking
for far longer than the join timeout.
@agrare agrare force-pushed the terminate_watch_stream_connection branch from a72b2d8 to c216551 Compare August 23, 2018 15:05
Contributor

@cben cben left a comment

LGTM 👍
merging

 self.initial = true
 self.queue = Queue.new
 self.resource_versions = {}
-self.watch_threads = {}
+self.watch_streams = Concurrent::Map.new
+self.watch_threads = Concurrent::Map.new
Contributor

reminder, not necessarily in this PR: you also planned for resource_versions to be a Concurrent::Map.

is this https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Hash.html ?

"locks against the object itself for every method call, ensuring only one thread can be reading or writing at a time"

will lock contention be a problem?
I'd guess this is minor compared to the overhead of reading notices from the network.
anyway I'm cool with erring on the side of safety and profiling later.
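To make the "lock on every method call" behaviour concrete, here is a standard-library sketch. It is explicitly not concurrent-ruby's implementation: `LockedHash` and `fill_from_threads` are hypothetical names, and a plain Mutex stands in for whatever the library does internally. Every access takes the same lock, so threads serialize on it. That is the contention being asked about, and it is likely small next to network reads.

```ruby
# Hypothetical wrapper, not concurrent-ruby's implementation: every call
# takes the same lock, so only one thread reads or writes at a time.
class LockedHash
  def initialize
    @lock = Mutex.new
    @data = {}
  end

  def [](key)
    @lock.synchronize { @data[key] }
  end

  def []=(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def size
    @lock.synchronize { @data.size }
  end
end

# Several writer threads hammer the same map; each write waits its turn.
def fill_from_threads(thread_count: 4, writes_per_thread: 1_000)
  versions = LockedHash.new
  threads = Array.new(thread_count) do |t|
    Thread.new do
      writes_per_thread.times { |i| versions["#{t}-#{i}"] = i }
    end
  end
  threads.each(&:join)
  versions
end
```

Timing `fill_from_threads` with different thread counts would be one cheap way to do the "profiling later" mentioned above.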

@cben cben added this to the Sprint 93 Ending Aug 27, 2018 milestone Aug 23, 2018
@miq-bot
Member

miq-bot commented Aug 23, 2018

Checked commit agrare@c216551 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 1 offense detected

app/models/manageiq/providers/kubernetes/container_manager/streaming_refresh_mixin.rb

@cben cben added the inventory label Aug 23, 2018
@cben cben merged commit ff072cd into ManageIQ:master Aug 23, 2018
@agrare agrare deleted the terminate_watch_stream_connection branch August 23, 2018 16:49