Safeguard ctrl-mgr cancel func in ExtensionController #4937
@@ -515,7 +522,13 @@ func (ec *ExtensionsController) startControllerManager(ctx context.Context) {
// Stop
func (ec *ExtensionsController) Stop() error {
ec.L.Info("Stopping extensions controller")
ec.mgrCancelFn()
// We have no guarantees on concurrency here, so use mutex
ec.mux.Lock()
I don't think this lock has any actual purpose. We do this because ec.mgrCancelFn can be nil when we call it, but if I understand correctly, once it's set it can never become nil again.
The following scenarios are possible:
1- ec.mgrCancelFn is nil and remains nil when we call it; the lock doesn't serve any actual purpose.
2- ec.mgrCancelFn is not nil and remains unchanged when we call the function.
3- ec.mgrCancelFn is nil and gets a value later. Doesn't matter, because the if check evaluates to false.
4- ec.mgrCancelFn is not nil, changes to a different cancel function for whatever reason and we call the old cancel function.
Correct me if I'm wrong, but if we're doing mgrCancelFn := ec.mgrCancelFn the lock serves no purpose at all. We should either do:
mgrCancelFn := ec.mgrCancelFn
if mgrCancelFn != nil {
mgrCancelFn()
}
or
ec.mux.Lock()
if ec.mgrCancelFn != nil {
ec.mgrCancelFn()
}
ec.mux.Unlock()
is not nil, changes to a different cancel function for whatever reason and we call the old cancel function.
That's the culprit IMO. In that case, the new context might not get cancelled. I agree we can get away without a mutex, if we prefer, but then we'd need atomics to swap values and cancel the swapped-out values (if not nil).
Correct me if I'm wrong, but if we're doing mgrCancelFn := ec.mgrCancelFn the lock serves no purpose at all.
It does: It prevents data races when reading ec.mgrCancelFn, i.e. it ensures that we see the current cancel function, not some other value that was just overwritten concurrently. It's not necessary to hold the lock for longer than the variable read, as the actual cancel function implementations are concurrency-safe and we don't need to block until they have returned. However, holding the lock until they have returned doesn't do any harm here either.
It does: It prevents data races when reading ec.mgrCancelFn, i.e. it ensures that we see the current cancel function, not some other value that was just overwritten concurrently.
I'm not very convinced... If this goroutine that sets mgrCancelFn := ec.mgrCancelFn acquires the lock BEFORE the goroutine that sets a new ec.mgrCancelFn, you have effectively the same problem and I don't see how a lock can help here...
The only thing that would prevent this would be:
ec.mux.Lock()
if ec.mgrCancelFn != nil {
ec.mgrCancelFn()
}
ec.mux.Unlock()
However, at the end of the day it doesn't do much either, because there is no guarantee that the context has not already been replaced by a new one by the time a function checks whether its context is done.
Anyway it doesn't do any harm so I guess we can ignore this for now and worry about these details in #4733 and subsequent work...
I'm not very convinced... If this goroutine that sets mgrCancelFn := ec.mgrCancelFn acquires the lock BEFORE the goroutine that sets a new ec.mgrCancelFn, you have effectively the same problem and I don't see how a lock can help here...
That's right. The lock here only prevents the data race. The places where the cancel func gets written also need to ensure that the previous cancel func gets called.
I don't see how this could be fixed by holding the lock longer here, though.
The places where the cancel func gets written also need to ensure that the previous cancel func gets called.
Added this
Currently we blindly call the cancel func and there are some cases when it actually can be nil. Signed-off-by: Jussi Nummelin <jnummelin@mirantis.com>
Backport failed. Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin release-1.28
git worktree add -d .worktree/backport-4937-to-release-1.28 origin/release-1.28
cd .worktree/backport-4937-to-release-1.28
git switch --create backport-4937-to-release-1.28
git cherry-pick -x 6e94f763825c93ca7a5fa0899f79251398ee2d1b
Backport failed. Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin release-1.29
git worktree add -d .worktree/backport-4937-to-release-1.29 origin/release-1.29
cd .worktree/backport-4937-to-release-1.29
git switch --create backport-4937-to-release-1.29
git cherry-pick -x 6e94f763825c93ca7a5fa0899f79251398ee2d1b
Successfully created backport PR.
Currently we blindly call the cancel func and there are some cases when it actually can be nil.
Description
Fixes #4930