-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
canary: Adds locking to prevent multiple concurrent invocations of confirmMissing
from clobbering each other
#5568
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, lgtm. Could we get this covered with a test, e.g. calling confirmMissing
concurrently when also calling pruneEntries
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Thanks @afayngelerindbx.
Agree with @chaudum, can you please add a test? I will merge after that.
b419054
to
23b56ca
Compare
It's pretty tricky to actually ensure that these tests run with the right sequencing to cause all the panics. Let me know if the test needs to be something smarter. Also, do I need to add anything to the CHANGELOG? This seems like too small a change for that. |
@afayngelerindbx. It can be simple test that locks this behavior :) And for CHANGELOG. I don't think we need entry there as this is not user-visible change :) |
23b56ca
to
5a98535
Compare
I added a fairly simple test that accomplishes two things:
Please, let me know whether you think this is sufficient test coverage for this change. |
} | ||
|
||
for i := 0; i < 10; i++ { | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to add sync.WaitGroup
to makesure all goroutines are completed by end of the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion. adding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding :)
- can we make
wg.Done()
asdefer wg.Done()
at the beginning beforeassert.NotPanic
is getting called? Because that way we can make surewg.Done()
is run even if that code panics. - Not familiar with
assert.Eventually
. But shouldn't just simplewg.Wait()
after the for loop is sufficient here? Not sure if I miss anything there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe assert.NotPanic
recovers from the panic and returns false. Adding the defer to make the code easier to reason about.
For 2, I would recommend keeping Eventually
there. The expectation is that this test should return fairly quickly(within 1s) keeping a wg.Done()
at the end will make the test suite time out(default 10m) if one of those goroutines hits a deadlock.
thanks @afayngelerindbx looks good. Just added one minor suggestion to add waitGroup to the test. Also looks like |
I think I signed. The CLA bot comment says so as well: #5568 (comment) I'm not sure why the check isn't passing. Any suggestions? |
970a2fc
to
eae3ca4
Compare
`confirmMissing` from clobbering each other
eae3ca4
to
9c183f5
Compare
I think it makes sense to add a changelog entry, because it fixes a race condition that produces a nil pointer reference. Other people may be effected as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing regarding the usage of the pointer inside a range loop.
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
4bb051e
to
9ea9d58
Compare
What this PR does / why we need it:
nil pointer dereferences in this code are causing panics in our deployment. It seems that confirmMissing wasn't intended to be run concurrently. The locking in this PR prevents accessing
missingEntries
while it is also being set to slice of nils(comparator.go:L482).Which issue(s) this PR fixes:
Fixes #5128