-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix close deadlock #21
Conversation
There was a race condition, where upon closing a conn and all its child streams, a (potential?) issue in the underlying transport library caused the 'Close' call to hang, leading to deadlocks in the peerstream code. This changes the lock to only be held during the closed'ness check, and should protect go-peerstream users against issues (misunderstandings?) in the underlying transport libs.
I tried writing a test for this, but the race was very difficult to reproduce (reproduced once in 700 5 second runs). The ipfs dht tests do a better job of reproducing the issue (1/20 or so). |
if c.closed { | ||
c.closeLock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wont work because then a second caller could exit way before the Conn is actually closed. maybe make them wait?
c.closeLock.Lock()
if c.closed {
c.closeLock.Unlock()
<-c.closeChan
return c.closeErr
}
c.closed = true
c.closeLock.Unlock()
defer func() {
c.closeChan<- struct{}{}
}()
or use https://github.com/jbenet/goprocess and avoid the close lock madness alltogether, which also gives us nice errors. (goproc doesnt have the UX bugs, it's goproc/ratelimit that has caused the problems in the last few months)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making them wait is the whole problem i'm trying to avoid, lol. The code in your example is functionally equivalent to the code causing the issue we have now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that most callers of Close
don't really care to wait until it completes, they just perform some routine cleanup as part of some other method. Asking them all to wait until its 'done' isnt what we want (either that or we need a BeginClose
method, or some way of signifying we dont care to wait)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then those callers should say go s.Close()
-- the semantics of Close()
are that it may be stuck waiting for it to fully close, and if something shouldnt be waiting, then it should really not be waiting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats also something we don't want though, we would end up with tons of random goroutines hanging around for an indeterminate period of time.
Are you opposed to a method that calls close, and exits early if its being closed already like this one? maybe StartClosing
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, call it c.GoClose() and have it spin a goroutine if it does need to
close (i.e. do honor the "go"). in practice it wont happen
On Wed, Dec 2, 2015 at 12:49 PM Jeromy Johnson notifications@github.com
wrote:
In conn.go
#21 (comment):if c.closed {
c.closeLock.Unlock()
thats also something we don't want though, we would end up with tons of
random goroutines hanging around for an indeterminate period of time.Are you opposed to a method that calls close, and exits early if its being
closed already like this one? maybe StartClosing or something?—
Reply to this email directly or view it on GitHub
https://github.com/jbenet/go-peerstream/pull/21/files#r46473117.
@whyrusleeping good find |
alright, that should address the problem, without any changes needed in go-ipfs. |
There was a race condition, where upon closing a conn and all its child
streams, a (potential?) issue in the underlying transport library caused
the 'Close' call to hang, leading to deadlocks in the peerstream code.
This changes the lock to only be held during the closed'ness check, and
should protect go-peerstream users against issues (misunderstandings?)
in the underlying transport libs.