-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix memory leak holding onto streams unnecessarily #93
Conversation
6e81395
to
5b5c3ab
Compare
please review carefully the |
dht_net.go
Outdated
dht.smlk.Unlock() | ||
|
||
ms = &messageSender{p: p, dht: dht} | ||
if err := ms.prep(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just make a single messageSender
and synchronize on its lock for prep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cant do that without holding the dht.smlk for the duration of the prep call. And the prep call may take arbitrarily long (up to a dial timeout). We don't want to block every peer from calling this message waiting on a dial. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can sync on the lock outside the smlk
.
I was thinking of starting with something like this:
ms = &messageSender{...}
dht.strmap[p] = ms
dht.smlk.Unlock()
ms.lk.Lock()
if err := ms.prep(); ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can even acquire the edit: that would block.ms.lk
inside smlk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, the subsequent logic would have to be adjusted, but i think it might result in simpler code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then what happens when two separate goroutines have the same message sender (one created it and one got it from the map and is waiting on the prep lock) and then prep fails? goroutine that called prep exits, and the goroutine that just got a reference to the messageSender and was waiting on the lock now tries to re-call prep, which is okay if prep fails again, but if prep succeeds on that second call for some reason, then we have an open stream attached to a message sender that isnt being tracked by anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is the problem. I had a similar issue in #92 and resolved it with a reset count, but it's hairy.
@@ -133,13 +164,10 @@ type messageSender struct { | |||
p peer.ID | |||
dht *IpfsDHT | |||
|
|||
invalid bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this? doesn't seem to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it is. We were toying with using it and decided we didn't need it but forgot to remove it. Now, we realized we really do need it.
If you can think of a better, race-free way of doing this dance, suggestions welcome!
5b5c3ab
to
4237ef0
Compare
dht_net.go
Outdated
singleMes int | ||
} | ||
|
||
func (dht *IpfsDHT) newMessageSender(p peer.ID) *messageSender { | ||
return &messageSender{p: p, dht: dht} | ||
// invalidate is called after this messageSender is removed form the strmap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true, it is called before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
This was definitely causing a decent memory leak. I'm sorry.