netlink: Close does not unblock concurrent Receive operation #162
I'm not sure I understand why you're restarting the Receive cycle. Why not keep the goroutine running to receive messages until application exit? You can set a deadline in the past to immediately time out a read, as I'm doing in: https://github.com/mdlayher/netstate/blob/master/watcher_linux.go (rtnetlink used here, but same applies for netlink)
This is probably a reasonable solution. I'd have to review the code again to see why exactly I set it up the way I did though.
I don't want to restart the Receive cycle; I just mentioned that I thought some sort of timeout would be an opportunity to periodically unblock the read.
Yep, that's correct. If you set a deadline that has already passed (I like a very small UNIX timestamp), it'll immediately cause an I/O timeout error on any blocking syscalls like that. Since this has tripped a few people up, I think it's probably still worth looking into, but that solution should help with your immediate problem.
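Roughly, the shape I have in mind is something like this sketch (receiveLoop and done are illustrative names, and it assumes Conn exposes SetReadDeadline the way the watcher linked above uses it):

```go
package watcher

import (
	"time"

	"github.com/mdlayher/netlink"
)

// receiveLoop blocks in Receive until done is closed, then returns.
func receiveLoop(c *netlink.Conn, done <-chan struct{}) {
	go func() {
		// On shutdown, a deadline that has already passed immediately
		// times out the blocked Receive.
		<-done
		_ = c.SetReadDeadline(time.Unix(1, 0))
	}()

	for {
		msgs, err := c.Receive()
		if err != nil {
			// Treat the resulting I/O timeout as the signal to stop; a
			// real program would distinguish it from other errors.
			return
		}
		_ = msgs // handle messages here
	}
}
```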
Looking at this again, I would expect that moving the mutex locking below closing the file descriptor would solve this problem, so something like:

```go
func (s *sysSocket) Close() error {
	// Close the socket and indicate to other goroutines that the file
	// descriptor has been closed, so further calls return EBADF.
	err := s.fd.Close()
	s.mu.Lock()
	s.closed = true
	s.mu.Unlock()
	// Stop the associated goroutine and wait for it to return.
	s.g.stop()
	return err
}
```

However, it's been a long time since I've looked at this code and I'd have to remember why exactly we did it that way in the first place. The doc comment does indicate that a concurrent Close unblocked Receive at one point, so this is a regression. I am slightly afraid that making a small change like this could result in a send on a closed channel elsewhere, so this will take some more consideration.
/cc @acln0 in case you are still working with this package more frequently than I am. :)
@mdlayher I am observing the same issue. Close does not unblock Receive.
Do you know when this might have regressed?
See c5f8ab7. It won't be fixed until v1.2.0.
I believe I have finally fixed this problem in #169, and the tests I've written seem to agree with that assertion. I will tag a v1.2.0 (which is Go 1.12+ only) and have folks give it a try in their own applications. I will update mine as well.
Similar to #136, though the use-case is different.
I want to listen for any IP route changes, so I use a blocking Receive() for multicast messages. I cannot use a timeout on it because then there is a chance of missing events while restarting the Receive cycle.
In order to break the Receive (i.e. on graceful exit), I close the Conn on another goroutine, which appears to be the only way to do it.
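For reference, a stripped-down sketch of the pattern I'm describing (the groups and names here are illustrative, not my exact code):

```go
package main

import (
	"log"
	"os"
	"os/signal"

	"github.com/mdlayher/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	// Subscribe to IPv4 route change notifications.
	c, err := netlink.Dial(unix.NETLINK_ROUTE, &netlink.Config{
		Groups: unix.RTMGRP_IPV4_ROUTE,
	})
	if err != nil {
		log.Fatalf("failed to dial netlink: %v", err)
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			msgs, err := c.Receive()
			if err != nil {
				// Expected once Close is called during shutdown.
				return
			}
			log.Printf("route change: %d message(s)", len(msgs))
		}
	}()

	// On graceful exit, Close from another goroutine is expected to
	// unblock the pending Receive, but it currently deadlocks instead.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
	_ = c.Close()
	<-done
}
```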
The problem is that Receive, in its core read function, acquires the R side of the RWMutex:
netlink/conn_linux.go, line 369 (ad88bf8)
Close, however, wants to acquire the W side here:
netlink/conn_linux.go, line 502 (ad88bf8)
which is not possible until the R(eader) side is freed.
This results in a deadlock until something is received on the socket, which is not deterministic.
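A simplified illustration of the lock ordering (not the package's actual code; the helper names are made up):

```go
package example

import "sync"

var mu sync.RWMutex

// receive mirrors the read path: it holds the R side of the mutex for the
// entire duration of a blocking recvmsg.
func receive() {
	mu.RLock()
	defer mu.RUnlock()
	blockingRead()
}

// closeConn mirrors Close: it wants the W side, but cannot get it while
// receive is parked inside blockingRead holding the R side.
func closeConn() {
	mu.Lock()
	defer mu.Unlock()
	closeFD()
}

func blockingRead() { select {} } // stand-in for a blocking recvmsg
func closeFD()      {}            // stand-in for closing the socket
```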
I don't have a solution in mind, but maybe the Close function should lock the W side of the mutex only after calling the close syscall, ensuring all blocking operations are woken up (and so release the lock). If other parallel operations handle errors appropriately, it might not cause problems if `s.closed` temporarily appears to be false while the socket is already closed.