net: UnixListener blocks forever in Close() if File() is used to get the file descriptor #29277
Comments
This was observed in […]. The problem is actually with the […]:

    func (f *File) Fd() uintptr {
        if f == nil {
            return ^(uintptr(0))
        }
        // If we put the file descriptor into nonblocking mode,
        // then set it to blocking mode before we return it,
        // because historically we have always returned a descriptor
        // opened in blocking mode. The File will continue to work,
        // but any blocking operation will tie up a thread.
        if f.nonblock {
            f.pfd.SetBlocking()
        }
        return uintptr(f.pfd.Sysfd)
    }

The […] But, the gotcha is that […] Thus, the […]:

    func (fd *FD) Close() error {
        ...
        // Wait until the descriptor is closed. If this was the only
        // reference, it is already closed. Only wait if the file has
        // not been set to blocking mode, as otherwise any current I/O
        // may be blocking, and that would block the Close.
        // No need for an atomic read of isBlocking, increfAndClose means
        // we have exclusive access to fd.
        if fd.isBlocking == 0 {
            runtime_Semacquire(&fd.csema)
        }
        return err
    }

See "File status flags" in fcntl(2) for reference.
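The fcntl(2) reference above concerns file status flags such as `O_NONBLOCK`, which belong to the open file description and are therefore shared by descriptors created with `dup`. The following is a minimal sketch of that behaviour, not code from the Go tree. It assumes Linux (the raw `SYS_FCNTL` call is a portability shortcut), and the helper names `isNonblocking` and `dupSharesNonblock` are made up for illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// isNonblocking reports whether O_NONBLOCK is set on the open file
// description that fd refers to. (Hypothetical helper; Linux assumed.)
func isNonblocking(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), uintptr(syscall.F_GETFL), 0)
	if errno != 0 {
		return false, errno
	}
	return flags&uintptr(syscall.O_NONBLOCK) != 0, nil
}

// dupSharesNonblock duplicates a socket, sets O_NONBLOCK through the
// duplicate only, and returns the flag as seen through the original
// descriptor before and after that change.
func dupSharesNonblock() (before, after bool, err error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	dupFd, err := syscall.Dup(pair[0])
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(dupFd)

	if before, err = isNonblocking(pair[0]); err != nil {
		return false, false, err
	}
	// Change the mode through the duplicate only.
	if err = syscall.SetNonblock(dupFd, true); err != nil {
		return false, false, err
	}
	after, err = isNonblocking(pair[0])
	return before, after, err
}

func main() {
	before, after, err := dupSharesNonblock()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("original nonblocking before: %v, after SetNonblock on the dup: %v\n", before, after)
}
```

The same sharing applies in the other direction: switching a duplicate to blocking mode also switches the descriptor it was duplicated from.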
@dylanplecki Thanks, I think your analysis is exactly correct. The […] I don't know what we can do about this without losing features that are more important. I'm open to suggestions. Before that, though: what are you really trying to do? Is this something that could be done more appropriately with […]?
@dylanplecki That makes perfect sense, thanks for the analysis! We noticed that the […]
@ianlancetaylor We're implementing a graceful restart feature (similar to HAProxy), so we need to pass file descriptors from the old process to the new using a unix domain socket. I don't think we can use […] I think the issue comes down to:
https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L102
As the comment says, this causes […] Even the original […]
I think you can use […]
I think I just ran into this issue in CL 154664.
There is an issue for […]
This post helps me a lot. I also found that not only accept will block close, but also read, write, and other ops that increase the lock reference. Here is my test code: […]
You will find that conn.Close() is blocked until the read gets some data.
Some of the things need to be done inside the respective network namespaces. Previous "GTPU issues" fix was a red herring, what we really have here is net conn's File() method breaking things after Fd() is retrieved from it. Ref: golang/go#29277 https://github.com/golang/go/blob/2291cae2af659876e93a3e1f95c708abb1475d02/src/os/file_unix.go#L76-L80
Change https://golang.org/cl/286352 mentions this issue: […]
thanks, this issue is good!
Because in older versions of Go all files were in blocking mode, so calling the […]
Change https://go.dev/cl/449796 mentions this issue: […]
## Which problem is this PR solving?
Resolves #4448

## Short description of the changes
Jaeger agent gets stuck when closing with SocketBufferSize set. This is because `Close()` of `net.UDPConn` will be blocked if `Fd()` is used to get the file descriptor. Use `RawConn.Control` instead to get fd to set the socket buffer.

Same issue was discussed here: golang/go#29277
The fix refers to here: brucespang/go-tcpinfo#3

Signed-off-by: Chen Xu <chen.x@uber.com>
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
Yes, it also reproduces with the latest tip.

What operating system and processor architecture are you using (`go env`)?

What did you do?
Created a unix listener, called `File()` to get the underlying file descriptor, then ran `Accept` in a goroutine. After some while, tried to call `Close` on the listener, but the `Close` blocks indefinitely.

Repro:
https://github.com/prashantv/unix-close-race/blob/master/main.go

What did you expect to see?
`Close` to return, and cause the `Accept` (in a separate goroutine) to unblock, and return an error.

What did you see instead?
In Go 1.11.3, the `Close` blocks indefinitely.
In Go 1.10.6, the `Close` returns, but the `Accept` goroutine is not unblocked.

Other Notes
It looks like the behaviour change between 1.10 and 1.11 may be caused by https://go-review.googlesource.com/c/go/+/119955/ (fix for #24942)
I added a flag to the repro: if you pass `--use-new-fd`, then instead of calling `Accept` and `Close` on the original unix listener (which we called `File()` on), it uses a new listener created from the copied file descriptor. This mitigates the issue (both in Go 1.10 and Go 1.11). It seems like calling `File()` duplicates the file descriptor, but somehow affects the original file descriptor, causing issues with `Accept` + `Close`.

cc @witriew who originally ran into this issue.