net: goroutine leak when calling File on a TCPConn #24455
https://golang.org/pkg/net/#TCPConn.File says: […]
The File code internally calls into os's NewFile, which sets a finalizer on the fd itself to close it when it is garbage collected. However, the reproducer also closes the file explicitly, to ensure that not closing the file is not the issue. For ease, the following play link (which has the code) also reproduces the issue: https://play.golang.org/p/tnNDto5l8vf
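For context, a minimal sketch of the call under discussion (the endpoint is a placeholder, not from the reproducer): File returns a second *os.File backed by a dup'd descriptor, created via os.NewFile (which attaches a Close finalizer), so both the duplicate and the original connection are closed explicitly here rather than left to the finalizer.

```go
package main

import (
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "golang.org:80") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	tc := conn.(*net.TCPConn)
	f, err := tc.File() // duplicates the socket's file descriptor
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close() // closing f does not close conn, and vice versa
}
```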
You do realize you are calling Do(req) twice and you never close those responses.
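For reference, the usual pattern this comment is pointing at (the URL is a placeholder, not taken from the reproducer): drain and close every response body so the Transport can reuse or cleanly tear down the underlying connection.

```go
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:8080/") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	// Drain and close the body; leaked bodies keep connections (and their
	// reader goroutines) alive and muddy any goroutine-leak measurement.
	io.Copy(ioutil.Discard, resp.Body)
	resp.Body.Close()
}
```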
It was a typo from modifying the same source enough to get a "minimal" reproducer. Here is an updated file: https://play.golang.org/p/v2RRGMZJAY3. This fixed playground code closes the responses in a goroutine, the same way the prior code did. I welcome pointers to any other mistakes I have made, but this is a minimal reproducer for a problem I have been trying to track down on and off for 2.5 years now. I have suspicions that the epoll implementation itself is a bit off, due to other stuck goroutines I am seeing (a process that has been up for 140 days has many stuck in runtime_pollWait), but I will try to create a minimal reproducer for that separately.
The problem is that the socket is added to epoll with EPOLLET (edge triggered) on the assumption that it is non-blocking, which is no longer true after the File call.
If you use a […]. After removing the duplicated Do(req), […]. Does that program produce a different result on your machine?
If I remove the […]
Yes, I do still see the leak. Part of the reason for the code being structured so tightly was to attempt to force the races. I ran your code; nothing leaked for one run at 1,000 connections, but leaks did occur at 10,000.
Here is my current diff on your code (with the GC): […]
What is most exciting for me, actually, is that re-running the code (with 10k connections) a few times in a row without the […]. It is much more subtle, but it is much more real; let me know if I should make a separate issue for that or if I should add more details here. I recommend running the code with 10k connections, and maybe on worse hardware.
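The diff itself is not preserved in this copy; the sketch below only illustrates the general "with the GC" idea referred to above — forcing collection so the finalizer that os.NewFile attaches to the dup'd descriptor can run before goroutines are counted. The helper name and sleep duration are illustrative.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// checkLeak is a hypothetical helper: force a couple of GC cycles so queued
// finalizers (including the one os.NewFile sets on the dup'd fd) get a chance
// to run, then report how many goroutines are still alive.
func checkLeak() {
	runtime.GC()
	runtime.GC()
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines still running:", runtime.NumGoroutine())
}

func main() {
	// ... run the reproducer's connection loop here ...
	checkLeak()
}
```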
The only "leak" I see is with
The problem there is that it will never break out of the read since its blocking and all deadlines are useless. I do not see any other leaks. The persistConn.readLoop stack, I do see when the initial stacks are dumped, but if I wait and then dump again, they are gone. The dialConn ones are still present. |
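A hedged sketch of that failure mode on Go 1.10 and earlier (the endpoint is a placeholder): as I understand the pre-1.11 implementation, File dup()s the descriptor and clears O_NONBLOCK, and because that flag lives on the shared file description the original connection becomes blocking too, so a goroutine sitting in Read is stuck in a blocking syscall and the read deadline cannot wake it.

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	tc := conn.(*net.TCPConn)

	f, err := tc.File() // on Go <= 1.10 this also switches the socket to blocking mode
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	tc.SetReadDeadline(time.Now().Add(time.Second))
	buf := make([]byte, 1)
	_, err = tc.Read(buf) // may block far past the deadline once the fd is blocking
	log.Println(err)
}
```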
I have just re-run this by copying the code out of the play link in #24455 (comment) and applying my bump-to-10k and hang-with-select mods. I ran it and then waited 5 minutes before checking the goroutines. At 5 minutes: […]
I would not be surprised if your hardware is better than mine and things simply do not mess up on better hardware. What more can I do in this situation?
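For anyone re-running this, a minimal way to do the "checking the goroutines" step in-process (the wait time mirrors the comment above; net/http/pprof or SIGQUIT work just as well):

```go
package main

import (
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// ... start the reproducer here ...
	time.Sleep(5 * time.Minute) // wait, as described above, before dumping
	// Write every goroutine's stack to stderr; leaked readers show up parked
	// in runtime_pollWait or sitting in a blocking read.
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}
```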
I can tell you that I did an strace with and without the File() call to see what the underlying difference was, and I only needed fewer than 5 connections to show that the normal path would get a read failure with EAGAIN, which did not and would not occur once the connection was blocking.
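To illustrate the EAGAIN difference described above, here is a Linux-only sketch using a Unix socketpair as a stand-in for the TCP connection (nothing here is from the original strace output):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// A connected Unix socketpair stands in for the TCP connection.
	fd, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 1)

	// Non-blocking (the normal runtime-managed state): a read with no data
	// queued fails immediately with EAGAIN, which is what lets the runtime
	// park the goroutine in its poller instead of tying up a thread.
	syscall.SetNonblock(fd[0], true)
	if _, err := syscall.Read(fd[0], buf); err != nil {
		fmt.Println("non-blocking read:", err) // EAGAIN / "resource temporarily unavailable"
	}

	// Blocking (the state after File() on Go <= 1.10): the same read would
	// simply sit in the syscall until the peer writes, so it is left commented out.
	// syscall.SetNonblock(fd[0], false)
	// syscall.Read(fd[0], buf)
}
```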
@fraenkel Do you have a fix in mind for the […]?
Potentially related to #24483, but I have not seen the same race trace while repeatedly running this.
Also potentially related (outside of the […])
Fixed by #24942 (but maybe the underlying issue still exists with nonblocking connections) |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
[…] (Also tested on 1.7.6, 1.10.)
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
[…]
What did you do?
File 1, the producer:
https://play.golang.org/p/eNMNVTGutd9
File 2 (run separately to ensure that its goroutines do not show up):
https://play.golang.org/p/Mz1h9F3963o
What did you expect to see?
No goroutines hung.
What did you see instead?
These goroutines do not go away.
If tc.File() is commented out in the first play link, the goroutines do go away. I suspect there is a race between connections being set to non-blocking and removed from epoll management, and a goroutine falling into epoll to wait for a read.
The examples use an http server / client, but I would suspect this issue is visible with raw connections.
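The play links are not preserved in this copy. Purely as a hedged illustration of the shape described in the report — not the original reproducer — something along these lines exercises an HTTP server and client while calling File on each accepted *net.TCPConn; every name and address here is made up.

```go
package main

import (
	"log"
	"net"
	"net/http"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	srv := &http.Server{
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}),
		// Call File on every new connection; this is the call suspected of
		// leaving reader goroutines stuck waiting in epoll.
		ConnState: func(c net.Conn, s http.ConnState) {
			if s != http.StateNew {
				return
			}
			if tc, ok := c.(*net.TCPConn); ok {
				if f, err := tc.File(); err == nil {
					f.Close()
				}
			}
		},
	}
	go srv.Serve(ln)

	url := "http://" + ln.Addr().String() + "/"
	for i := 0; i < 1000; i++ {
		resp, err := http.Get(url)
		if err != nil {
			continue
		}
		resp.Body.Close()
	}
	// Inspect the remaining goroutines here (e.g. via runtime/pprof) and look
	// for readers parked in runtime_pollWait that never go away.
}
```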