-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delay closing connection on the server after all streams are completed #6957
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6957 +/- ##
==========================================
+ Coverage 83.61% 83.75% +0.14%
==========================================
Files 287 287
Lines 30929 30938 +9
==========================================
+ Hits 25860 25911 +51
+ Misses 3998 3960 -38
+ Partials 1071 1067 -4
|
|
||
if testing.Testing() { | ||
goAwayTimeout = 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests that use grpc.NewServer
and define very short timeouts should not be affected.
@@ -83,6 +86,10 @@ func init() { | |||
internal.BinaryLogger = binaryLogger | |||
internal.JoinServerOptions = newJoinServerOption | |||
internal.RecvBufferPool = recvBufferPool | |||
|
|||
if testing.Testing() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like testing.Testing()
was only made available since go.1.21.0 https://pkg.go.dev/testing#Testing
We maintain a backward compatibility to the last 3 versions of go. (i.e., v1.19+)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions what should I do about it? Options I can think of:
- add a hack to determine if this is a testing build, something like this https://stackoverflow.com/questions/14249217/how-do-i-know-im-running-within-go-test
- Don't enable the delay by default, add a server option to activate it
- Enable delay by default and ignore the fact that it could break some tests that expect the server to be shut-down faster then 1 sec. Add a server option so that users can disable (or maybe configure?) the delay in their tests.
Oh this is especially nasty, and is likely the root cause behind #5358, which has confounded me for some time. So the root cause of the problem is that if the server's receive buffer is not empty, then closing the connection results in RST instead of FIN. https://www.freesoft.org/CIE/RFC/1122/99.htm
I have a slightly different idea here, for a way to fix this without incurring a fixed delay on everything. Let me play around with it a bit. |
Thank you for the PR. Please see #6977 which has a preferred fix. |
We hit exactly the same bug that was previously reported in grpc-java grpc/grpc-java#9566
It was also fixed in http2 go standard lib golang/net@cd69bc3#diff-75c1d03aa4d9dfad1fe759ba9fce0462d6e88f2e33242425c4563d4ffb454951
TL;DR is that when server sends large responses and configures MaxConnectionAge, sometimes it closes the connection too early after finishing processing all active streams (the client is not done reading all data from the connection yet, the data is still stored inside TCP kernel buffer on the server) In such cases the client receives RST and gets the following error:
The fix is very simple: sleep for 1 sec after finishing processing all streams and before closing the connection.
I can reliably reproduce this error in our staging environment, and I verified that the proposed fix actually works.
I am still working on a unit test that reproduces the error locally. My plan is to modify this test and make the server send large responses, but I'll appreciate any guidance on how we should test this.
I can also make the sleep duration configurable if you prefer.