client: always enable TCP keepalives with OS defaults #6834
@atollena: If you are interested in this change, please let me know and I'll mark you as a reviewer. Thanks.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6834      +/-   ##
==========================================
+ Coverage   83.53%   83.61%   +0.08%
==========================================
  Files         285      286       +1
  Lines       30754    30770      +16
==========================================
+ Hits        25690    25729      +39
+ Misses       3998     3981      -17
+ Partials     1066     1060       -6
That's what I had in mind, thanks!
internal/tcp_keepalive.go
Outdated
// disabling the overriding of TCP keepalive parameters by setting the
// KeepAlive field to a negative value above, results in OS defaults for
// the TCP keepalive interval and time parameters.
ControlContext: func(_ context.Context, _, _ string, c syscall.RawConn) error {
Feel free to ignore: you can use Control instead of ControlContext for one less unused parameter.
Sounds like a minor improvement to me, but I don't feel strongly.
I went with ControlContext over Control (even though the context parameter is unused) since the former is preferred if both are set.
Looks like the ControlContext field was only added in Go 1.20, and currently the oldest supported Go version is still Go 1.19. So I'm changing this back to Control.
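For readers following this thread, the dialer setup being discussed can be sketched roughly as below. This is a minimal illustration and not the code merged in this PR: the names `newDialer` and `keepAliveEnabled` are hypothetical, the sketch uses the standard `syscall` package directly, and it assumes a Unix-like OS where socket descriptors are plain integers. It uses `Control` rather than `ControlContext`, in line with the Go 1.19 constraint mentioned above.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// newDialer (hypothetical helper) returns a dialer whose KeepAlive field is
// negative, so the Go stdlib neither enables TCP keepalives nor overrides
// their time and interval. When setSockopt is true, SO_KEEPALIVE is enabled
// by hand in the Control hook, which leaves time and interval at OS defaults.
func newDialer(setSockopt bool) *net.Dialer {
	d := &net.Dialer{KeepAlive: time.Duration(-1)}
	if !setSockopt {
		return d
	}
	d.Control = func(_, _ string, c syscall.RawConn) error {
		var sockErr error
		// Assumption: Unix-like OS, where fd can be converted to int.
		err := c.Control(func(fd uintptr) {
			sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1)
		})
		if err != nil {
			return err
		}
		return sockErr
	}
	return d
}

// keepAliveEnabled (hypothetical helper) dials a throwaway loopback listener
// with d and reports whether SO_KEEPALIVE is set on the resulting connection.
func keepAliveEnabled(d *net.Dialer) (bool, error) {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer lis.Close()

	conn, err := d.Dial("tcp", lis.Addr().String())
	if err != nil {
		return false, err
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		return false, err
	}
	var val int
	var sockErr error
	err = raw.Control(func(fd uintptr) {
		val, sockErr = syscall.GetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_KEEPALIVE)
	})
	if err != nil {
		return false, err
	}
	return val != 0, sockErr
}

func main() {
	for _, setSockopt := range []bool{false, true} {
		on, err := keepAliveEnabled(newDialer(setSockopt))
		fmt.Printf("setSockopt=%v keepalive=%v err=%v\n", setSockopt, on, err)
	}
}
```

Running the two cases side by side is meant to show why a negative KeepAlive on its own is not enough: without the explicit socket option, keepalives are simply left off.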
dialoptions.go
Outdated
// TCP keepalive time and interval to 15s.
// To retain OS defaults, use a net.Dialer with the KeepAlive field set to a
// negative value.
// Note: As of Go 1.21, the standard library overrides the OS defaults for TCP
I agree this wording is confusing, and have been trying to think of something better... Maybe:
Note: All supported releases of Go (as of December 2023) override the OS defaults for...
Or "as of the Go 1.21 release" or "up to and including at least Go 1.21" for the parenthetical?
Done. Thanks.
@@ -813,15 +813,6 @@ func (l *listenSocket) Close() error {
 // Serve returns when lis.Accept fails with fatal errors. lis will be closed when
 // this method returns.
 // Serve will return a non-nil error unless Stop or GracefulStop is called.
-//
-// Note: As of Go 1.21, the standard library overrides the OS defaults for
Why is this deleted?
Because I will be sending a follow-up PR to fix things on the server-side.
Updated this with the latest commit, based on our discussion on the issue.
dialoptions.go
Outdated
// for keepalive time and interval, use a net.Dialer that sets the KeepAlive
// field set to a negative value, and sets the SO_KEEPALIVE socket option to
"sets (the KeepAlive field) set to a negative". Remove the 2nd "set".
Done.
LGTM, but one question about something else we might want to fix.
@@ -264,7 +263,7 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr resolver.Address, opts
 }
 keepaliveEnabled := false
 if kp.Time != infinity {
-if err = syscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil {
+if err = isyscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil {
It looks like this function sets the TCP user timeout but does not enable keepalives. Shouldn't we update it to do that too, in case they are disabled?
TCP_USER_TIMEOUT specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection and return ETIMEDOUT to the application. If the option value is specified as 0, TCP will use the system default.
I think it is totally a reasonable thing to use it without TCP keepalives.
As per A18, we set TCP_USER_TIMEOUT when gRPC keepalives are configured so that, if things are stuck at the TCP layer, the connection is still closed (even if gRPC keepalives are not able to do the same).
Are you concerned about the case where a user specifies a custom net.Dialer to disable TCP keepalives, and configures gRPC keepalives, and therefore, we end up setting TCP_USER_TIMEOUT, but TCP keepalives are disabled? Even in this case, I feel it should be fine, since TCP_USER_TIMEOUT can work irrespective of whether TCP keepalives are configured on the connection.
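To make the point above concrete, here is a small sketch showing that TCP_USER_TIMEOUT can be set and read back on a connection that has TCP keepalives left off. It is not the gRPC implementation: `userTimeoutRoundTrip` is a hypothetical helper, the sketch assumes Linux, and the option number is hard-coded as an assumption instead of being taken from a library constant. The option value is expressed in milliseconds.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// tcpUserTimeout is the Linux option number for TCP_USER_TIMEOUT. It is
// hard-coded here as an assumption; this sketch is Linux-only.
const tcpUserTimeout = 0x12

// userTimeoutRoundTrip (hypothetical helper) opens a loopback connection with
// TCP keepalives left off, sets TCP_USER_TIMEOUT to ms milliseconds, and
// returns the value the kernel reports back.
func userTimeoutRoundTrip(ms int) (int, error) {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer lis.Close()

	// A negative KeepAlive means the Go stdlib does not enable keepalives.
	d := &net.Dialer{KeepAlive: time.Duration(-1)}
	conn, err := d.Dial("tcp", lis.Addr().String())
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		return 0, err
	}
	var got int
	var sockErr error
	err = raw.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout, ms)
		if sockErr != nil {
			return
		}
		got, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_TCP, tcpUserTimeout)
	})
	if err != nil {
		return 0, err
	}
	return got, sockErr
}

func main() {
	got, err := userTimeoutRoundTrip(20000)
	fmt.Printf("TCP_USER_TIMEOUT(ms)=%d err=%v\n", got, err)
}
```

The two socket options are independent at the API level, which is why setting one without the other is accepted by the kernel.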
Ah no, sorry that makes sense, I'm not sure what I was thinking yesterday.. :)
In #6672, we attempted to configure OS defaults for TCP keepalive parameters (keepalive time and interval). But this had the unintended consequence of disabling TCP keepalives completely because of the way things are implemented in the Go stdlib. In this PR, we attempt to do the following:
Summary of changes:
net.Dialer.KeepAlive field is not set to a negative value.
net.Dialer.KeepAlive field to a negative value, thereby disabling the Go stdlib's override of these values to 15s.
There will be a follow-up PR for the server side.
Addresses #6250
RELEASE NOTES: