channelz: cleanup channel registration if Dial fails #2733
clientconn.go
Outdated
@@ -153,6 +153,18 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
})
}
cc.csMgr.channelzID = cc.channelzID
defer func() { |
cc.Close() will also remove the channel and add the "channel deleted" trace. Move RegisterChannel down (right before the defer with cc.Close()); then this should be covered by cc.Close().
@@ -196,18 +208,6 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
defer cancel() |
Does this work if there's a timeout set (cc.dopts.timeout > 0)? In that case, the deferred actions will run in the reverse order with this PR, so the automatic cancel() of the replaced ctx (the one with a timeout) happens before the check of whether ctx is Done(), which means a cancellation error will always be returned.
Right!
I filed #2736 to track. Will have a solution shortly!
Thanks for noting and mentioning this!!
I've got a fix & testcase on its way.
Seems like grpc/grpc-go#2733 has resulted in behaviour where any attempt to gRPC dial with a timeout gives a cancelled-context error.
Commit 955eb8a ("channelz: cleanup channel registration if Dial fails (grpc#2733)") moved a defer block earlier in DialContext() to ensure that cc.Close() was always called. This defer block also checks whether ctx.Done() has fired, and if so ensures the context error is returned.

If the dial options include a timeout, the original context gets replaced with a new context that has the timeout, and this gets a catchall `defer cancel()` to go with it. However, this cancel() now gets called before the cleanup defer block, so when the latter runs the context is always already cancelled.

Fix by splitting the larger defer block into two parts:
- The part that does cc.Close() stays near the beginning of the method.
- The part that checks ctx.Done() returns to below the `defer cancel()` call, and so gets invoked before it.
No description provided.