Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

channelz: cleanup channel registration if Dial fails #2733

Merged
merged 2 commits into from
Apr 2, 2019

Conversation

lyuxuan
Copy link
Contributor

@lyuxuan lyuxuan commented Apr 1, 2019

No description provided.

clientconn.go Outdated
@@ -153,6 +153,18 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
})
}
cc.csMgr.channelzID = cc.channelzID
defer func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc.Close() will also remove channel and add trace channel deleted.

Move RegisterChannel down (right before defer with cc.Close()), then this should be covered by cc.Close().

@lyuxuan lyuxuan merged commit 955eb8a into grpc:master Apr 2, 2019
@@ -196,18 +208,6 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work if there's a timeout set (cc.dopts.timeout > 0) ? In that case, the deferred actions will happen in the reverse order with this PR, so the auto-cancel() of the replaced ctx (with a timeout) happens before the check whether ctx is Done(), which means a cancellation error will always be returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!
I filed #2736 to track. Will have a solution shortly!
Thanks for noting and mentioning this!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a fix & testcase on its way.

daviddrysdale added a commit to daviddrysdale/trillian that referenced this pull request Apr 3, 2019
Seems like grpc/grpc-go#2733 has resulted
in behaviour where any attempt to gRPC dial with a timeout gives
a cancelled-context error.
daviddrysdale added a commit to daviddrysdale/trillian that referenced this pull request Apr 3, 2019
Seems like grpc/grpc-go#2733 has resulted
in behaviour where any attempt to gRPC dial with a timeout gives
a cancelled-context error.
daviddrysdale added a commit to daviddrysdale/trillian that referenced this pull request Apr 3, 2019
Seems like grpc/grpc-go#2733 has resulted
in behaviour where any attempt to gRPC dial with a timeout gives
a cancelled-context error.
daviddrysdale added a commit to google/trillian that referenced this pull request Apr 3, 2019
Seems like grpc/grpc-go#2733 has resulted
in behaviour where any attempt to gRPC dial with a timeout gives
a cancelled-context error.
daviddrysdale added a commit to daviddrysdale/grpc-go that referenced this pull request Apr 3, 2019
Commit 955eb8a ("channelz: cleanup channel registration if Dial
fails (grpc#2733)") moved a defer block earlier in DialContext() to
ensure that cc.Close() was always called.  This defer block also
checks whether the ctx.Done() is true, and if so ensures the context
error is returned.

If the dial options include a timeout, the original context gets
replaced with a new context that has the timeout, and this gets
a catchall `defer cancel()` to go with it.

However, this cancel() now gets called before the cleanup defer
block, so when the latter runs the context is always already cancelled.

Fix by splitting the larger defer block into two parts:
 - The part that does cc.Close() stays near the beginning of the
   method.
 - The part that checks ctx.Done() returns to below the `defer cancel()`
   call, and so gets invoked before it.
daviddrysdale added a commit to daviddrysdale/grpc-go that referenced this pull request Apr 3, 2019
Commit 955eb8a ("channelz: cleanup channel registration if Dial
fails (grpc#2733)") moved a defer block earlier in DialContext() to
ensure that cc.Close() was always called.  This defer block also
checks whether the ctx.Done() is true, and if so ensures the context
error is returned.

If the dial options include a timeout, the original context gets
replaced with a new context that has the timeout, and this gets
a catchall `defer cancel()` to go with it.

However, this cancel() now gets called before the cleanup defer
block, so when the latter runs the context is always already cancelled.

Fix by splitting the larger defer block into two parts:
 - The part that does cc.Close() stays near the beginning of the
   method.
 - The part that checks ctx.Done() returns to below the `defer cancel()`
   call, and so gets invoked before it.
daviddrysdale added a commit to daviddrysdale/grpc-go that referenced this pull request Apr 3, 2019
Commit 955eb8a ("channelz: cleanup channel registration if Dial
fails (grpc#2733)") moved a defer block earlier in DialContext() to
ensure that cc.Close() was always called.  This defer block also
checks whether the ctx.Done() is true, and if so ensures the context
error is returned.

If the dial options include a timeout, the original context gets
replaced with a new context that has the timeout, and this gets
a catchall `defer cancel()` to go with it.

However, this cancel() now gets called before the cleanup defer
block, so when the latter runs the context is always already cancelled.

Fix by splitting the larger defer block into two parts:
 - The part that does cc.Close() stays near the beginning of the
   method.
 - The part that checks ctx.Done() returns to below the `defer cancel()`
   call, and so gets invoked before it.
@dfawley dfawley added this to the 1.20 Release milestone Apr 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants