Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: fix error handling on Stream deletion #1275

Merged
merged 1 commit into from
Jun 14, 2017
Merged

transport: fix error handling on Stream deletion #1275

merged 1 commit into from
Jun 14, 2017

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Jun 1, 2017

This patch writes client-side error before closing the active stream
to fix blocking RecvMsg issue on grpc.ClientStream [1].

Previous gRPC client stream just exits on ClientTransport.Error [2].
And latest gRPC added another select case on client connection context
cancel [3]. Now when client stream closes from client connection context
cancel, it calls CloseStream with ErrClientConnClosing error. And then
the stream gets deleted from *http2Client.activeStreams, without processing
the error [4]. Then in-flight RecvMsg call on this client will block on
*parser.Reader.recvMsg [5].

In short,

  1. ClientConn.Close.
  2. in-flight streams will receive case <-cc.ctx.Done() https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255.
  3. cs.closeTransportStream(ErrClientConnClosing) calls cs.t.CloseStream(cs.s, err).
  4. CloseStream(cs.s, err) calls delete(t.activeStreams, s.id) without handling the error.
  5. in-flight streams will never receive error, left hanging.

I can reproduce in etcd tests with in-flight recvMsg calls to Observe RPC.


[1] etcd-io/etcd#7896 (comment)
[2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238
[3] #1136
[4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569
[5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280

/cc @xiang90 @heyitsanthony

fixes #1293

@MakMukhi MakMukhi self-requested a review June 1, 2017 23:10
@MakMukhi
Copy link
Contributor

MakMukhi commented Jun 1, 2017

So I understand that this issue comes up when the code exits from this path: https://github.com/grpc/grpc-go/blob/master/stream.go#L253

Does this happen when the user application calls Close() on client conn and doesn't cancel the the stream's context?

@gyuho
Copy link
Contributor Author

gyuho commented Jun 1, 2017

Does this happen when the user application calls Close() on client conn and doesn't cancel the the stream's context?

Yes.

If I understand your question correctly, we get the server stream context first, and launch the Recv call on the stream [1]. And since the Recv call blocks, the cancel function never gets called, while we still close the client connection when terminating the cluster.

func (ep *electionProxy) Observe(req *v3electionpb.LeaderRequest, s v3electionpb.Election_ObserveServer) error {
	conn := ep.client.ActiveConnection()
	ctx, cancel := context.WithCancel(s.Context())
	_ = cancel
	sc, err := v3electionpb.NewElectionClient(conn).Observe(ctx, req)
	if err != nil {
		return err
	}
	for {
                // blocks here...
		rr, err := sc.Recv()
                ...

[1] https://github.com/coreos/etcd/blob/master/proxy/grpcproxy/election.go#L46-L53

@MakMukhi
Copy link
Contributor

MakMukhi commented Jun 1, 2017

I see. Thanks for link to the code. I want to discuss this with the team a bit (possible other ways to tackle the problem). Let's try and wrap this is up by tomorrow.

@MakMukhi
Copy link
Contributor

MakMukhi commented Jun 2, 2017

So this is in someway related to code changes made in the past. Looking at which your change does make sense. However, this also requires some more thought.

We all are in meetings all day today. This will have to go to Monday. Hope that's alright.

@MakMukhi
Copy link
Contributor

MakMukhi commented Jun 5, 2017

Hey,
So we agreed on the change that you're proposing. Can you also please add a test for it?

@gyuho gyuho changed the title transport: write error on Stream before deletion transport: fix error handling on Stream deletion Jun 6, 2017
if _, ok := err.(StreamError); ok {
// trigger server to send GoAway
s.rstStream = true
s.rstError = http2.ErrCodeCancel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't trigger the server to send GoAway. Moreover, what's the rationale behind triggering it? It is a connection level concept; closing of a stream with some error shouldn't trigger closing of a connection, right?

We liked your original change, any specific reasons for updating it?

serverConfig := &ServerConfig{
KeepaliveParams: keepalive.ServerParameters{
MaxConnectionIdle: 2 * time.Second,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rely on keepalive related goaways, we don't test the code you wrote s.write(recvMsg{err: err}) .
The test should be a simple reproduction of your prod issue; a goroutine is blocked on stream.Read() while clientconn.Close() is called, with your change in place this goroutine should unblock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let me rework on the test with original change. Thanks.

Copy link
Contributor

@MakMukhi MakMukhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick. LGTM otherwise.

serverConfig := &ServerConfig{
InitialWindowSize: defaultWindowSize,
InitialConnWindowSize: defaultWindowSize,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test, we don't really need to set custom window sizes.

@@ -382,6 +382,47 @@ func setUpWithNoPingServer(t *testing.T, copts ConnectOptions, done chan net.Con
return tr
}

// TestInflightStreamClosing ensures that closing in-flight stream
// sends StreamError to concurrent stream reader.
func TestInflightStreamClosing(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks good. I was expecting an end2end test with clientconn.Close() being called(like your prod issue) but this is essentially the same thing.
Can you make sure that this test fails without your changes in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fixed.

I can confirm this test fails without the patch.
Our production blocking issue is fixed by this as well.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for your time and effort @gyuho

This patch writes client-side error before closing the active stream
to fix blocking `RecvMsg` issue on `grpc.ClientStream` [1].

Previous gRPC client stream just exits on `ClientTransport.Error` [2].
And latest gRPC added another select case on client connection context
cancel [3]. Now when client stream closes from client connection context
cancel, it calls `CloseStream` with `ErrClientConnClosing` error. And then
the stream gets deleted from `*http2Client.activeStreams`, without processing
the error [4]. Then in-flight `RecvMsg` call on this client will block on
`*parser.Reader.recvMsg` [5].

In short,

1. `ClientConn.Close`.
2. in-flight streams will receive case `<-cc.ctx.Done()`
   https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255.
3. `cs.closeTransportStream(ErrClientConnClosing)` calls `cs.t.CloseStream(cs.s, err)`.
4. `CloseStream(cs.s, err)` calls `delete(t.activeStreams, s.id)`
   without handling the error.
5. in-flight streams will never receive error, left hanging.

I can reproduce in etcd tests with in-flight `recvMsg` calls to `Observe` RPC.

---
[1] etcd-io/etcd#7896 (comment)
[2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238
[3] #1136
[4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569
[5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@MakMukhi MakMukhi merged commit 7df48be into grpc:master Jun 14, 2017
@gyuho
Copy link
Contributor Author

gyuho commented Jun 14, 2017

@MakMukhi

Could we have another release with this patch?
Just cut from master branch would work for us.
We will wait if there are other pending PRs.

Thanks!

@gyuho gyuho deleted the write-recv-err-on-close-stream branch June 14, 2017 21:36
@MakMukhi
Copy link
Contributor

@menghanl What do you think ?

@menghanl
Copy link
Contributor

We can backport this PR to the 1.4 branch and do a 1.4.1 release.

menghanl pushed a commit to menghanl/grpc-go that referenced this pull request Jun 14, 2017
This patch writes client-side error before closing the active stream
to fix blocking `RecvMsg` issue on `grpc.ClientStream` [1].

Previous gRPC client stream just exits on `ClientTransport.Error` [2].
And latest gRPC added another select case on client connection context
cancel [3]. Now when client stream closes from client connection context
cancel, it calls `CloseStream` with `ErrClientConnClosing` error. And then
the stream gets deleted from `*http2Client.activeStreams`, without processing
the error [4]. Then in-flight `RecvMsg` call on this client will block on
`*parser.Reader.recvMsg` [5].

In short,

1. `ClientConn.Close`.
2. in-flight streams will receive case `<-cc.ctx.Done()`
   https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255.
3. `cs.closeTransportStream(ErrClientConnClosing)` calls `cs.t.CloseStream(cs.s, err)`.
4. `CloseStream(cs.s, err)` calls `delete(t.activeStreams, s.id)`
   without handling the error.
5. in-flight streams will never receive error, left hanging.

I can reproduce in etcd tests with in-flight `recvMsg` calls to `Observe` RPC.

---
[1] etcd-io/etcd#7896 (comment)
[2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238
[3] grpc#1136
[4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569
[5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@menghanl
Copy link
Contributor

menghanl commented Jun 14, 2017

@gyuho Just released 1.4.1 with this change.

@gyuho
Copy link
Contributor Author

gyuho commented Jun 15, 2017

@menghanl Thanks!

@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClientConn.Close() doesn't always make Recv() end
3 participants