-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
transport: fix error handling on Stream deletion #1275
Conversation
So I understand that this issue comes up when the code exits from this path: https://github.com/grpc/grpc-go/blob/master/stream.go#L253 Does this happen when the user application calls Close() on client conn and doesn't cancel the the stream's context? |
Yes. If I understand your question correctly, we get the server stream context first, and launch the func (ep *electionProxy) Observe(req *v3electionpb.LeaderRequest, s v3electionpb.Election_ObserveServer) error {
conn := ep.client.ActiveConnection()
ctx, cancel := context.WithCancel(s.Context())
_ = cancel
sc, err := v3electionpb.NewElectionClient(conn).Observe(ctx, req)
if err != nil {
return err
}
for {
// blocks here...
rr, err := sc.Recv()
... [1] https://github.com/coreos/etcd/blob/master/proxy/grpcproxy/election.go#L46-L53 |
I see. Thanks for link to the code. I want to discuss this with the team a bit (possible other ways to tackle the problem). Let's try and wrap this is up by tomorrow. |
So this is in someway related to code changes made in the past. Looking at which your change does make sense. However, this also requires some more thought. We all are in meetings all day today. This will have to go to Monday. Hope that's alright. |
Hey, |
transport/http2_client.go
Outdated
if _, ok := err.(StreamError); ok { | ||
// trigger server to send GoAway | ||
s.rstStream = true | ||
s.rstError = http2.ErrCodeCancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't trigger the server to send GoAway. Moreover, what's the rationale behind triggering it? It is a connection level concept; closing of a stream with some error shouldn't trigger closing of a connection, right?
We liked your original change, any specific reasons for updating it?
transport/transport_test.go
Outdated
serverConfig := &ServerConfig{ | ||
KeepaliveParams: keepalive.ServerParameters{ | ||
MaxConnectionIdle: 2 * time.Second, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rely on keepalive related goaways, we don't test the code you wrote s.write(recvMsg{err: err})
.
The test should be a simple reproduction of your prod issue; a goroutine is blocked on stream.Read() while clientconn.Close() is called, with your change in place this goroutine should unblock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let me rework on the test with original change. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nitpick. LGTM otherwise.
transport/transport_test.go
Outdated
serverConfig := &ServerConfig{ | ||
InitialWindowSize: defaultWindowSize, | ||
InitialConnWindowSize: defaultWindowSize, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this test, we don't really need to set custom window sizes.
@@ -382,6 +382,47 @@ func setUpWithNoPingServer(t *testing.T, copts ConnectOptions, done chan net.Con | |||
return tr | |||
} | |||
|
|||
// TestInflightStreamClosing ensures that closing in-flight stream | |||
// sends StreamError to concurrent stream reader. | |||
func TestInflightStreamClosing(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test looks good. I was expecting an end2end test with clientconn.Close() being called(like your prod issue) but this is essentially the same thing.
Can you make sure that this test fails without your changes in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All fixed.
I can confirm this test fails without the patch.
Our production blocking issue is fixed by this as well.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for your time and effort @gyuho
This patch writes client-side error before closing the active stream to fix blocking `RecvMsg` issue on `grpc.ClientStream` [1]. Previous gRPC client stream just exits on `ClientTransport.Error` [2]. And latest gRPC added another select case on client connection context cancel [3]. Now when client stream closes from client connection context cancel, it calls `CloseStream` with `ErrClientConnClosing` error. And then the stream gets deleted from `*http2Client.activeStreams`, without processing the error [4]. Then in-flight `RecvMsg` call on this client will block on `*parser.Reader.recvMsg` [5]. In short, 1. `ClientConn.Close`. 2. in-flight streams will receive case `<-cc.ctx.Done()` https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255. 3. `cs.closeTransportStream(ErrClientConnClosing)` calls `cs.t.CloseStream(cs.s, err)`. 4. `CloseStream(cs.s, err)` calls `delete(t.activeStreams, s.id)` without handling the error. 5. in-flight streams will never receive error, left hanging. I can reproduce in etcd tests with in-flight `recvMsg` calls to `Observe` RPC. --- [1] etcd-io/etcd#7896 (comment) [2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238 [3] #1136 [4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569 [5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280 Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Could we have another release with this patch? Thanks! |
@menghanl What do you think ? |
We can backport this PR to the 1.4 branch and do a 1.4.1 release. |
This patch writes client-side error before closing the active stream to fix blocking `RecvMsg` issue on `grpc.ClientStream` [1]. Previous gRPC client stream just exits on `ClientTransport.Error` [2]. And latest gRPC added another select case on client connection context cancel [3]. Now when client stream closes from client connection context cancel, it calls `CloseStream` with `ErrClientConnClosing` error. And then the stream gets deleted from `*http2Client.activeStreams`, without processing the error [4]. Then in-flight `RecvMsg` call on this client will block on `*parser.Reader.recvMsg` [5]. In short, 1. `ClientConn.Close`. 2. in-flight streams will receive case `<-cc.ctx.Done()` https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255. 3. `cs.closeTransportStream(ErrClientConnClosing)` calls `cs.t.CloseStream(cs.s, err)`. 4. `CloseStream(cs.s, err)` calls `delete(t.activeStreams, s.id)` without handling the error. 5. in-flight streams will never receive error, left hanging. I can reproduce in etcd tests with in-flight `recvMsg` calls to `Observe` RPC. --- [1] etcd-io/etcd#7896 (comment) [2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238 [3] grpc#1136 [4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569 [5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280 Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@gyuho Just released 1.4.1 with this change. |
@menghanl Thanks! |
This patch writes client-side error before closing the active stream
to fix blocking
RecvMsg
issue ongrpc.ClientStream
[1].Previous gRPC client stream just exits on
ClientTransport.Error
[2].And latest gRPC added another select case on client connection context
cancel [3]. Now when client stream closes from client connection context
cancel, it calls
CloseStream
withErrClientConnClosing
error. And thenthe stream gets deleted from
*http2Client.activeStreams
, without processingthe error [4]. Then in-flight
RecvMsg
call on this client will block on*parser.Reader.recvMsg
[5].In short,
ClientConn.Close
.<-cc.ctx.Done()
https://github.com/grpc/grpc-go/blob/master/stream.go#L253-L255.cs.closeTransportStream(ErrClientConnClosing)
callscs.t.CloseStream(cs.s, err)
.CloseStream(cs.s, err)
callsdelete(t.activeStreams, s.id)
without handling the error.I can reproduce in etcd tests with in-flight
recvMsg
calls toObserve
RPC.[1] etcd-io/etcd#7896 (comment)
[2] https://github.com/grpc/grpc-go/blob/v1.2.x/stream.go#L235-L238
[3] #1136
[4] https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go#L569
[5] https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L280
/cc @xiang90 @heyitsanthony
fixes #1293