badly-behaved client can deadlock server stopping #76
An effective mitigation of this behavior is to set a non-zero |
Setting a read timeout mitigates the impact of soheilhy/cmux#76.
Close GRPCLoopback on hard stop to guard against the possibility of a remote peer which responds to TCP keep-alive but isn't reading from the connection.
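To illustrate why a read timeout helps, here is a minimal sketch using only the Go standard library (it does not use cmux itself). The helper names `sniffHTTP2`, `isTimeout`, `silentClientTimesOut` and `prefaceMatches` are hypothetical and exist only for this example. The idea is that a read deadline bounds how long sniffing can block on a client that connects but never sends anything.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// http2Preface is the fixed 24-byte HTTP/2 client connection preface.
const http2Preface = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

// sniffHTTP2 (hypothetical helper) reports whether conn opens with the
// HTTP/2 preface. A non-zero timeout bounds how long the read may block.
// Note: a real multiplexer would buffer the bytes it reads so they can be
// replayed to the matched handler; this sketch simply consumes them.
func sniffHTTP2(conn net.Conn, timeout time.Duration) (bool, error) {
	if timeout > 0 {
		if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
			return false, err
		}
		// Clear the deadline afterwards so later reads are unaffected.
		defer conn.SetReadDeadline(time.Time{})
	}
	buf := make([]byte, len(http2Preface))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return false, err
	}
	return string(buf) == http2Preface, nil
}

// isTimeout reports whether err is a network timeout error.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// silentClientTimesOut connects a client that never writes and reports
// whether sniffing gave up with a timeout instead of blocking forever.
func silentClientTimesOut() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	_, err := sniffHTTP2(server, 50*time.Millisecond)
	return isTimeout(err)
}

// prefaceMatches sends a valid preface and reports whether it is detected.
func prefaceMatches() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	go client.Write([]byte(http2Preface))
	ok, err := sniffHTTP2(server, time.Second)
	return ok && err == nil
}

func main() {
	fmt.Println("silent client timed out:", silentClientTimesOut())
	fmt.Println("well-behaved client matched:", prefaceMatches())
}
```

With a timeout of zero the same silent client would keep the sniffing goroutine blocked until the peer closes the connection, which is the situation this issue describes.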
Would it be reasonable to add a |
Adding the |
Documenting my findings debugging a production issue:
tl;dr is that a client can mess with stopping of a server, because the sniffing mechanism has no notion of draining for connections that have yet to be matched to a sub-listener. The specific scenario I encountered is:
Net effect is that grpc.Server.Stop/GracefulStop() & cmux.Serve() can't return until the client connection is remotely closed.
Not entirely sure what the right behavior here is. My gut take is that cmux Accept() should preserve the exit semantics of the wrapped listener Accept, and return its error even though there are outstanding, still-to-be-sniffed connections.
Collected traces:
cmux.Serve has found that the wrapped listener Accept has errored.
It’s trying to return, but is blocked on its own WG within a defer:
That WG can't finish because a connection thread is stuck waiting to sniff an HTTP/2 header:
Meanwhile, gRPC Serve() is blocked waiting for Accept to return. It must do so before it can notify the gRPC server’s own WG, which is a prerequisite for GracefulStop or Stop to return:
For completeness, here's where GracefulStop is wedged waiting on its WG, held hostage by grpc.Serve:
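The chain described above can be modelled with a small standard-library sketch. This is not cmux's or gRPC's actual code: `memListener`, `serve` and `stopBlocksUntilClientCloses` are hypothetical names, and the in-memory listener stands in for the wrapped listener. It only reproduces the shape of the problem: a serve loop that waits on a WaitGroup in a defer, and a sniffing goroutine that has no read deadline.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// memListener is a hypothetical in-memory stand-in for the wrapped listener.
type memListener struct {
	conns chan net.Conn
	done  chan struct{}
}

// Accept hands out queued connections until the listener is closed.
func (l *memListener) Accept() (net.Conn, error) {
	select {
	case c := <-l.conns:
		return c, nil
	case <-l.done:
		return nil, net.ErrClosed
	}
}

// serve mimics the structure seen in the traces: it sees the Accept error
// and tries to return, but the deferred wg.Wait() holds it until every
// sniffing goroutine has finished.
func serve(l *memListener) error {
	var wg sync.WaitGroup
	defer wg.Wait()
	for {
		c, err := l.Accept()
		if err != nil {
			return err
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer c.Close()
			buf := make([]byte, 24) // length of the HTTP/2 client preface
			// No read deadline: this blocks while the client stays silent.
			_, _ = io.ReadFull(c, buf)
		}()
	}
}

// stopBlocksUntilClientCloses reports whether serve returned before and
// after the silent client closed its end of the connection.
func stopBlocksUntilClientCloses() (before, after bool) {
	l := &memListener{conns: make(chan net.Conn), done: make(chan struct{})}
	returned := make(chan struct{})
	go func() {
		serve(l)
		close(returned)
	}()

	client, server := net.Pipe()
	l.conns <- server // unbuffered: returns once serve has accepted it
	close(l.done)     // the wrapped listener's Accept now fails

	select {
	case <-returned:
		before = true
	case <-time.After(200 * time.Millisecond):
	}

	client.Close() // the remote peer finally goes away
	select {
	case <-returned:
		after = true
	case <-time.After(2 * time.Second):
	}
	return before, after
}

func main() {
	before, after := stopBlocksUntilClientCloses()
	fmt.Println("returned before client close:", before)
	fmt.Println("returned after client close:", after)
}
```

In this model, anything waiting for `serve` to return (as GracefulStop does for grpc.Serve in the traces) stays blocked until the client closes its end.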