Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd3 fails to shut down on sigint/sigterm when grpc-gateway has active watches #8224

Closed
jamesyonan opened this issue Jul 6, 2017 · 4 comments

Comments

@jamesyonan
Copy link

Running a single-node cluster from the shell using 3.2.0-rc.1+git. I have a local client connect to the JSON grpc-gateway using POST /v3alpha/watch in "streaming" mode, where the HTTP session is long-term and each new watch event comes over the wire as a new JSON-encoded HTTP chunk. This part works fine. However if I try to ctrl-c the etcd process while the watch is still active, the process hangs in a loop with the below errors. The only way to get it to terminate is either sigkill or terminating the watch client.

^C2017-07-06 17:02:31.710640 N | pkg/osutil: received interrupt signal, shutting down... 2017-07-06 17:02:31.710981 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: getsockopt: connection refused"; Reconnecting to {127.0.0.1:2379 <nil>} 2017-07-06 17:02:32.711047 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: getsockopt: connection refused"; Reconnecting to {127.0.0.1:2379 <nil>}

@gyuho
Copy link
Contributor

gyuho commented Jul 13, 2017

Yes, this is expected since we use https://godoc.org/google.golang.org/grpc#Server.GracefulStop. The server will block on GracefulStop with active watchers https://github.com/coreos/etcd/blob/master/embed/etcd.go#L196.

Same happens with clientv3

e.g.

wch := cli.Watch(context.Background(), "foo")
<-wch

/cc @heyitsanthony @xiang90

@heyitsanthony
Copy link
Contributor

@gyuho Is there some way the server can stop the client listener and cut the connections instead of waiting for all clients to disconnect on their own? If clients can keep the server from shutting down indefinitely and there's no way to force the clients to change endpoints, then upgrades would have to kill -9 the etcd process (or jump through other hoops to move the clients) to make forward progress, which seems bad.

@gyuho
Copy link
Contributor

gyuho commented Jul 13, 2017

@heyitsanthony You are right. I will look into that.

@jamesyonan
Copy link
Author

I agree that it's generally a bad idea for servers to fail to respond to a shutdown command just because clients are connected. There's a security component to this as well, as @heyitsanthony mentioned with updates. If a client can prevent a server from shutting down, just by staying connected to it, it could effectively stop a security update.

James

gyuho added a commit to gyuho/etcd that referenced this issue Jul 14, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections), when unary RPCs
only take a few seconds to finish. Stream RPCs like watch might never
close the connections in client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address etcd-io#8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this issue Jul 14, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections). Unary RPCs
only take a few seconds to finish. Stream RPCs, like watch, might never
close the connections from client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address etcd-io#8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this issue Jul 14, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections). Unary RPCs
only take a few seconds to finish. Stream RPCs, like watch, might never
close the connections from client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address etcd-io#8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this issue Jul 14, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections). Unary RPCs
only take a few seconds to finish. Stream RPCs, like watch, might never
close the connections from client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address etcd-io#8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this issue Jul 14, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections). Unary RPCs
only take a few seconds to finish. Stream RPCs, like watch, might never
close the connections from client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address etcd-io#8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit that referenced this issue Jul 15, 2017
Both grpc.Server.Stop and grpc.Server.GracefulStop close the listeners
first, to stop accepting the new connections. GracefulStop blocks until
all clients close their open transports(connections). Unary RPCs
only take a few seconds to finish. Stream RPCs, like watch, might never
close the connections from client side, thus making gRPC server wait
forever.

This patch still calls GracefulStop, but waits up to 10s before manually
closing the open transports.

Address #8224.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@gyuho gyuho closed this as completed Jul 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants