
poorly-behaved readers can block a broker's GracefulStop #266

Closed
jgraettinger opened this issue Jun 10, 2020 · 3 comments

jgraettinger commented Jun 10, 2020

Grabbed this off a production broker that was taking quite a while to exit:

goroutine 894016483 [select, 223 minutes]:
google.golang.org/grpc/internal/transport.(*writeQuota).get(0xc011a62920, 0xc000020011, 0x3ffb, 0x5)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/internal/transport/flowcontrol.go:59 +0xaa
google.golang.org/grpc/internal/transport.(*http2Server).Write(0xc00bd34180, 0xc012b79c00, 0xc01f02e000, 0x5, 0x4000, 0xc006b36000, 0x2000c, 0x2000c, 0xc00a49c9fd, 0x5, ...)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/internal/transport/http2_server.go:927 +0x1ec
google.golang.org/grpc.(*serverStream).SendMsg(0xc011d660c0, 0x11c6580, 0xc006a45080, 0x0, 0x0)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/stream.go:1420 +0x23c
github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredServerStream).SendMsg(0xc0114d87e0, 0x11c6580, 0xc006a45080, 0x2, 0x2)
        /gazette/.build/go-path/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:156 +0x4b
go.gazette.dev/core/broker.proxyRead(0x7f31ced3a5e0, 0xc001128080, 0xc0152b8d80, 0x14bd8e0, 0xc00020a3b0, 0xc0002ac300, 0x0, 0x0)
        /gazette/broker/read_api.go:113 +0x437
go.gazette.dev/core/broker.(*Service).Read(0xc0002b0090, 0xc0152b8d80, 0x14bff80, 0xc001128080, 0xc0002b0090, 0x20)
        /gazette/broker/read_api.go:56 +0x3b2
go.gazette.dev/core/broker/protocol._Journal_Read_Handler(0x11891c0, 0xc0002b0090, 0x14bd640, 0xc0114d87e0, 0x16, 0xc012ecf5e0)
        /gazette/broker/protocol/protocol.pb.go:3084 +0x109
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1(0x11891c0, 0xc0002b0090, 0x14bda60, 0xc011d660c0, 0xc0114d87c0, 0x1266c60, 0x14b6720, 0xc0152b8d20)
        /gazette/.build/go-path/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121 +0xeb
google.golang.org/grpc.(*Server).processStreamingRPC(0xc000542680, 0x14c26e0, 0xc00bd34180, 0xc012b79c00, 0xc00054d260, 0x1e28000, 0xc00ecbb640, 0x0, 0x0)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/server.go:1336 +0x511
google.golang.org/grpc.(*Server).handleStream(0xc000542680, 0x14c26e0, 0xc00bd34180, 0xc012b79c00, 0xc00ecbb640)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/server.go:1409 +0xc62
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc008794840, 0xc000542680, 0x14c26e0, 0xc00bd34180, 0xc012b79c00)
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/server.go:746 +0xa1
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /gazette/.build/go-path/pkg/mod/google.golang.org/grpc@v1.28.0/server.go:744 +0xa1

The server sets keep-alive on accepted connections, so as far as I can tell this was a live client which had started a Read RPC but was then making no progress reading the response stream from the broker. The broker wanted to exit, but the gRPC server's GracefulStop was held up by the client's in-flight Read RPC. The proxied Read RPC knows to check for shutdown and send an EOF to the client, but the current mechanism doesn't account for the fact that SendMsg itself can block indefinitely on HTTP/2 flow control (note the writeQuota.get frame above).

This may be a concern with non-proxied reads as well, but I was only able to find traces involving proxied reads for some reason.
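
To illustrate the hazard, here's a hypothetical sketch (not the broker's actual proxyRead; the handler shape and names are made up): a server-streaming handler can check a shutdown signal between sends, but once SendMsg is parked on HTTP/2 flow control because the peer isn't reading, that check is never reached again.

package example

import (
    "google.golang.org/grpc"
)

// serveRead is a hypothetical sketch of the hazard, not the broker's proxyRead.
// It checks a shutdown signal between sends, which only helps while SendMsg
// keeps returning.
func serveRead(stream grpc.ServerStream, stopping <-chan struct{}, next func() interface{}) error {
    for {
        select {
        case <-stopping:
            return nil // Graceful EOF to the client -- only reachable between sends.
        default:
        }
        // If the client stops draining the response stream, the HTTP/2 write
        // quota is exhausted and this call blocks indefinitely.
        if err := stream.SendMsg(next()); err != nil {
            return err
        }
    }
}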

@jgraettinger

To resolve this, I'm adding a 15s timeout on how long we'll wait for GracefulStop to complete. After that point, a hard gRPC server Stop is issued. This has the benefit of causing offending RPCs to bail out, logging errors like:

time="2020-06-10T15:47:34-05:00" level=warning msg="served Read RPC failed" client="127.0.0.1:46058" err="rpc error: code = Unavailable desc = transport is closing" req="journal:"foo/bar" offset:262144 block:true "

which surfaces that they're happening and gives clues about which clients are responsible.
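
A minimal sketch of the bounded shutdown, assuming a plain *grpc.Server (the broker's actual wiring may differ):

package example

import (
    "time"

    "google.golang.org/grpc"
)

// stopWithTimeout attempts a graceful stop, then falls back to a hard Stop
// if in-flight RPCs (e.g. readers not consuming their streams) haven't
// drained within the grace period.
func stopWithTimeout(srv *grpc.Server, grace time.Duration) {
    done := make(chan struct{})
    go func() {
        srv.GracefulStop() // Returns once all in-flight RPCs complete.
        close(done)
    }()

    select {
    case <-done:
        // All RPCs drained in time.
    case <-time.After(grace):
        // A stalled RPC is still holding the server open. Force-close all
        // connections; the offending RPCs fail with "transport is closing".
        srv.Stop()
        <-done // GracefulStop returns once the server is fully stopped.
    }
}

Called with a 15s grace period, the offending RPCs fail and are logged as in the warning above.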

@jgraettinger

Resolved in commit 2678807.

@jgraettinger

Related issue in a dependency, which is not directly resolved by this ^ change: soheilhy/cmux#76
