What version of gRPC are you using?
Master branch as of today (bfaf042).
What version of Go are you using (`go version`)?
go version go1.9rc1 darwin/amd64
What operating system (Linux, Windows, …) and version?
MacOS
What did you do?
See etcd-io/etcd#8258.
What did you expect to see?
We want to use `keepalive` for HTTP/2 ping health checking. We expect the client to switch endpoints when one endpoint times out on `keepalive`.
What did you see instead?
A `keepalive` time-out moves the address connection state to transient failure, and `resetTransport` retries the same endpoint: `Up` keeps being called on the balancer for the timed-out endpoint. If the endpoint never comes back, the balancer is stuck retrying it.
Is there any way to stop those retries on the timed-out endpoint and try the others? We have our own implementation of the balancer interface, but the `keepalive` time-out error is not distinguishable on the client side, so there is not much we can do.
Here's the code path for reference:
- Configure `grpc.Balancer(ep1,ep2)` with a 1-second `keepalive`.
- Blackhole `ep1`.
- `keepalive(ep1)` times out in 1 second, which is expected.
- `grpc-go/transport/http2_client.go/*http2Client` calls `(*http2Client).Close` on `ep1`.
  - `ep1` has `transportState` `reachable` at the moment.
  - `close(t.errorChan)`
- Signal `<-t.Error()` on `grpc-go/clientconn.go/(*addrConn).transportMonitor()`.
  - `ep1` `*addrConn.(connectivity.State)` is `connectivity.Ready`.
  - `ep1` `*addrConn.(connectivity.State)` is set to `connectivity.TransientFailure`.
  - `resetTransport(drain=false)` on `ep1`.
- Calls `ep1`'s `down` with `grpc: failed with network I/O error`.
- Calls `resetTransport(drain=false)`, which retries on `ep1` as long as `*addrConn.(connectivity.State) != connectivity.Shutdown`.
  - `for retries := 0; ; retries++ {`
- Still, `ep1`'s `*addrConn.(connectivity.State) == connectivity.TransientFailure`.
- Thus, the retry loop keeps calling `ac.cc.dopts.balancer.Up(ep1)`.
- Now we are stuck with the blackholed `ep1`.
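To make the difference between what we see and what we expect concrete, here is a minimal, self-contained sketch. It is a toy model, not grpc-go code: `dial`, `retrySame`, `failover`, and `errKeepaliveTimeout` are hypothetical names, and the retry cap exists only so the example terminates (the loop quoted above has no cap).

```go
package main

import (
	"errors"
	"fmt"
)

// errKeepaliveTimeout is a hypothetical stand-in for the keepalive time-out.
var errKeepaliveTimeout = errors.New("keepalive timed out")

// dial is a hypothetical stand-in for creating a transport to an endpoint.
// Endpoints marked in blackholed never become reachable.
func dial(ep string, blackholed map[string]bool) error {
	if blackholed[ep] {
		return errKeepaliveTimeout
	}
	return nil
}

// retrySame models the reported behavior: after a failure, the same endpoint
// is retried. maxRetries is an assumption added so the example terminates.
// It returns the connected endpoint ("" if none) and the number of attempts.
func retrySame(ep string, blackholed map[string]bool, maxRetries int) (string, int) {
	for retries := 0; retries < maxRetries; retries++ {
		if dial(ep, blackholed) == nil {
			return ep, retries + 1
		}
		// The endpoint is still failing, so loop again on the same endpoint.
	}
	return "", maxRetries
}

// failover models the expected behavior: a timed-out endpoint is skipped
// and the next endpoint in the list is tried.
func failover(eps []string, blackholed map[string]bool) (string, int) {
	for i, ep := range eps {
		if dial(ep, blackholed) == nil {
			return ep, i + 1
		}
	}
	return "", len(eps)
}

func main() {
	blackholed := map[string]bool{"ep1": true}

	ep, n := retrySame("ep1", blackholed, 5)
	fmt.Printf("retry same: connected=%q attempts=%d\n", ep, n)

	ep, n = failover([]string{"ep1", "ep2"}, blackholed)
	fmt.Printf("failover:   connected=%q attempts=%d\n", ep, n)
}
```

In this model, `retrySame` never leaves `ep1` while it is blackholed, whereas `failover` reaches `ep2` on the second attempt, which is the behavior we would like to get from a keepalive time-out.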
Thanks.