Client connection phase should optionally wait for SETTINGS frame and set deadlines #1444

@gyuho

What version of gRPC are you using?

Master branch as of today (bfaf042).

What version of Go are you using (go version)?

go version go1.9rc1 darwin/amd64

What operating system (Linux, Windows, …) and version?

macOS

What did you do?

See etcd-io/etcd#8258.

What did you expect to see?

We want to use keepalive for HTTP/2 ping health checking. We expect the client to switch to another endpoint when one endpoint times out on keepalive.

What did you see instead?

The keepalive time-out moves the address connection state to transient failure, and resetTransport then retries the same endpoint: the balancer keeps getting Up calls for the timed-out endpoint. If the endpoint never comes back, the balancer is stuck retrying it.

Is there any other way to stop those retries on the timed-out endpoint and try the others? We have our own implementation of the balancer interface, but the keepalive time-out error is not distinguishable on the client side, so there is not much we can do.

Here's the code path for reference:

  1. Configure grpc.Balancer(ep1,ep2) with a 1-second keepalive
  2. Blackhole(ep1)
  3. keepalive(ep1) times out after 1 second, which is expected
  4. grpc-go/transport/http2_client.go/*http2Client calls (*http2Client).Close on ep1
    • ep1 has transportState reachable at the moment
    • close(t.errorChan)
  5. Signal <-t.Error() on grpc-go/clientconn.go/(*addrConn).transportMonitor()
    • ep1 *addrConn.(connectivity.State) is connectivity.Ready
    • ep1 *addrConn.(connectivity.State) is set to connectivity.TransientFailure
  6. resetTransport(drain=false) on ep1
    • Calls ep1's down function with the error "grpc: failed with network I/O error"
  7. resetTransport(drain=false) keeps retrying ep1 as long as *addrConn.(connectivity.State) != connectivity.Shutdown
    • for retries := 0; ; retries++ {
  8. ep1's *addrConn.(connectivity.State) is still connectivity.TransientFailure
  9. Thus, the retry loop keeps calling ac.cc.dopts.balancer.Up(ep1)
  10. The client is now stuck on the blackholed ep1
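
To make steps 6 to 10 concrete, here is a minimal, standard-library-only simulation of the retry loop as described above. It is a toy model, not grpc-go's actual code: the addrConn, balancer, and resetTransport definitions below are simplified stand-ins, and maxRetries is a hypothetical bound added only so the simulation terminates (the loop in step 7 has no such bound).

```go
package main

import "fmt"

// State is a simplified stand-in for connectivity.State.
type State int

const (
	Ready State = iota
	TransientFailure
	Shutdown
)

// addrConn is a toy model of one endpoint's connection (not the real type).
type addrConn struct {
	addr  string
	state State
}

// balancer records every address it is told is "up" (hypothetical recorder).
type balancer struct {
	upCalls []string
}

func (b *balancer) Up(addr string) {
	b.upCalls = append(b.upCalls, addr)
}

// resetTransport models the loop from steps 7-9: unless the connection has
// been shut down, it retries the same address and reports it Up again.
// Nothing in the loop ever considers a different endpoint.
func resetTransport(ac *addrConn, b *balancer, maxRetries int) {
	for retries := 0; retries < maxRetries; retries++ {
		if ac.state == Shutdown {
			return
		}
		b.Up(ac.addr)
		// The endpoint is blackholed, so keepalive times out again and the
		// state falls back to TransientFailure (steps 3-5 repeat).
		ac.state = TransientFailure
	}
}

func main() {
	ep1 := &addrConn{addr: "ep1", state: TransientFailure}
	b := &balancer{}
	resetTransport(ep1, b, 3)
	fmt.Println(b.upCalls) // every Up call targets ep1; ep2 is never tried
}
```

In this model the only exit from the loop is the Shutdown state, which matches the observation that the client cannot tell a keepalive time-out apart from other failures and therefore has no hook for moving on to ep2.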

Thanks.

Labels: P1; Type: Feature (New features or improvements in behavior)