Client connection phase should optionally wait for SETTINGS frame and set deadlines

### What version of gRPC are you using?

Master branch as of today (bfaf0423469fc4f95e9eac7e3eec0b2abb46fcca).

### What version of Go are you using (`go version`)?

```
go version go1.9rc1 darwin/amd64
```

### What operating system (Linux, Windows, …) and version?

macOS

### What did you do?

See https://github.com/coreos/etcd/pull/8258 for context.

### What did you expect to see?

We want to use `keepalive` for HTTP/2 ping health checking. We expect the client to switch to another endpoint when one endpoint times out on `keepalive`.

### What did you see instead?

A `keepalive` time-out moves the address connection state to transient failure, and `resetTransport` then retries the same endpoint: the balancer keeps calling `Up` on the timed-out endpoint. If the endpoint never comes back, the balancer is stuck retrying it.

Is there any way to stop those retries on the timed-out endpoint and try the others instead? We have our own implementation of the balancer interface, but the `keepalive` time-out error is not distinguishable on the client side, so there is not much we can do.

Here's the code path for reference:

1. Configure `grpc.Balancer(ep1,ep2)` with a 1-second `keepalive`
2. `Blackhole(ep1)`
3. `keepalive(ep1)` **times out** after 1 second, which is expected
4. [`grpc-go/transport/http2_client.go/*http2Client`](https://github.com/grpc/grpc-go/blob/master/transport/http2_client.go) calls `(*http2Client).Close` on `ep1`
    - `ep1` has `transportState` `reachable` at the moment
    - `close(t.errorChan)`
5. This signals `<-t.Error()` in [`grpc-go/clientconn.go/(*addrConn).transportMonitor()`](https://github.com/grpc/grpc-go/blob/master/clientconn.go)
    - `ep1` `*addrConn.(connectivity.State)` is `connectivity.Ready` at this point
    - `ep1` `*addrConn.(connectivity.State)` is set to **`connectivity.TransientFailure`**
6. `resetTransport(drain=false)` on `ep1`
    - Calls `ep1`'s `down` with `grpc: failed with network I/O error`
7. `resetTransport(drain=false)` keeps retrying `ep1` as long as `*addrConn.(connectivity.State) != connectivity.Shutdown`
    - `for retries := 0; ; retries++ {`
8. `ep1`'s state is still **`*addrConn.(connectivity.State) == connectivity.TransientFailure`**
9. Thus, the retry loop keeps calling `ac.cc.dopts.balancer.Up(ep1)`
10. The client is now stuck on the blackholed `ep1`
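The steps above can be modeled with a small standard-library simulation. This is not the grpc-go code: `conn`, `dialFunc`, `resetTransport`, and `retryRotating` are simplified, hypothetical stand-ins, and `maxAttempts` exists only so the simulation terminates (the real loop has no upper bound). The first function retries a single address until its state is shutdown; the second shows the endpoint switch we would like to see.

```go
package main

import (
	"errors"
	"fmt"
)

// state mirrors the connectivity states named in the steps above.
type state int

const (
	ready state = iota
	transientFailure
	shutdown
)

var errUnreachable = errors.New("endpoint unreachable")

// dialFunc is a hypothetical stand-in for establishing a transport.
type dialFunc func(addr string) error

// conn is a simplified stand-in for an address connection.
type conn struct {
	addr string
	st   state
}

// resetTransport models steps 6 to 9: it redials the same address until the
// dial succeeds or the state is shutdown. It returns the number of attempts.
func (c *conn) resetTransport(dial dialFunc, maxAttempts int) int {
	c.st = transientFailure
	attempts := 0
	for retries := 0; retries < maxAttempts; retries++ {
		if c.st == shutdown {
			return attempts
		}
		attempts++
		if err := dial(c.addr); err == nil {
			c.st = ready
			return attempts
		}
	}
	return attempts
}

// retryRotating models the desired behavior: after a failed attempt, move on
// to the next endpoint instead of retrying the same one.
func retryRotating(addrs []string, dial dialFunc, maxAttempts int) (string, int, state) {
	if len(addrs) == 0 {
		return "", 0, transientFailure
	}
	for i := 0; i < maxAttempts; i++ {
		addr := addrs[i%len(addrs)]
		if err := dial(addr); err == nil {
			return addr, i + 1, ready
		}
	}
	return "", maxAttempts, transientFailure
}

func main() {
	// ep1 is blackholed, ep2 is healthy.
	dial := func(addr string) error {
		if addr == "ep1" {
			return errUnreachable
		}
		return nil
	}

	c := &conn{addr: "ep1", st: ready}
	n := c.resetTransport(dial, 5)
	fmt.Printf("same endpoint: attempts=%d ready=%t\n", n, c.st == ready)

	addr, m, st := retryRotating([]string{"ep1", "ep2"}, dial, 5)
	fmt.Printf("rotating: addr=%s attempts=%d ready=%t\n", addr, m, st == ready)
}
```

In this model the single-endpoint loop exhausts every attempt on `ep1` without becoming ready, while the rotating variant reaches `ep2` on its second attempt.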

Thanks.
