x/net/http2: http2: no cached connection was available #22091
Comments
/cc @tombergan Tom, does this sound familiar? My memory's still fuzzy from my leave. I see #16582 is marked fixed but later comments (#16582 (comment)) suggest it's still happening on Go 1.9.x.
Do you have a repro you can share? I cannot repro with https://play.golang.org/p/zKthX7Y9RW (from #16582).
Perhaps I misunderstood... noDialH2RoundTripper cannot return ErrNoCachedConn; it translates that error to ErrSkipAltProtocol. Did you mean s/ErrNoCachedConn/ErrSkipAltProtocol/? According to the comments, there is a known possible race with CloseIdleConnections (https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221). @WillSewell and @sjenning, can you verify that you're not calling CloseIdleConnections?
We are not calling it. FYI, we worked around this in kube by using separate transports for the two probe types so that a timing collision doesn't occur: kubernetes/kubernetes#53318
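The separate-transports workaround described here can be sketched as follows. This is a minimal sketch, not the actual kube change; the constructor name and settings are hypothetical stand-ins for giving each probe type its own connection pool so two near-simultaneous probes can never race on the same pooled HTTP/2 connection.

```go
package main

import (
	"net/http"
	"time"
)

// newProbeTransport returns a dedicated transport. One transport per probe
// type means each probe has its own connection pool, so the two probes
// cannot collide on a single cached HTTP/2 connection.
// (Hypothetical helper; settings mirror the issue's setup.)
func newProbeTransport() *http.Transport {
	return &http.Transport{
		DisableKeepAlives: true,
		IdleConnTimeout:   30 * time.Second,
	}
}

func main() {
	// Liveness and readiness probes each get their own client/transport.
	livenessClient := &http.Client{Transport: newProbeTransport()}
	readinessClient := &http.Client{Transport: newProbeTransport()}
	_ = livenessClient
	_ = readinessClient
}
```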
I wonder if this happens because the transport uses DisableKeepAlives=true?

Looks like yes. For the record, I cannot repro with DisableKeepAlives=false, but I can repro with DisableKeepAlives=true.

My guess: the second request selects the same connection as the first request, but its RoundTrip doesn't execute until after the first request completes. By that point, the connection has gone idle and shut down because keep-alives are disabled. This could be a manifestation of https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L221:

    // TODO: don't close a cc if it was just added to the pool
    // milliseconds ago and has never been used. There's currently
    // a small race window with the HTTP/1 Transport's integration
    // where it can add an idle conn just before using it, and
    // somebody else can concurrently call CloseIdleConns and

@sjenning, do you know why you're disabling keep-alives? (Wild guess: that flag was added with HTTP/1.1 in mind. Since these are infrequent probes, there's no reason to keep the connection open, so keep-alives were disabled.) @WillSewell, are you also disabling keep-alives?
DisableKeepAlives=true was a result of kubernetes/kubernetes#15733, which fixed kubernetes/kubernetes#15643.
I also can't reproduce it when running your example. I'm afraid I don't have time right now to boil it down to a minimal test case (I just tried and failed).
I'm not.
I'm not, although I do get the error when running your example with keep-alives disabled.
If you import "golang.org/x/net/http2" directly and use its ConfigureTransport function, the fix [7a62274] will not work, just like https://play.golang.org/p/8-KRho9KPH. The err is http2.ErrNoCachedConn instead of http2ErrNoCachedConn:

    func (pc *persistConn) shouldRetryRequest(req *Request, err error) bool {
        if err == http2ErrNoCachedConn {
            // Issue 16582: if the user started a bunch of
            // requests at once, they can all pick the same conn
            // and violate the server's max concurrent streams.
            // Instead, match the HTTP/1 behavior for now and dial
            // again to get a new TCP connection, rather than failing
            // this request.
            return true
        }
        ...
    }
@tombergan, I'll take this.
Change https://golang.org/cl/87297 mentions this issue: |
Change https://golang.org/cl/87298 mentions this issue: |
@someonegg, thanks! |
…hedConn In a given program there may be two separate copies of ErrNoCachedConn: the h2_bundle.go version in net/http, and the user's golang.org/x/net/http2 version. We need to be able to detect either in net/http. This CL adds a function to report whether an error value represents that type of error, and then a subsequent CL to net/http will use it instead of ==. Updates golang/go#22091 Change-Id: I86f1e20704eee29b8980707b700d7a290107dfd4 Reviewed-on: https://go-review.googlesource.com/87297 Reviewed-by: Tom Bergan <tombergan@google.com>
For my interest, in which go version will this be fixed? How could I find out by myself? |
What version of Go are you using (go version)?
go version go1.8.3 linux/amd64
Does this issue reproduce with the latest release?
Unknown, checking now
What operating system and processor architecture are you using (go env)?
What did you do?
In Kubernetes 1.8, we are seeing the $SUBJECT error message on pods that have both a readiness and a liveness probe. In this case, the probes are nearly simultaneous (within microseconds). About 2-3 times per hour, there is a timing collision resulting in the failure of one of the two probes with
http2: no cached connection was available
I confirmed that disabling http2 for the probe connections causes the issue not to occur.
I also confirmed that disabling one of the probes causes the issue not to occur.
This is not load related as the transport is only used for the probes and there are 2 probes every 5 seconds. It is a timing issue.
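The http2-disabling workaround mentioned above can be done by giving the probe transport a non-nil, empty TLSNextProto map, which net/http documents as the way to opt out of the automatic HTTP/2 upgrade. A minimal sketch, with the constructor name as a hypothetical stand-in for however the probe transport is wired up:

```go
package main

import (
	"crypto/tls"
	"net/http"
)

// newHTTP1Transport returns a transport that will never upgrade to HTTP/2:
// a non-nil but empty TLSNextProto map disables the automatic upgrade, so
// the probes fall back to HTTP/1.1 and the cached-connection race cannot
// occur. (Hypothetical helper name.)
func newHTTP1Transport() *http.Transport {
	return &http.Transport{
		TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
}

func main() {
	// Use this client for the probe requests only.
	probeClient := &http.Client{Transport: newHTTP1Transport()}
	_ = probeClient
}
```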
What did you expect to see?
The connection succeeds every time
What did you see instead?
http2: no cached connection was available
Additional References
#16582
kubernetes/kubernetes#49740
I'll try to write a tight reproducer for this, since it is difficult and time-consuming to recreate on Kubernetes.
@smarterclayton @derekwaynecarr