Skip to content

[gRPC/http2 auto-detect] Flakiness and potential connection leak #10962

@markusthoemmes

Description

@markusthoemmes

In what area(s)?

/area networking

TestHTTPProbeAutoHTTP2 is quite flaky currently and I almost went crazy trying to debug why. The test failures look as if two sequential HTTP requests race. Turns out: They do! Adding a 100ms sleep to the beginning of the handler makes the issue very reproducible.

So what happens (I believe) is that the upgrade probe request gets handled by the h2c handler and is then "stuck" in the upgrade functionality. The request will be dispatched to the handler asynchronously whereas the HTTP2 probe has already returned with a 101 Switch Protocols.

I think this also means we're leaking connections here as our HTTP2 probe actually establishes an HTTP2 connection. The returned response is not indicative of that being closed. I experimented with aborting the connection by cancelling a context on the initial request (didn't work) and by sending actual HTTP2 frames via the connection (didn't work yet either).

To properly fix this, we should make sure that the http2UpgradeProbe function returns in lockstep with the first request actually being done (and the HTTP2 connection closed). Only then can we guarantee the test to work as intended (though I'm not super sure it's "correct" currently) and guarantee that we're not leaking connections (which is my bigger concern currently).

/assign @rafaeltello

Metadata

Metadata

Assignees

Labels

area/networkingkind/bugCategorizes issue or PR as related to a bug.triage/acceptedIssues which should be fixed (post-triage)

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions