@linkvt
Contributor

@linkvt linkvt commented Oct 31, 2025

Fixes #15432
Fixes #10962

Release Note

Fall back to HTTP/1 when HTTP/2 health probes fail (e.g. on a connection error or non-readiness)

Summary

Improves queue-proxy HTTP/2 probing reliability by switching from upgrade-based detection to direct H2C probes with fallback to HTTP/1.1.

Changes

  • HTTP/2 probe fallback logic: Replaces the deprecated HTTP upgrade mechanism (OPTIONS with Connection: Upgrade, HTTP2-Settings) with direct H2C GET requests. Falls back to HTTP/1.1 if the H2C probe fails or returns a non-ready status.
  • Simplified transport handling: Removes the version-spoofing transport wrapper in favor of protocol hints via req.ProtoMajor.
  • Test updates: Rewrites the hellohttp2 test service to use the standard library HTTP/2 server.

Additional Change (Could be Separate PR)

Also includes support for overriding the queue-proxy image via the queue.sidecar.serving.knative.dev/image annotation on KService specs.
I found this very useful during testing, because I could:

  • ko apply a single file
  • update both the service and queue-proxy in a single revision
  • compare the behavior of different KServices running different queue-proxy images (relevant for the next section)

Replacement of golang.org/x/net/http2 with stdlib

I looked into replacing golang.org/x/net/http2 (h2c and HTTP/2 support exist in the stdlib since Go 1.24) and golang.org/x/net/http2/h2c (a deprecation proposal exists) in pkg, but didn't include it in this PR because it turned out to be a much bigger topic than expected.
Switching to the stdlib is not straightforward, because http2.Client sets up the HTTP/2 connection in a non-standard way: it sends an HTTP/2 connection preface despite using a TLS connection. RFC 9113, Section 3.3 says:

A client that knows that a server supports HTTP/2 can establish a TCP connection and send the connection preface (Section 3.4) followed by HTTP/2 frames. Servers can identify these connections by the presence of the connection preface. This only affects the establishment of HTTP/2 connections over cleartext TCP; HTTP/2 connections over TLS MUST use protocol negotiation in TLS [TLS-ALPN].

This means that during a Knative upgrade, updating the queue-proxy pods before the activator would cause issues: the activator would send the preface, which the Go stdlib HTTP/2 implementation in queue-proxy does not handle. We might be able to ignore such requests, but I haven't tested that yet.

The second task would be to set up H2 connections the standard way: via TLS ALPN.
But how do we know in queue-proxy and the activator whether we actually want to use HTTP/2?
The queue-proxy currently has no such knowledge (besides using the HTTP/2 port 8013); it could rely on the probe to the user-container to figure this out and only accept HTTP/2 connections afterwards.
We could derive this info during Revision reconciliation, but that would conflict with removing the port naming restriction (see #4283).

The activator could potentially add the h2 protocol to its transport so that proxy connections using it would attempt h2.
But how does queue-proxy behave then? Always accept h2 (defined in the server's TLS config) without knowing whether the service supports h2?

Then there is also h2c (http2 cleartext) - how are things negotiated in this case?

I have more questions than answers right now, but fortunately a solution to that isn't required to fix the issues referenced above.

Thanks for the feedback!

@knative-prow knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 31, 2025
@knative-prow knative-prow bot requested review from dprotaso and skonto October 31, 2025 08:50
@codecov

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.21%. Comparing base (c4ea5a7) to head (f0e58a5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
pkg/queue/health/probe.go | 89.47% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16205      +/-   ##
==========================================
+ Coverage   80.05%   80.21%   +0.15%     
==========================================
  Files         216      216              
  Lines       13439    13422      -17     
==========================================
+ Hits        10759    10766       +7     
+ Misses       2318     2294      -24     
  Partials      362      362              


@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 386f8c6 to 9e6b6fa Compare October 31, 2025 09:55
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch 2 times, most recently from 0c5f9df to c52f12c Compare October 31, 2025 10:18
@dprotaso
Member

I think we should remove the queue proxy image annotation change.

I don't think that's something that we want users being able to manipulate given that's an operator concern.

@dprotaso
Member

Quoting @linkvt: "This means that during a knative upgrade pod updates of queue-proxy before the activator would cause issues, as activator would send the preface the go stdlib http2 implementation in queue-proxy does not handle. We might be able to ignore such requests but I didn't test it yet."

Is there a way to do this over multiple releases?

@dprotaso
Member

dprotaso commented Nov 17, 2025

I think we should scope this to just the queue proxy => user container as per this issue #15432

and for now ignore activator => queue proxy interaction given that we control both ends of that hop. Thus ignore issue #10962

@linkvt
Contributor Author

linkvt commented Nov 18, 2025

Quoting @dprotaso: "I think we should remove the queue proxy image annotation change. I don't think that's something that we want users being able to manipulate given that's an operator concern."

I understand the reasoning but don't think that we're currently that consistent about that.
We're currently able to set e.g. the ingress class or the certificate class through an annotation.

Besides that, it's quite hard to debug issues in queue-proxy, because any change to it directly affects all KServices. See also your recent comment where a service-local annotation would allow minimally invasive debugging: #16043 (comment). This could simplify debugging similar issues in production environments without affecting any other production workload.

Let me know what you prefer: discard, keep, or move to a separate PR.

Quoting the PR description: "This means that during a knative upgrade pod updates of queue-proxy before the activator would cause issues, as activator would send the preface the go stdlib http2 implementation in queue-proxy does not handle. We might be able to ignore such requests but I didn't test it yet."

Quoting @dprotaso: "Is there a way to do this over multiple releases?"

I think so, but first I would need to get it working at all with the stdlib implementation. Let me know if we want to pursue this; I can create an issue for a PoC that migrates both the activator and the queue-proxy to the stdlib implementation.

Quoting @dprotaso: "I think we should scope this to just the queue proxy => user container as per this issue #15432 and for now ignore activator => queue proxy interaction given that we control both ends of that hop. Thus ignore issue #10962"

I don't think we should keep sending these old HTTP/2 Upgrade headers, as they have been deprecated for more than 3 years per RFC 9113. Since the new code establishes h2c connections in a compliant way, the Upgrade mechanism is no longer needed, which also means we no longer have to fix #10962.
You mention ignoring the activator => queue proxy interaction, but the HTTP/2 Upgrade logic here only concerns the queue proxy probe. I don't think there is any relation to the activator, or am I missing something?

Besides that, thanks for the review!

Edit: did a rebase due to conflict in go.mod

@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from c52f12c to ed40414 Compare November 19, 2025 08:18
@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2025
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from ed40414 to 483f15b Compare November 25, 2025 16:18
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2025
}
return resp, err
// Set the protocol hint for the auto-selecting prober transport
r.ProtoMajor = protoMajor
Member

Does this work? The godoc says:

// For client requests, these fields are ignored.

Member

Contributor Author

Agree that all of this is a bit confusing, but in this case I'm just preserving the existing behavior from old line 101.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2025
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 483f15b to cfd8e81 Compare November 28, 2025 12:16
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2025
@linkvt
Contributor Author

linkvt commented Nov 28, 2025

/retest

@dprotaso
Member

dprotaso commented Dec 1, 2025

FYI - I was playing with using the new go1.24 features of not needing the h2c package. See: knative/pkg#3298

Though when I test it out it's failing on Kourier and I could use some help there - #16280

@dprotaso
Member

dprotaso commented Dec 2, 2025

@linkvt can we drop the queue proxy annotations from this PR?

@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from cfd8e81 to f95df2c Compare December 2, 2025 18:28
@linkvt
Contributor Author

linkvt commented Dec 2, 2025

@dprotaso sure, done 👍

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 20, 2025
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from f95df2c to 2e70d20 Compare January 2, 2026 10:06
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 2, 2026
@linkvt
Contributor Author

linkvt commented Jan 2, 2026

/retest

@linkvt
Contributor Author

linkvt commented Jan 13, 2026

@dprotaso could you please take another look, I dropped the annotations as requested in your last comment.
Might be good to have this merged before the upcoming 1.21 release.

@dprotaso
Member

Thanks for dropping the annotation change.

So in theory not supporting an h2c upgrade flow using the connection header might be a breaking change for users.

We should still phase it out - but it's not obvious to me which users we are breaking. Meaning which languages support the deprecated upgrade flow and not the h2 prior knowledge.

Curious if you have thoughts.

@dprotaso
Member

I think at a minimum we should hold this PR until after 1.21, but note in the 1.21 release notes that we will be merging it and dropping support for the deprecated h2c upgrade flow.

@linkvt
Contributor Author

linkvt commented Jan 15, 2026

Hi @dprotaso,

I understand your concerns, but I don't think they block the PR, because:

  • any server that supports the upgrade mechanism must already understand the HTTP/2 connection preface, since the preface is sent immediately after the 101 Switching Protocols response; prior knowledge just skips the 101 negotiation step
  • autodetect-http2 is a feature flag that's disabled by default
  • RFC 9113 states that the header-based upgrade was never widely deployed, and it was deprecated 3.5 years ago

As we are the client sending the request with the upgrade header to detect the protocol, we were basically just expecting user applications to support the deprecated and optional upgrade mechanism.

I agree with documenting this in the release notes for better visibility 👍

@dprotaso
Member

Oh - I forgot that this was behind a feature flag

Comment on lines 10 to 11
# Example: Override the queue sidecar image with a custom image
# queue.sidecar.serving.knative.dev/image: ko://knative.dev/serving/cmd/queue
Member

Can we drop this comment?

Contributor Author

Oh must have missed it, fixed 👍

Comment on lines +53 to +59
resources, err := v1test.CreateServiceReady(
t, clients, &names, rtesting.WithNamedPort("h2c"),
rtesting.WithEnv(v1.EnvVar{
Name: "RESPONSE",
Value: test.HelloHTTP2Text,
}),
)
Member

Any reason why we need this?

Contributor Author

The tests expect a specific response from the http2 test target container, see

spoof.MatchesAllOf(spoof.IsStatusOK, spoof.MatchesBody(test.HelloHTTP2Text)),

For development and work in general on the http2 stuff it was helpful to

  • modify the http2 test image to use a native/stdlib HTTP2 server and
  • have it return more helpful responses for the default and health endpoint

IMO the change there is really minor, and the http2 test image now allows us to keep using it as a test target. If you think it should be reverted, I can do that too.

@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 2e70d20 to 55734cd Compare January 16, 2026 08:02
@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2026
@linkvt linkvt force-pushed the fallback-to-http1-on-failed-http2-probe branch from 55734cd to f0e58a5 Compare January 16, 2026 08:03
@dprotaso
Member

/retest
/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2026
@knative-prow

knative-prow bot commented Jan 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, linkvt

@knative-prow knative-prow bot merged commit 705182d into knative:main Jan 16, 2026
152 of 154 checks passed
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

  • Queue Proxy health checks incompatible with non-HTTP/2 applications
  • [gRPC/http2 auto-detect] Flakiness and potential connection leak

3 participants