Possible envoy regression causes HTTP 404 #2468
Comments
Thanks for the report @alex1989hu. Can you give us a sample HTTPProxy so we can attempt to replicate the issue, please?
Having … I installed … Let me share the simplest service where … Extra:
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: knowledgebase-proxy
  namespace: knowledgebase
  labels:
    app.kubernetes.io/name: knowledgebase
spec:
  virtualhost:
    fqdn: kb.foo.bar.acme.com
    tls:
      secretName: projectcontour/wildcard-foobar-cert
  routes:
    - conditions:
        - prefix: /
      services:
        - name: knowledgebase-service
          port: 8080
          responseHeadersPolicy:
            set:
              - name: Strict-Transport-Security
                value: "max-age=31536000; includeSubdomains"
              - name: X-Content-Type-Options
                value: nosniff
              - name: X-Frame-Options
                value: DENY
              - name: X-XSS-Protection
                value: "1; mode=block"
@alex1989hu This could be a result of the SNI binding change. Does the client support SNI, and is it using …
@jpeach: Do you mean clients like Chrome and Edge? Sure, I entered …
Is it an intermittent 404, or a permanent one?
Intermittent: it can also go wrong after the browser has already loaded the content. A simple page refresh can cause …
An intermittent problem suggests to me that there's an issue with only some of the Envoy proxies. Do you have logs that let you correlate 404s to specific Envoy proxies?
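One way to do that correlation is sketched below. It assumes the Envoy pods write access logs in Envoy's default format, where the response code and response flags follow the quoted request line. It also assumes the lines for each pod have already been collected, for example with `kubectl logs`. The function name and sample lines are hypothetical. `NR` is the Envoy response flag for "no route configured", so it is counted separately from other 404s.

```python
import re
from collections import Counter

# Matches the quoted request line ("METHOD PATH PROTOCOL"), then captures the
# response code and the response flags that follow it (default Envoy format assumed).
_STATUS_RE = re.compile(r'"\S+ \S+ \S+" (\d{3}) (\S+)')


def count_404s_by_pod(logs_by_pod):
    """Return {pod: Counter} with the number of 404s and of 404s flagged NR."""
    result = {}
    for pod, lines in logs_by_pod.items():
        counts = Counter()
        for line in lines:
            m = _STATUS_RE.search(line)
            if m is None:
                continue  # not an access-log line
            code, flags = m.groups()
            if code == "404":
                counts["404"] += 1
                # Multiple flags are assumed to be comma-separated.
                if "NR" in flags.split(","):
                    counts["404_NR"] += 1
        result[pod] = counts
    return result


# Hypothetical sample lines from two Envoy pods.
logs = {
    "envoy-a": [
        '[2020-04-15T10:00:00.000Z] "GET / HTTP/2" 200 - 0 512 3 2 "-" "Mozilla/5.0" "id1" "kb.foo.bar.acme.com" "10.0.0.5:8080"',
    ],
    "envoy-b": [
        '[2020-04-15T10:00:01.000Z] "GET / HTTP/2" 404 NR 0 0 0 - "-" "Mozilla/5.0" "id2" "kb.foo.bar.acme.com" "-"',
    ],
}

for pod, counts in sorted(count_404s_by_pod(logs).items()):
    print(pod, dict(counts))
```

If 404s show up on every pod rather than on a single one, that points away from a single misconfigured Envoy.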
I am working on a cluster with …
@alex1989hu At this point, I don't think we need any new config; we need to narrow the scope of the issue. I think the most likely cause is the SNI issue, but I don't know why your usage would not work with that change. The other problem we need to understand is why the 404 is intermittent. Testing against a clean 1.4 cluster could be worthwhile. LMK what you find.
@alex1989hu Can you share any of the 404 logs?
@jpeach: search …
@alex1989hu Are there separate HTTPProxy documents for …? In the log, I can see two adjacent entries: …
This looks like the same request from the same client, but with different results. The only way I can explain this is if there are multiple Envoy instances running with different configurations. Is that the case?
@jpeach: The cluster is freshly installed with …
@alex1989hu I've been assuming that you have separate HTTPProxy documents for …. Can you post (privately is OK) an Envoy config dump? See the troubleshooting guide, and curl the config_dump endpoint. Can you also show me the pod status for contour and envoy?
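If a dump is collected from each Envoy pod, you can test the "multiple Envoys with different configurations" theory by comparing the dumps after dropping fields that are expected to differ between pods. The sketch below makes two assumptions: the dumps are JSON, and `last_updated` timestamps are the only noise worth ignoring. A real comparison may need to drop more fields. The function names and trimmed sample dumps are hypothetical.

```python
import hashlib
import json

# Keys assumed to vary between otherwise identical Envoy config dumps.
IGNORED_KEYS = {"last_updated"}


def _strip(node):
    """Recursively drop ignored keys so that only configuration content remains."""
    if isinstance(node, dict):
        return {k: _strip(v) for k, v in node.items() if k not in IGNORED_KEYS}
    if isinstance(node, list):
        return [_strip(v) for v in node]
    return node


def config_fingerprint(dump_text):
    """Hash a config dump after normalizing key order and ignored fields."""
    normalized = json.dumps(_strip(json.loads(dump_text)), sort_keys=True)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pods_by_fingerprint(dumps_by_pod):
    """Group pod names by fingerprint; more than one group suggests config drift."""
    groups = {}
    for pod, text in dumps_by_pod.items():
        groups.setdefault(config_fingerprint(text), []).append(pod)
    return groups


# Hypothetical, heavily trimmed dumps from three Envoy pods.
dumps = {
    "envoy-a": '{"configs": [{"version_info": "1", "last_updated": "t1"}]}',
    "envoy-b": '{"configs": [{"last_updated": "t2", "version_info": "1"}]}',
    "envoy-c": '{"configs": [{"version_info": "2", "last_updated": "t3"}]}',
}

for fingerprint, pods in pods_by_fingerprint(dumps).items():
    print(fingerprint[:12], sorted(pods))
```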
FYI: @jpeach, contacted directly via …
I've observed similar behavior after upgrading as well. It appears to be related to HTTP/2 connection coalescing. The SNI (Envoy … I was able to work around it by issuing separate certificates for each virtual host and updating the …
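A simplified model of the client-side decision helps explain both the symptom and the workaround. Under HTTP/2, a client may reuse an open connection for a different hostname if the certificate presented on that connection is valid for the new name. Browsers also apply other conditions, such as matching IP addresses, which this sketch ignores. The wildcard rule below follows the common convention that `*` matches exactly one leftmost label. The function names and the second hostname `wiki.foo.bar.acme.com` are hypothetical.

```python
def name_matches(pattern, host):
    """Match a certificate name against a host; '*' covers exactly one leftmost label."""
    pattern, host = pattern.lower(), host.lower()
    if not pattern.startswith("*."):
        return pattern == host
    p_labels = pattern.split(".")
    h_labels = host.split(".")
    # The wildcard replaces exactly one label, so the label counts must agree.
    return len(p_labels) == len(h_labels) and p_labels[1:] == h_labels[1:]


def may_coalesce(cert_names, new_host):
    """True if a connection whose certificate lists cert_names could be reused for new_host."""
    return any(name_matches(name, new_host) for name in cert_names)


wildcard_cert = ["*.foo.bar.acme.com"]
per_host_cert = ["kb.foo.bar.acme.com"]

# With a wildcard certificate, a connection opened for kb.* is reusable for wiki.*.
print(may_coalesce(wildcard_cert, "wiki.foo.bar.acme.com"))
# With a certificate per hostname, the client has to open a new connection.
print(may_coalesce(per_host_cert, "wiki.foo.bar.acme.com"))
```

This is why separate per-host certificates avoid the problem: the client never considers the existing connection valid for the other hostname.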
Thanks for the CVE reference, @lmickh.
I dug into this some more and read the various RFC specs and history. I'm pretty comfortable saying that this is a duplicate of #1493.

If we serve a wildcard certificate, browsers will consider existing connections OK to reuse even if the SNI server names associated with the connections differ, because each origin name will match successfully against the wildcard certificate. This previously worked in Contour because we did not enforce any binding between the SNI server name and the origin hostname. I'd argue that this was always a mis-feature since it breaks multi-tenancy (tenant services are visible to each other), but in 1.4 we had to fix it so that we could make guarantees about TLS client certificate authentication (we can't allow a non-authenticated session to make requests to an origin that requires authentication).

Unfortunately, this means that more users are exposed to the underlying problem originally documented in #1493, since basically anyone using wildcard certificates will be affected.

AFAICT, in the Envoy configuration we generate, there are no security implications. The problem manifests as a 404 response; we never forward a request to an inappropriate origin.

The workaround, as noted above by @lmickh, is to avoid wildcard certificates and get a separate certificate for each hostname. I understand that's not going to be possible for everyone. I expect that the right fix is to convince Envoy to serve a 421 response when the SNI server name doesn't match the hosted origin.

Duplicate of #1493.
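The server side of the mismatch can be modeled in a similar way. In this hypothetical sketch, the TLS handshake binds a connection to the virtual host named by the SNI server name, and each request's `:authority` is then checked against that binding. A mismatch produces either a 404, which is the behavior reported in this issue, or a 421 (Misdirected Request, RFC 7540 §9.1.2), which is the fix suggested above. On a 421, the client may retry the request over a different connection. This is not Contour's or Envoy's actual implementation, and the virtual host names are illustrative.

```python
from http import HTTPStatus

# Hypothetical virtual hosts served by one listener.
VHOSTS = {"kb.foo.bar.acme.com", "wiki.foo.bar.acme.com"}


def handle_request(sni, authority, misdirected_status=HTTPStatus.NOT_FOUND):
    """Return the status for a request arriving on a connection negotiated with `sni`.

    The connection is bound to the virtual host chosen by SNI, so a request
    for any other origin is rejected with `misdirected_status`.
    """
    host = authority.split(":", 1)[0].lower()  # drop an optional port
    if sni.lower() not in VHOSTS:
        return HTTPStatus.NOT_FOUND  # no virtual host for this server name
    if host != sni.lower():
        return misdirected_status
    return HTTPStatus.OK


# The browser opened the connection for kb.*, then coalesced a wiki.* request onto it.
print(handle_request("kb.foo.bar.acme.com", "wiki.foo.bar.acme.com"))
print(handle_request("kb.foo.bar.acme.com", "wiki.foo.bar.acme.com",
                     misdirected_status=HTTPStatus.MISDIRECTED_REQUEST))
```

With the 404 default, the browser treats the response as final. A 421 would tell it that the connection, not the URL, was the problem.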
It is worth noting that this is not limited to wildcard certificates. It also occurs, for example, if the same Secret object is used for more than one FQDN in different Ingress objects. The advice is the same: use a different certificate for each FQDN.
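The trigger is one certificate covering several hostnames, so grouping FQDNs by the TLS Secret they reference is a rough way to spot affected hosts. This sketch works on a mapping from FQDN to the Secret's `namespace/name` that has already been extracted; extracting it from your HTTPProxy or Ingress objects is left out. Hostnames other than `kb.foo.bar.acme.com` are hypothetical. The check does not detect a wildcard certificate copied into several separate Secrets, which is affected too.

```python
from collections import defaultdict


def shared_secrets(secret_by_fqdn):
    """Return {secret: sorted FQDNs} for every Secret referenced by more than one FQDN."""
    fqdns_by_secret = defaultdict(list)
    for fqdn, secret in secret_by_fqdn.items():
        fqdns_by_secret[secret].append(fqdn)
    return {
        secret: sorted(fqdns)
        for secret, fqdns in fqdns_by_secret.items()
        if len(fqdns) > 1  # a Secret shared by several FQDNs invites connection reuse
    }


secret_by_fqdn = {
    "kb.foo.bar.acme.com": "projectcontour/wildcard-foobar-cert",
    "wiki.foo.bar.acme.com": "projectcontour/wildcard-foobar-cert",
    "shop.example.com": "shop/shop-cert",
}

for secret, fqdns in shared_secrets(secret_by_fqdn).items():
    print(secret, fqdns)
```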
What steps did you take and what happened:
I upgraded contour from 1.3.0 to 1.4.0. I see many 404 errors in the envoy pod log. The services behind HTTPProxy cannot be loaded; I get a 404 error in the browser, too. Repeatedly hitting the refresh button in my browser temporarily solves the issue: the page loads. I thought it was a network issue, but after downgrading to 1.3.0 the services behind HTTPProxy loaded instantly, with no errors in the envoy log. I reinstalled the whole cluster from scratch with 1.4.0 and saw the same symptom. Downgrading back to 1.3.0 solves the issue again.
UPDATE: it is actually 404, not 401.
What did you expect to happen:
No 404 error.
Anything else you would like to add:
Environment:
Kubernetes version (use kubectl version): …
OS (e.g. from /etc/os-release): …