
Address SNI binding and bypass for TLS listeners #72

Closed
jpeach opened this issue Feb 11, 2020 · 25 comments
Assignees: youngnick
Labels: documentation, kind/feature, lifecycle/frozen
Milestone: v1alpha2

Comments
jpeach (Contributor) commented Feb 11, 2020

What would you like to be added:

In the current API, certificates are a property of the Listener and Routes are a property of the Gateway. Since there is no direct relationship between the certificates and the backend service (i.e. the Route), this structure encourages the separation of SNI routing from HTTP request routing. This can lead to a number of different security issues.

Why is this needed:

Operators need the ability to pin the set of certificates used for a specific backend service. That is, for a TLS session bound to the SNI name foo.example.com, always forward HTTP requests to the backend service that is bound to the foo.example.com host name.

For a more thorough discussion of SNI bypass, see https://hal.inria.fr/hal-01202712/document

xref #49

jpeach added the kind/feature label on Feb 11, 2020
jpeach (Contributor, Author) commented Feb 11, 2020

See also RFC 6066, which strongly suggests that the HTTP host should be bound to the SNI name:

Since it is possible for a client to present a different server_name in the application protocol, application server implementations that rely upon these names being the same MUST check to make sure the client did not present a different name in the application protocol.
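
To make that check concrete, here is a minimal Go net/http sketch (illustrative only, not Envoy's, Contour's, or any other data plane's actual implementation): it compares the request authority against the negotiated SNI name and answers 421 Misdirected Request on a mismatch, so a well-behaved client can retry on a new connection. The cert.pem/key.pem paths and port are placeholders.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"strings"
)

// enforceSNIBinding rejects any request whose HTTP authority does not match
// the SNI name that the TLS handshake was negotiated for, as RFC 6066 says
// servers that rely on the two names matching must check.
func enforceSNIBinding(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.TLS == nil || r.TLS.ServerName == "" {
			next.ServeHTTP(w, r) // plaintext listener, or no SNI presented
			return
		}
		host := r.Host
		if h, _, err := net.SplitHostPort(host); err == nil {
			host = h // strip any :port before comparing
		}
		if !strings.EqualFold(host, r.TLS.ServerName) {
			// 421 asks the client to retry on a new connection (RFC 7540,
			// section 9.1.2) rather than silently routing to another host.
			http.Error(w, "misdirected request", http.StatusMisdirectedRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from " + r.Host + "\n"))
	})
	// cert.pem and key.pem are placeholder file names for this sketch.
	srv := &http.Server{Addr: ":8443", Handler: enforceSNIBinding(mux)}
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```

The same comparison could just as well live in a proxy filter or in the route-matching step; the point is only that it happens before the request is forwarded to a backend.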

bowei (Contributor) commented Feb 13, 2020

  • How do the major proxy implementations handle this?
  • Is there possibly a use case for allowing an SNI/HTTP Host mismatch? (I'm guessing no.)
  • Is this dependent on how you set up your config (e.g., if we mandate such behavior, is it possible to achieve)?

youngnick (Contributor) commented:

Contour intends to stop this from happening, but in practice it's very easy to do it accidentally, via implementation details.

There are some use cases for allowing this (basically, domain fronting and similar), but I think they are specialised enough that we should default to not supporting it and figure it out later.

The key thing here is that, if we include the possibility of client certificates as part of the TLS config, SNI/HTTP Host mismatch is a big security risk. If you allow the TLS negotiation to succeed for host foo, including the client certificate exchange, and that exchange is being used for authentication, then allowing the inner HTTP transaction to go to a different host completely bypasses it.

The use case here is that I'm using a Gateway for ultrasecure.foo.com, which requires TLS with a client certificate, and the same Gateway for insecure.foo.com, which requires serving TLS only. TLS is terminated at the Gateway, and may be re-encrypted afterwards.

In this case, requests with an SNI of insecure.foo.com will complete the TLS handshake with the proxy, and, if you send a Host header of ultrasecure.foo.com, you will completely bypass the client certificate step.

I would argue this is very unexpected behaviour that creates a big problem for users, one they may not even realise they have.

I think this problem is dependent on how we set up the constructs. As @jpeach said earlier, there needs to be a tight binding between TLS certificate config and the allowed hostnames associated with that certificate, for HTTP Routes. If we're talking about TCP with SNI routing only, then there is no Host, and thus this problem is not visible at this layer. (The service forwarded to may have it, but that is not our responsibility).
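
To illustrate that handshake/authorization split, here is a small Go crypto/tls sketch (an assumption-laden illustration, not Contour's or Envoy's code; the hostname comes from the example above and the clientCAFile path is hypothetical). The client-certificate requirement is selected from the SNI name alone, which is exactly why the HTTP layer must also re-check Host against the negotiated SNI name.

```go
package tlspolicy

import (
	"crypto/tls"
	"crypto/x509"
	"os"
)

// newServerTLSConfig requires a client certificate only when the client's
// SNI is ultrasecure.foo.com. Because the decision keys off SNI alone, a
// handshake for insecure.foo.com followed by "Host: ultrasecure.foo.com"
// bypasses the client-certificate check unless the HTTP layer verifies
// that Host matches the negotiated SNI name.
func newServerTLSConfig(clientCAFile string) (*tls.Config, error) {
	caPEM, err := os.ReadFile(clientCAFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	base := &tls.Config{MinVersion: tls.VersionTLS12}
	base.GetConfigForClient = func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
		cfg := base.Clone()
		if hello.ServerName == "ultrasecure.foo.com" {
			// Only the sensitive virtual host demands a client certificate.
			cfg.ClientAuth = tls.RequireAndVerifyClientCert
			cfg.ClientCAs = pool
		}
		return cfg, nil
	}
	return base, nil
}
```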

bowei (Contributor) commented Feb 14, 2020

It's not clear why this is not an artifact of a specific implementation as opposed to something that surfaces at the API layer.
For example, if your proxy carried the SNI field as context to the HTTP protocol processor, it could reject such mismatches if configured to do so.

Can you sketch what the Kubernetes API validation would look like that would prevent this?
Or is the statement that the user can guarantee such a thing cannot happen by crafting their configuration with a given shape?
There is nothing to prevent users from configuring a server cert that doesn't match their HTTPRoute, even within a Listener block.
Is the suggestion that we parse the cert and do validation on each of the configured domains?

As an aside, it seems like, for the ultrasecure.foo.com vs insecure.foo.com use case, we would recommend that users create two Gateways or even separate GatewayClasses that correspond to completely isolated infrastructures. From a compliance perspective (e.g. PCI), this would be pretty much mandated.
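
One possible shape for the "parse the cert and validate the configured domains" check asked about above, as a hedged Go sketch (a hypothetical helper, not an actual Gateway API admission webhook): parse the Listener's serving certificate and confirm every route hostname is covered by one of its SANs, wildcards included.

```go
package validation

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// routeHostnamesCoveredByCert is a hypothetical validation helper: it checks
// that every hostname configured on a route is covered by the server
// certificate bound to the Listener, so SNI routing and HTTP routing cannot
// silently diverge.
func routeHostnamesCoveredByCert(certPEM []byte, hostnames []string) error {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("no PEM block found in certificate")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parsing certificate: %w", err)
	}
	for _, h := range hostnames {
		// VerifyHostname understands wildcard SANs such as *.example.com.
		if err := cert.VerifyHostname(h); err != nil {
			return fmt.Errorf("hostname %q is not covered by the certificate: %w", h, err)
		}
	}
	return nil
}
```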

jpeach (Contributor, Author) commented Feb 16, 2020

The first thing that we need in this issue is agreement that binding the HTTP Host/Authority to the certificate is the correct behavior. My contention that this is the case is supported by RFC 6066 and RFC 2818.

Now, as to the API implications of this, I think that the first step is for the API to be able to express the relationship between a virtual host and its TLS configuration. Once we can do this, the controller implementation has been told the user's intention, and can do a lot more to ensure the correct result.
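
Purely as an illustration of expressing that relationship (a hypothetical Go shape for discussion, not the actual v1alpha1 types):

```go
package api

// VirtualHostTLS is a hypothetical shape shown only to illustrate the kind
// of binding being discussed: the hostname and the certificate that serves
// it live together, so a controller knows which routes the certificate is
// allowed to front.
type VirtualHostTLS struct {
	// Hostname is the SNI name / HTTP authority this configuration serves.
	Hostname string `json:"hostname"`
	// CertificateSecretName references the Secret holding the serving
	// certificate and key for Hostname.
	CertificateSecretName string `json:"certificateSecretName"`
	// RouteNames lists the routes that may receive traffic for Hostname;
	// an implementation would reject routes whose hostnames do not match.
	RouteNames []string `json:"routeNames"`
}
```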

bowei (Contributor) commented Mar 4, 2020

Example: every certificate comes with the set of hostnames that SHOULD be used.

fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 3, 2020
robscott (Member) commented Jun 3, 2020

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jun 3, 2020
fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 1, 2020
szuecs commented Sep 2, 2020

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 2, 2020
robscott (Member) commented:

@jpeach Do we still need to do anything here or did the Gateway restructuring help? If so, what would we need for v1alpha1?

robscott (Member) commented:

Consensus seems to be we need to add some documentation here.

hbagdi added the documentation label on Nov 13, 2020
fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Feb 11, 2021
fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 13, 2021
robscott (Member) commented Apr 8, 2021

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Apr 8, 2021
fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jul 7, 2021
hbagdi (Contributor) commented Jul 8, 2021

@jpeach @youngnick It seems that we have consensus here and need to add documentation. Since both of you have the most context, would either of you mind taking this up and driving it to completion?

youngnick (Contributor) commented:

I can take this one.

/assign

k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Aug 14, 2021
bowei (Contributor) commented Aug 16, 2021

/lifecycle frozen

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label on Aug 16, 2021
hbagdi (Contributor) commented Aug 16, 2021

@robscott @youngnick While this is a clarification, can this be considered a borderline breaking change by any chance?
I'm asking to understand whether this is something to pull into v1a2 or not.

youngnick (Contributor) commented:

I agree that we should sort this out in v1a2, since it could be a breaking change. I'll add it to the milestone to make sure we don't miss it.

youngnick added this to the v1alpha2 milestone on Aug 18, 2021
robscott (Member) commented:

/assign @youngnick

youngnick (Contributor) commented:

Updating this with a bit more context.

I'm going to quote from Contour's docs here, since @jpeach wrote this very precisely:

The HTTP/2 specification allows user agents (browsers) to re-use TLS sessions to different hostnames as long as they share an IP address and a TLS server certificate (see RFC 7540). Sharing a TLS certificate typically uses a wildcard certificate, or a certificate containing multiple alternate names. If this kind of session reuse is not supported by the server, it sends a “421 Misdirected Request”, and the user agent may retry the request with a new TLS session.

If we don't enforce a very tight binding between the SNI hostname and the HTTP hostname, then it's possible to trivially circumvent any certificate-based authentication, at least in Envoy's implementation, like this:

  • Alice stands up secure.example.net, using a full TLS handshake, including required client certificates (yes, I know we don't have a way to specify that yet). This routes to a backend service via an HTTPRoute with the hostname secure.example.net.
  • Eve stands up gotcha.example.net, using a server-cert-only TLS handshake, but specifies an HTTPRoute with the hostname secure.example.net.

Depending on how the underlying implementation handles HTTPRoutes that route to the same hostname, Eve's gotcha service could end up with none, some, or all of the traffic bound for secure.example.net.

Now, there are legitimate uses for this kind of domain fronting, mainly around caching and content distribution, but those uses are pretty niche. In almost all cases, if you are specifying a terminated TLS session via an HTTPRoute, those hostnames should match (in the domain-name match sense we already have defined in the spec). So, by mandating this, we remove some weird security gotchas and make the API behave more like people would expect.

I think this comes under the heading of removing weird edge cases that most people won't want until someone actually asks for them.
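
For reference, a simplified Go sketch of an exact-or-wildcard hostname match of the kind referred to above (the wildcard semantics here are an illustrative assumption, with "*.example.com" covering any name with at least one extra left-most label and never the bare "example.com"; the spec text is authoritative on the exact rules):

```go
package hostmatch

import "strings"

// hostnameMatches compares a configured hostname (for example a Listener's)
// against a route or request hostname. Exact match, or a leading "*." label
// that covers one or more left-most labels, per the assumption stated above.
func hostnameMatches(pattern, hostname string) bool {
	pattern = strings.ToLower(pattern)
	hostname = strings.ToLower(hostname)
	if !strings.HasPrefix(pattern, "*.") {
		return pattern == hostname
	}
	suffix := pattern[1:] // e.g. ".example.com"
	return strings.HasSuffix(hostname, suffix) && len(hostname) > len(suffix)
}
```

An implementation could apply the same check both when binding an HTTPRoute to a Listener and again per request before forwarding, alongside the 421 check sketched earlier in the thread.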

youngnick (Contributor) commented:

After more consideration and rereading of #839, I think that we can probably close this out now, since we are mandating that the TLS hostname and the Listener hostname must match. Thanks for pointing this out, @hbagdi!
