-
Notifications
You must be signed in to change notification settings - Fork 597
GEP-2643: TLSRoute #4064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GEP-2643: TLSRoute #4064
Conversation
geps/gep-2643/index.md
Outdated
// match. | ||
// | ||
// If both the Listener and TLSRoute have specified hostnames, any | ||
// TLSRoute hostnames that do not match the Listener hostname MUST be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TLSRoute hostnames that do not match the Listener hostname MUST be | |
// TLSRoute hostnames that do not match any Listener hostname MUST be |
This is on the current API, so would need to be fixed there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "any"? A Listener has only one hostname, hence "the hostname"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because you can be attaching to a Gateway that has multiple Listeners, and not specifying a sectionName on the parentRef
will make the route try to attach to any Listener, so any Listener hostname
|
||
* TLSRoute | ||
|
||
### Conformance tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @rostislavbobo so we can discuss a bit more about it :)
* If a hostname is specified by both the `Listener` and `TLSRoute`, there must | ||
be at least one intersecting hostname for the `TLSRoute` to be attached to the | ||
`Listener`. | ||
* A `Gateway listener` with `test.example.com` as the hostname matches a `TLSRoute` that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is DIFFERENT from what is specified on the API, specifically on
gateway-api/apis/v1alpha3/tlsroute_types.go
Lines 68 to 70 in 702cfd2
// * A Listener with `test.example.com` as the hostname matches TLSRoutes | |
// that have specified at least one of `test.example.com` or | |
// `*.example.com`. |
but the way the API is worded seems problematic. We may want a wildcard support on the Gateway Listener, but not on a TLSRoute. A TLSRoute should be a strict hostname, without any wildcard, to provide proper routing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API docs are correct here. There's some ancient context in #43, but I believe the intent is that a route can support matching multiple hostnames to be able to attach to separate parents (e.g. a staging.example.com
listener and an example.com
listener on an entirely different gateway), whereas listeners must be distinct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what bothers me here is a bit of the semantics and coverage. For me it makes sense that a Listener, which is an upper/parent/ancestor resource can be more generic (eg.: I listen and accept routes of *.example.tld) but the TLSRoute hostname/s should be more specific.
Is the idea of having a wildcard on a route something like:
- Gateway1 - tls1.example.com
- Gateway2 - tls2.example.com
- TLSRoute - Parents Gateway1/Gateway2 - hostname
*.example.com
This may seem a bit confusing from a using perspective, IMO regarding TLSRoutes (tho we say at some point of this GEP that we may support ALPN routing in the future so this can make more sense).
I would like to hear from other implementors as well how much difficulty this will bring for the TLSRoute case, but if we are ok on having TLSRoute doing this sort of reverse match, I can rollback the comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea of having a wildcard on a route something like
Yes, that has been my understanding of the intent of this design. It's also been baked into the Gateway API UX for years now between Gateway listeners and HTTPRoute, and not something I think makes a lot of sense to try changing or taking a different approach at this point.
There's also relationships between the certificateRefs bundle associated with the listener TLS settings, and the actual domain records directing traffic to the Gateway to consider how those intersect with this configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe https://github.com/kubernetes-sigs/gateway-api/blob/main/conformance/tests/httproute-hostname-intersection.yaml#L103 is an example from the HTTPRoute conformance tests showing that *.example.com
on a route should match foo.example.com
on a Gateway listener, and we should be consistent across routes regarding this behavior.
the later must not be considered for a match. | ||
* In any of the cases above, the `TLSRoute` should have a `Condition` of `Accepted=True`. | ||
|
||
## Multiplexing support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part needs some discussion/attention:
- Is Multiplexing support a core feature, or a implementation specific feature?
- Previously we state that a Gateway with a TLS termination can only have TLSRoutes, but here we say that multiple listeners on the same port, for different types can be accepted. So the conditions and conflict management should be changed to reflect this (and the conformance tests), if we agree that this case is possible
- What conditions should an implementation add when this is not supported? We need to word it explicitly on the GEP and the expected conformance tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Let's keep it implementation-specific.
- It's a niche feature, so I don't expect many implementations to support it or have a need for it.
- Let's move protocol multiplexing into a separate GEP.
- It goes beyond TLSRoute scope and spans multiple protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add initially the multiplexing as implementation specific as is.
I did a test with Istio and was able to check that it works OOTB, I expect that at least any other envoy, haproxy and nginx implementation can work with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think multiplexing should likely be extended conformance.
The current docs at https://gateway-api.sigs.k8s.io/reference/spec/#gatewayspec allude to it being permissible and definitely not required as it may be difficult for some implementations, but I think we can and should have conformance tests for this to ensure the behavior is deterministic and predictable.
This probably does merit its' own GEP though, and should likely pull context from https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1435-mixed-protocol-lb too.
geps/gep-2643/index.md
Outdated
* When a Gateway contains a listener with `protocol=TLS` and `tls.mode=Passthrough`, | ||
the `Gateway` MUST NOT allow another listener on the same port with a different | ||
`tls.mode` and the `Gateway` SHOULD be marked as `Accepted=False`. | ||
* Any violating Listener should have a Condition `Conflicted=True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a discussion that we had: is this for any listener, for the one added later? If there is a conflict listener on the TLSRoute case, do we want to mark all of Listeners as conflicted? Should we stop serving in this case, and mark Accepted=false?
Per @Miciah comment:
The GatewaySpec godoc is explicit: 'If a set of Listeners contains Listeners that are not distinct, then those Listeners are Conflicted, and the implementation MUST set the "Conflicted" condition in the Listener Status to "True"', and, "The implementation MUST NOT pick one conflicting Listener as the winner."
The godoc for ListenerConditionOverlappingTLSConfig re-iterates: "This condition MUST be set on all Listeners with overlapping TLS config."
- what if we have a conflict? Do we really want all of the listeners to be gone?
- what if we are using ListenerSet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed here for now as:
* When a Gateway does not support [Multiplexing](#multiplexing-support) and contains
a listener with `protocol=TLS`, the Gateway MUST NOT allow any other kind of
listener on the same port, and any violating Listener should have a Condition `OverlappingTLSConfig=True`
with the reason `OverlappingProtocols`.
This is a new condition that we should be adding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the suggestion to mark all listeners as conflicted feels at odds with the typical conflict resolution guidance in https://gateway-api.sigs.k8s.io/guides/api-design/#conflicts, but I think maybe we aren't able to follow that because listener config is all batched together as an atomic update to the Gateway (and trying to be "stateful" rather than reflecting the current YAML is an anti-pattern)?
(I think the most granular breakdown achievable might be one entire ListenerSet attached to a Gateway becomes conflicted, but other ListenerSets attached to the same Gateway remain functional.)
// +required | ||
// +kubebuilder:validation:MinItems=1 | ||
// +kubebuilder:validation:MaxItems=16 | ||
Hostnames []Hostname `json:"hostnames,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what exists on the current API, but IMO the hostnames should be part of a TLSRouteRule
and not its own field on the spec.
It doesn't make much sense that the rules are an array, that contain an array of backendRefs, but the hostnames are outside of it, but maybe there's some more context here.
Maybe it should be something like:
rules:
- hostnames:
- abc.com
- def.com
backendRefs:
- name: tls-backend
port: 443
As hostnames is a filter that will direct for the backendRefs, and we don't expect soon to have any additional filter for TLSRoutes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no filters how the backend from a list of BackendRefs should be chosen? Should we require the weight field to be uniquely set for the backend here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make much sense that the rules are an array, that contain an array of backendRefs
There are two reasons for that.
TLSRouteRule
at some point might haveTLSRouteMatch
with ALPN match
kind: TLSRoute
spec:
hostnames:
- "example.com"
rules:
- matches:
- alpn:
- h2
backendRefs:
- name: example-backend
port: 443
BackendRefs
hasweight
, which we're not supporting withTLSRouteRule
yet
kind: TLSRoute
spec:
hostnames:
- "example.com"
rules:
- backendRefs:
- name: example-backend-1
port: 443
weight: 0.5
- name: example-backend-2
port: 443
weight: 0.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the hostnames should be part of a
TLSRouteRule
and not its own field on the spec.
@howardjohn and @hbagdi , what was the motivation for #682 moving hostnames out of TLSRouteRule (besides aligning with HTTPRoute)? TLSRoute doesn't have many matching options compared to HTTPRoute.
So now, instead of having a single TLSRoute that fans out to multiple backends
kind: TLSRoute
spec:
rules:
- matches:
- hostnames:
- "example.com"
backendRefs:
- name: example-backend
port: 443
- matches:
- hostnames:
- "*.com"
backendRefs:
- name: fallback-backend
port: 443
Everyone now needs to set up multiple TLSRoutes for a single backend:
kind: TLSRoute
spec:
hostnames:
- "example.com"
rules:
backendRefs:
- name: example-backend
port: 443
kind: TLSRoute
spec:
hostnames:
- "*.com"
rules:
backendRefs:
- name: fallback-backend
port: 443
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://kubernetes.slack.com/archives/CR0H13KGA/p1756935423971129
The GEP should contain that:
- As of today/GA just a single backendRef is supported on TLSRoute
- Eventually we will support other matchers like ALPN and this may change in the future
// TLSRoute specified `test.example.com` and `test.example.net`, | ||
// `test.example.net` must not be considered for a match. | ||
// | ||
// If both the Listener and TLSRoute have specified hostnames, and none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to precise that Listener's protocol needs to be of type: TLS and that TLSRoutes only applies to Listeners with this protocol. Is this said somewhere in the doc/spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we wanted to clearly cover the Listener protocol and XRoutes compatibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. As a heads up, the API here was copied exactly from the existing code, so the idea is that we also make the proper updates on the API based on the comments here.
That said, we do not explicitly say anywhere here that TLSRoute is attacheable to Listeners of type TLS and Passthrough, just on places like https://gateway-api.sigs.k8s.io/guides/tls/
I will make this explicit on this doc
To be added: #3541 |
837ebfb
to
033815b
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rikatz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign |
@rikatz: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
While this feature is also known sometimes as TLS passthrough, where after the | ||
server name is identified, the gateway does a full encrypted passthrough of the | ||
communication, this GEP will also cover cases where a TLS communication is | ||
terminated on the Gateway and passed unencrypted to a backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terminated on the Gateway and passed unencrypted to a backend. | |
terminated on the Gateway before being passed to a backend. |
|
||
### TLSRoute + TLS Termination | ||
In this workflow, the TLS traffic will be matched against the `SNI attribute` of | ||
the request, terminated on the `Gateway` and passed unencrypted as a `TCP` connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the request, terminated on the `Gateway` and passed unencrypted as a `TCP` connection | |
the request, terminated on the `Gateway` and passed as a `TCP` connection |
May be re-encrypted by mesh ingress implementations or BackendTLSPolicy. I don't think it's necessary to mention that explicitly, just remove the explicit "unencrypted".
What type of PR is this?
/kind gep
What this PR does / why we need it:
Adds the TLSRoute GEP, which is a document aggregating all of the existing TLSRoute implementations and also adding some disambiguation discussions
Which issue(s) this PR fixes:
Fixes #2643
Does this PR introduce a user-facing change?:
This GEP is targeting v1.5