-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding GEP-3539: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP Gateway) #3608
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ptrivedi The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @ptrivedi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
afc6467
to
835e6a3
Compare
…dress (ClusterIP Gateway) Signed-off-by: Pooja Trivedi poojatrivedi@google.com
835e6a3
to
6a061ca
Compare
Adding this comment here for tracking a few open items resulting from the comments on the google doc here: https://docs.google.com/document/d/1N-C-dBHfyfwkKufknwKTDLAw4AP2BnJlnmx0dB-cC4U/edit?tab=t.0
|
potentially other resource kinds) directly to a Route via backendRef. | ||
|
||
```yaml | ||
{% include 'standard/clusterip-gateway/tcproute-with-endpointselector.yaml' %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these does not render. or does it suppose to be like this? see https://github.com/ptrivedi/gateway-api/blob/gep-clusterip-gateway/geps/gep-3539/index.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. Please see the deploy preview here: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/
Also added this to the PR description:
Recommend reviewing deploy preview so examples are inlined: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wasnt aware of that page.
is to have a GatewayClass corresponding to each type of service networking behavior that needs to be modeled | ||
and supported. | ||
|
||
data:image/s3,"s3://crabby-images/42d25/42d25563b69d222f6f0eb3bfa51c8ca7104b410b" alt="image displaying gatewayclasses to represent different service types" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
image missing or incorrect file name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was a missing image. Fixed. Thanks for catching
| Feature | ServiceAPI options | Gateway API possibilities | | ||
|---|---|---| | ||
| sessionAffinity | ClientIP <br /> NoAffinity | Route level | ||
| allocateLoadBalancerNodePorts | True <br /> False | Not supported for ClusterIP Gateway <br /> Supported for LoadBalancer Gateway | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could say N/A for this approach, since you can create LB type without NodePort - sort of simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be clearer until further discussion on each of these.
1e793b0
to
b5e81ee
Compare
* Fix missing image * Change GEP status to Memorandum * Make GEP navigable * Crop trailing whitespace from images Signed-off-by: Pooja poojatrivedi@google.com
b5e81ee
to
e876ced
Compare
/assign @thockin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First: LOVE IT
The questions I keep coming back to all are around how the node-proxy knows to pay attention to THIS gateway so it can implement the clusterIP or nodePort or externalTrafficPolicy or ...
@@ -0,0 +1,25 @@ | |||
kind: TCPRoute/CustomRoute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this syntax Foo/Bar
for the example or is it somethign real? I don't think I have ever seen it and I don't know what it means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the context, it appears that this is covering the base of "I'm not sure what actual Kind we're talking about here".
@ptrivedi - if that's what you mean, I'd recommend leaving a comment next to it to explain and/or using an optional-selection notation like [TCPRoute|CustomRoute]
.
- name: example-cluster-ip-gateway | ||
rules: | ||
config: | ||
sessionAffinity: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the indent wrong on this?
|
||
### EndpointSelector as Backend | ||
|
||
A Route can forward traffic to the endpoints selected via selector rules defined in EndpointSelector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I can imagine a path toward maybe making this a regular core feature. I am sure that it would be tricky but I don't think it's impossible.
Eg.
Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing. Same as we do with IP.
apiVersion: gateway.networking.k8s.io/v1 | ||
kind: GatewayClass | ||
metadata: | ||
name: cluster-ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this name "special" or can it be anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intended that GatewayClass names can be any valid Kubernetes object name.
metadata: | ||
name: cluster-ip | ||
spec: | ||
controllerName: "cluster-ip-controller" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this name "special" or can it be anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name can be anything but implementations must only reconcile GatewayClasses that has a controllerName
that they expect. GatewayClass objects that do not match an implementation's controllerName
must ignore that GatewayClass completely, and not update it at all (to prevent fighting on status
).
Some implementations allow configuration of this string (for example, Contour allows it so that you can run multiple instances of Contour in a cluster).
name: example-cluster-ip-gateway | ||
spec: | ||
addresses: | ||
- 10.12.0.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does kube-proxy (or Cilium or Antrea or ...) know which Gateways it should be capturing traffic for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally that's handled by the rollup of Gateway -> GatewayClass. Implementations own GatewayClasses that specify the correct string in GatewayClass spec.controllerName
. All Gateways in that GatewayClass in that GatewayClass would need to be serviced by an implementation that can fulfill this request (that is, it both has the required functionality, and, in this case of requesting a static address, is actually able to assign that address). In the case that an implementation cannot fulfil this Gateway for some reason, it must be marked as not Accepted (by having an Accepted
type condition in the Gateway's status with status: false
).
{% include 'standard/clusterip-gateway/clusterip-gateway.yaml' %} | ||
``` | ||
|
||
By default, IP address(es) from a pool specified by a CIDR block will be assigned unless a static IP is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.
in pods’ /etc/resolv.conf need to be programmed accordingly by kubelet. | ||
|
||
``` | ||
<name of gateway>.<gateway-namespace>.gw.cluster.local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think DNS is a fraught topic. We REALLY REALLY do not want to add more search paths, especially if they could cause ambiguous names. We could just lean on the "svc" space for this, since these are effectively services. We would need to define how to avoid collisions and I'd be lying if I said I had a great answer.
Maybe, like IPAddress, we extract ServiceName to new resource, and whomever gets there first wins? That sort of transaction doesn't work well for CRDs but I guess it could be async. Weird failure modes.
|
||
| Feature | ServiceAPI options | Gateway API possibilities | | ||
|---|---|---| | ||
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does L4 Gateway support affinity?
| sessionAffinity | ClientIP <br /> NoAffinity | Route level | ||
| allocateLoadBalancerNodePorts | True <br /> False | Not supported for ClusterIP Gateway <br /> Supported for LoadBalancer Gateway | | ||
| externalIPs | List of externalIPs for service | Not supported? | | ||
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all interesting challenges which maybe need something more than a plain TCP Route?
|
||
When modeling ClusterIP service networking, the simplest recommendation might be to keep Gateway and Routes | ||
within the same namespace. While cross namespace routing would work and allow for evolved functionality, | ||
it may make supporting certain cases tricky. One specific example for this case is the pod DNS resolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we are making a new DNS name, do we actually care to support this POD-IP DNS name?
Note that Gateway API allows flexibility and clear separation of concerns so that one would not need to | ||
configure cluster-ip and node-port when configuring a load-balancer. | ||
|
||
But for completeness, the case shown below demonstrates how load balancer functionality analogous to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this proposal makes sense as a logic way to solve "If you had to implement Service using Gateway API primitives, how would you do it".
What doesn't make sense to me is the why and the how this becomes something practically useful from a proposal to a thing in the real world.
The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?
Which projects are expected to, and which are commited to, supporting these? Kube-proxy? Coredns? Various 3p CNIs (Cilium, calico, etc)? Service meshes? All gateway implementations?
## Goals | ||
|
||
* Define Gateway API usage to accomplish ClusterIP Service style behavior | ||
* Propose DNS layout and record format for ClusterIP Gateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like we have fleshed this out. Compared to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ we have just 1-2 sentences with a lot of ambiguity here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial pass over the first half or so, still thinking through some of the later half. Will be back with more in the next few days.
# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address | ||
|
||
* Issue: [#3539](https://github.com/kubernetes-sigs/gateway-api/issues/3539) | ||
* Status: Memorandum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should currently be Provisional
, as it's the first iteration and we are still deciding on the approach here.
The Memorandum
status is for registering general agreement about things, not for features that will require actual code changes to the Gateway API specification (which this definitely will).
This also needs to be changed in the corresponding metadata.yaml
file - the YAML file is actually the canonical place for the status, this is just to remind everyone. I'll suggest the same change there.
name: example-cluster-ip-gateway | ||
spec: | ||
addresses: | ||
- 10.12.0.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally that's handled by the rollup of Gateway -> GatewayClass. Implementations own GatewayClasses that specify the correct string in GatewayClass spec.controllerName
. All Gateways in that GatewayClass in that GatewayClass would need to be serviced by an implementation that can fulfill this request (that is, it both has the required functionality, and, in this case of requesting a static address, is actually able to assign that address). In the case that an implementation cannot fulfil this Gateway for some reason, it must be marked as not Accepted (by having an Accepted
type condition in the Gateway's status with status: false
).
apiVersion: gateway.networking.k8s.io/v1 | ||
kind: GatewayClass | ||
metadata: | ||
name: cluster-ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intended that GatewayClass names can be any valid Kubernetes object name.
metadata: | ||
name: cluster-ip | ||
spec: | ||
controllerName: "cluster-ip-controller" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name can be anything but implementations must only reconcile GatewayClasses that has a controllerName
that they expect. GatewayClass objects that do not match an implementation's controllerName
must ignore that GatewayClass completely, and not update it at all (to prevent fighting on status
).
Some implementations allow configuration of this string (for example, Contour allows it so that you can run multiple instances of Contour in a cluster).
@@ -0,0 +1,25 @@ | |||
kind: TCPRoute/CustomRoute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the context, it appears that this is covering the base of "I'm not sure what actual Kind we're talking about here".
@ptrivedi - if that's what you mean, I'd recommend leaving a comment next to it to explain and/or using an optional-selection notation like [TCPRoute|CustomRoute]
.
Gateway API enables advanced traffic routing and can be used to expose a | ||
logical set of pods on a single IP address within a cluster. It can be seen | ||
as the next generation ClusterIP providing more flexibility and composability | ||
than Service API. This comes at the expense of some additional configuration | ||
and manageability burden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gateway API enables advanced traffic routing and can be used to expose a | |
logical set of pods on a single IP address within a cluster. It can be seen | |
as the next generation ClusterIP providing more flexibility and composability | |
than Service API. This comes at the expense of some additional configuration | |
and manageability burden. | |
Gateway API enables advanced traffic routing and can be used to expose a | |
logical set of pods on a single IP address within a cluster. With some changes, | |
it could be used as a next generation ClusterIP Service replacement, | |
providing more flexibility and composability than the existing Service API. | |
This comes at the expense of some additional configuration | |
and manageability burden, but we believe that the additional value | |
gained is worth the cost. |
This one is just a suggestion to make this read a little bit more clearly to me. Feel free to disregard if it doesn't match your intent here.
## API Changes | ||
|
||
* EndpointSelector is recognized as a backend | ||
* DNS record format for ClusterIP Gateways |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## API Changes | |
* EndpointSelector is recognized as a backend | |
* DNS record format for ClusterIP Gateways | |
## API Changes Summary | |
* EndpointSelector is recognized as a backend | |
* DNS record format for ClusterIP Gateways |
We haven't done this before in GEPs, but I really like this quick summary of the API changes as part of this GEP. @robscott @shaneutt @mlavacca should we consider adding this to the template?
(Gateway resource), implementation specifics and common configuration (GatewayClass | ||
resource), and routing traffic to backends (Route resource). | ||
|
||
### Limitations of Service API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be realistic here and acknowledge the benefits of the Service API from a user's POV - which I think we could summarize as that, for simple use cases, its very simple. It's only one object, as opposed to (at minimum) four in the simplest case here (GatewayClass, Gateway, Route, and EndpointSelector).
I completely agree that breaking Service apart for more advanced use cases is useful, but we should acknowledge the reason why it's stuck around for so long - the level of simplicity and flexibility it has allows folks to get started much more easily. Additionally, Service is a GA API that's not going anywhere, so we need to be very clear that we're not talking about deprecating or replacing Service with this. As with Gateway API north/south and Ingress, the GA core resource is going to stick around, but this proposal is about giving us a better base to look at adding features to rather than trying to fit them into the existing, overloaded Service construct.
Speaking from experience, putting a section outlining this into this document now will save a lot of discussion later.
Recommend reviewing deploy preview so examples are inlined: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/
Signed-off-by: Pooja Trivedi poojatrivedi@google.com
What type of PR is this?
/kind gep
What this PR does / why we need it:
This defines via documentation how Gateway API can be used to accomplish ClusterIP Service behavior. It also proposes DNS record format for ClusterIP Gateway, proposes an EndpointSelector resource, and briefly touches upon Gateway API usage to define LoadBalancer and NodePort behaviors.
Which issue(s) this PR fixes:
Fixes #3539
Does this PR introduce a user-facing change?: