
Adding GEP-3539: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP Gateway) #3608

Open: ptrivedi wants to merge 2 commits into main from gep-clusterip-gateway


@ptrivedi commented Feb 10, 2025

Recommend reviewing deploy preview so examples are inlined: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/

Signed-off-by: Pooja Trivedi poojatrivedi@google.com

What type of PR is this?

/kind gep

What this PR does / why we need it:

This defines, via documentation, how Gateway API can be used to accomplish ClusterIP Service behavior. It also proposes a DNS record format for ClusterIP Gateways, proposes an EndpointSelector resource, and briefly touches upon Gateway API usage to define LoadBalancer and NodePort behaviors.

Which issue(s) this PR fixes:

Fixes #3539

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot added the release-note-none and kind/gep labels Feb 10, 2025

linux-foundation-easycla bot commented Feb 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ptrivedi
Once this PR has been reviewed and has the lgtm label, please assign thockin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: no, do-not-merge/hold, and needs-ok-to-test labels Feb 10, 2025
@k8s-ci-robot (Contributor)

Hi @ptrivedi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/L label Feb 10, 2025
@ptrivedi force-pushed the gep-clusterip-gateway branch from afc6467 to 835e6a3 February 10, 2025 21:30
…dress (ClusterIP Gateway)

Signed-off-by: Pooja Trivedi poojatrivedi@google.com
@ptrivedi force-pushed the gep-clusterip-gateway branch from 835e6a3 to 6a061ca February 10, 2025 21:48
@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Feb 10, 2025
@ptrivedi (Author)

Adding this comment here to track a few open items resulting from the comments on the Google doc: https://docs.google.com/document/d/1N-C-dBHfyfwkKufknwKTDLAw4AP2BnJlnmx0dB-cC4U/edit?tab=t.0

  1. The topology-aware routing feature needs to be discussed and hashed out in detail. Features like internal/externalTrafficPolicy should then be appropriately morphed and provided as part of topology-aware routing.
  2. The EndpointSelector resource and DNS for Gateway topics warrant follow-up GEPs focused on these areas.
  3. Headless, ExternalName, and other DNS functionality may warrant a separate DNS API/object. Subject to further discussion.
  4. Need a broader discussion around where we implement this functionality: does it replace the Service API completely in the long term, in which case we should have a migration plan, or does it become an underlying implementation for Service functionality, keeping the simpler UX of the Service API unchanged for end users while allowing advanced users to deal with Gateway API resources directly?
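As a reference point for item 1: Service's externalTrafficPolicy decides which endpoints externally sourced traffic arriving at a node may reach. `Local` keeps it on endpoints running on that node, while `Cluster` allows any endpoint in the cluster. A minimal sketch of that filtering step, assuming endpoints are plain dicts with hypothetical `ip` and `node` keys (not a real API type) and ignoring readiness:

```python
def eligible_endpoints(endpoints: list[dict], node: str, policy: str) -> list[str]:
    """Return endpoint IPs that external traffic arriving on `node` may use.

    `endpoints` are hypothetical dicts such as {"ip": "10.0.0.1", "node": "node-a"}.
    """
    if policy == "Local":
        # Only endpoints on the receiving node; an empty result means this node
        # has nowhere to send the external traffic.
        return [ep["ip"] for ep in endpoints if ep["node"] == node]
    if policy == "Cluster":
        # Any endpoint in the cluster, possibly one extra hop away on another node.
        return [ep["ip"] for ep in endpoints]
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")
```

Folding this into topology-aware routing could mean generalizing the same-node test into a broader locality preference, but that is exactly what item 1 leaves open.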

@robscott @bowei @aojea @howardjohn @mskrocki

potentially other resource kinds) directly to a Route via backendRef.

```yaml
{% include 'standard/clusterip-gateway/tcproute-with-endpointselector.yaml' %}
```

(Author)

Yes, it is. Please see the deploy preview here: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/

Also added this to the PR description:
Recommend reviewing deploy preview so examples are inlined: https://deploy-preview-3608--kubernetes-sigs-gateway-api.netlify.app/geps/gep-3539/

Oh, wasn't aware of that page.

is to have a GatewayClass corresponding to each type of service networking behavior that needs to be modeled
and supported.

![image displaying gatewayclasses to represent different service types](images/gatewayclasses-lb-np.png "image displaying gatewayclasses to represent different service types")

Image missing, or incorrect file name?

(Author)

It was a missing image. Fixed. Thanks for catching it.

| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
| allocateLoadBalancerNodePorts | True <br /> False | Not supported for ClusterIP Gateway <br /> Supported for LoadBalancer Gateway |

We could say N/A for this approach, since you can create an LB type without a NodePort; it's a sort of simplification.

(Author)

That might be clearer until further discussion on each of these.

@k8s-ci-robot added the cncf-cla: no label and removed the cncf-cla: yes label Feb 11, 2025
@ptrivedi force-pushed the gep-clusterip-gateway branch 3 times, most recently from 1e793b0 to b5e81ee February 12, 2025 15:17
@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Feb 12, 2025
* Fix missing image
* Change GEP status to Memorandum
* Make GEP navigable
* Crop trailing whitespace from images

Signed-off-by: Pooja poojatrivedi@google.com
@ptrivedi force-pushed the gep-clusterip-gateway branch from b5e81ee to e876ced February 12, 2025 15:35
@ptrivedi (Author)

/assign @thockin

@thockin left a comment

First: LOVE IT

The questions I keep coming back to all are around how the node-proxy knows to pay attention to THIS gateway so it can implement the clusterIP or nodePort or externalTrafficPolicy or ...

@@ -0,0 +1,25 @@
kind: TCPRoute/CustomRoute

Is this Foo/Bar syntax just for the example, or is it something real? I don't think I have ever seen it, and I don't know what it means.

(Contributor)

From the context, it appears that this is covering the base of "I'm not sure what actual Kind we're talking about here".

@ptrivedi - if that's what you mean, I'd recommend leaving a comment next to it to explain and/or using an optional-selection notation like [TCPRoute|CustomRoute].

- name: example-cluster-ip-gateway
rules:
config:
sessionAffinity: false

Is the indent wrong on this?


### EndpointSelector as Backend

A Route can forward traffic to the endpoints selected via selector rules defined in EndpointSelector.

FWIW, I can imagine a path toward maybe making this a regular core feature. I am sure that it would be tricky but I don't think it's impossible.

E.g.:

Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing. Same as we do with IP.
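To make the foo=bar example concrete: a selector picks the pods whose labels include every listed pair, and their IPs become the backend endpoints. A minimal, equality-only sketch; `Pod` and `select_endpoints` are hypothetical stand-ins, not the EndpointSelector API proposed in this GEP, and real Kubernetes label selectors also support set-based expressions:

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, trimmed-down pod record: a name, an IP, and labels."""
    name: str
    ip: str
    labels: dict[str, str] = field(default_factory=dict)


def select_endpoints(selector: dict[str, str], pods: list[Pod]) -> list[str]:
    """Return the IPs of pods whose labels contain every key=value in `selector`.

    With this equality-only logic an empty selector matches every pod.
    """
    return [
        pod.ip
        for pod in pods
        if all(pod.labels.get(key) == value for key, value in selector.items())
    ]
```

In the flow sketched above, a Service with selector foo=bar would cause such a selector object to be created, and the endpoints controller(s) would keep the resulting list up to date as pods come and go.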

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-ip
```

Is this name "special" or can it be anything?

(Contributor)

It's intended that GatewayClass names can be any valid Kubernetes object name.

```yaml
metadata:
  name: cluster-ip
spec:
  controllerName: "cluster-ip-controller"
```

Is this name "special" or can it be anything?

(Contributor)

The name can be anything, but implementations must only reconcile GatewayClasses that have a controllerName they expect. Implementations must completely ignore GatewayClass objects that do not match their controllerName, and must not update them at all (to prevent fighting over status).

Some implementations allow configuration of this string (for example, Contour allows it so that you can run multiple instances of Contour in a cluster).
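A minimal sketch of the filtering rule above, plus a format check. Upstream Gateway API validates spec.controllerName as a domain-prefixed path (for example `example.com/bar`), so a bare value like the `cluster-ip-controller` quoted above would likely be rejected by the CRD schema; the regex below approximates that pattern and should be checked against the published CRDs. The controller name `example.com/cluster-ip-controller` and the dict-shaped objects are hypothetical:

```python
import re

# Approximation of the upstream GatewayController pattern:
# a DNS-style domain prefix, a slash, then a non-empty path.
CONTROLLER_NAME = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
    r"/[A-Za-z0-9/\-._~%!$&'()*+,;=:]+$"
)


def is_valid_controller_name(name: str) -> bool:
    """True if `name` is 1-253 characters and looks like a domain-prefixed path."""
    return 0 < len(name) <= 253 and CONTROLLER_NAME.match(name) is not None


def classes_to_reconcile(gateway_classes: list[dict], my_controller: str) -> list[str]:
    """Names of GatewayClasses whose spec.controllerName equals ours.

    Every other GatewayClass is skipped entirely: no reconciliation and no
    status writes, so implementations never fight over status.
    """
    return [
        gc["metadata"]["name"]
        for gc in gateway_classes
        if gc["spec"]["controllerName"] == my_controller
    ]
```

Making `my_controller` a configuration value rather than a constant is what lets several instances of one implementation (as in the Contour example) coexist in a cluster.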

```yaml
  name: example-cluster-ip-gateway
spec:
  addresses:
  - 10.12.0.15
```

How does kube-proxy (or Cilium or Antrea or ...) know which Gateways it should be capturing traffic for?

(Contributor)

Normally that's handled by the rollup of Gateway -> GatewayClass. Implementations own GatewayClasses that specify the correct string in GatewayClass spec.controllerName. All Gateways in that GatewayClass would need to be serviced by an implementation that can fulfill this request (that is, it both has the required functionality and, in this case of requesting a static address, is actually able to assign that address). If an implementation cannot fulfill a Gateway for some reason, it must mark it as not Accepted (by setting an Accepted type condition in the Gateway's status with status: false).
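Under this model, the same rollup is presumably how a node proxy such as kube-proxy, Cilium, or Antrea would decide which Gateways to capture traffic for. A minimal sketch, assuming objects are plain dicts shaped like their YAML; `can_fulfill` is a hypothetical hook where the implementation decides whether it can serve a Gateway (for example, assign a requested static address), and the `CannotFulfill` reason string is illustrative only, since Gateway API defines its own condition reasons:

```python
from typing import Callable


def gateways_to_program(
    gateways: list[dict],
    gateway_classes: list[dict],
    my_controller: str,
    can_fulfill: Callable[[dict], bool],
) -> dict[str, dict]:
    """Map "namespace/name" of each Gateway we own to its Accepted condition.

    Ownership rolls up Gateway.spec.gatewayClassName -> GatewayClass ->
    spec.controllerName. Gateways in other classes are left untouched.
    """
    owned_classes = {
        gc["metadata"]["name"]
        for gc in gateway_classes
        if gc["spec"]["controllerName"] == my_controller
    }
    conditions = {}
    for gw in gateways:
        if gw["spec"]["gatewayClassName"] not in owned_classes:
            continue  # Not ours: do not program it or write its status.
        key = f'{gw["metadata"].get("namespace", "default")}/{gw["metadata"]["name"]}'
        if can_fulfill(gw):
            conditions[key] = {"type": "Accepted", "status": "True", "reason": "Accepted"}
        else:
            # Illustrative reason only; use the reasons defined by Gateway API.
            conditions[key] = {"type": "Accepted", "status": "False", "reason": "CannotFulfill"}
    return conditions
```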

```yaml
{% include 'standard/clusterip-gateway/clusterip-gateway.yaml' %}
```

By default, IP address(es) from a pool specified by a CIDR block will be assigned unless a static IP is

I think the default path should be to allocate from the same ServiceCIDR resource. If you need an IP from a different resource you would do something different. Either a different class or a different allocator or something.
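A minimal sketch of the default-versus-static choice, with a plain CIDR string standing in for a ServiceCIDR object. `allocate_address` is hypothetical; it picks the lowest free host address for simplicity, whereas a real allocator may choose addresses differently and must persist its allocations. Note also that in Gateway API v1 a static request is written as an object, e.g. `addresses: [{type: IPAddress, value: 10.12.0.15}]`, rather than the bare string in the snippet above.

```python
import ipaddress
from typing import Iterable, Optional, Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def allocate_address(
    service_cidr: str,
    allocated: Iterable[str],
    static: Optional[str] = None,
) -> IPAddr:
    """Pick an address for a ClusterIP-style Gateway.

    With `static`, that exact address is returned if it lies inside
    `service_cidr` and is not already allocated. Otherwise the lowest free
    host address in the prefix is returned.
    """
    net = ipaddress.ip_network(service_cidr)
    taken = {ipaddress.ip_address(a) for a in allocated}
    if static is not None:
        ip = ipaddress.ip_address(static)
        if ip not in net or ip in taken:
            raise ValueError(f"{static} is not available in {service_cidr}")
        return ip
    for ip in net.hosts():
        if ip not in taken:
            return ip
    raise RuntimeError(f"{service_cidr} has no free addresses")
```

Allocating from a different pool would then just mean passing a different prefix, which in the comment's terms maps to a different class or allocator.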

in pods’ /etc/resolv.conf need to be programmed accordingly by kubelet.

```
<name of gateway>.<gateway-namespace>.gw.cluster.local
```

I think DNS is a fraught topic. We REALLY REALLY do not want to add more search paths, especially if they could cause ambiguous names. We could just lean on the "svc" space for this, since these are effectively services. We would need to define how to avoid collisions and I'd be lying if I said I had a great answer.

Maybe, like IPAddress, we extract ServiceName to a new resource, and whoever gets there first wins? That sort of transaction doesn't work well for CRDs, but I guess it could be async. Weird failure modes.
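To make the proposed record format and the collision question concrete, a minimal sketch. Assumptions: the `gw.cluster.local` zone comes from the proposal quoted above; both name parts must be single DNS labels, which is stricter than the DNS-subdomain rule that applies to custom resource names in general (a Gateway named `a.b` would otherwise add an extra label); and `NameClaims` is a hypothetical in-memory stand-in for a separate name resource where the first claimant wins.

```python
import re

DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def gateway_fqdn(name: str, namespace: str, zone: str = "gw.cluster.local") -> str:
    """Build <name>.<namespace>.<zone>, rejecting parts that are not DNS labels."""
    for part in (name, namespace):
        if len(part) > 63 or not DNS_LABEL.match(part):
            raise ValueError(f"{part!r} is not a valid DNS label")
    return f"{name}.{namespace}.{zone}"


class NameClaims:
    """Hypothetical first-come-first-served registry of DNS names."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def claim(self, fqdn: str, owner: str) -> bool:
        """Record `owner` for `fqdn` if unclaimed; True only if `owner` holds it."""
        return self._owners.setdefault(fqdn, owner) == owner
```

If the claim were an API object instead, the losing claimant would only learn of the conflict asynchronously and would presumably need to surface it in its status, which is where the weird failure modes come in.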


| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |

Does L4 Gateway support affinity?

| Feature | ServiceAPI options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> NoAffinity | Route level |
| allocateLoadBalancerNodePorts | True <br /> False | Not supported for ClusterIP Gateway <br /> Supported for LoadBalancer Gateway |
| externalIPs | List of externalIPs for service | Not supported? |
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level |

These are all interesting challenges which maybe need something more than a plain TCP Route?
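For reference on the sessionAffinity row: Service's `sessionAffinity: ClientIP` pins each client IP to one backend, with stickiness expiring after `sessionAffinityConfig.clientIP.timeoutSeconds` (10800 seconds by default). A minimal sketch of that behavior as a Route-level implementation might model it; `ClientIPAffinity` is hypothetical, loosely modeled on kube-proxy rather than copied from it, and this sketch refreshes the timer on every request:

```python
import random
from typing import Optional


class ClientIPAffinity:
    """Hypothetical sticky backend picker keyed by client IP."""

    def __init__(self, backends: list[str], timeout_seconds: float = 10800,
                 rng: Optional[random.Random] = None) -> None:
        self.backends = list(backends)
        self.timeout_seconds = timeout_seconds
        self._rng = rng or random.Random()
        # client IP -> (backend, time last seen)
        self._sticky: dict[str, tuple[str, float]] = {}

    def pick(self, client_ip: str, now: float) -> str:
        """Return the backend for `client_ip` at time `now` (in seconds)."""
        entry = self._sticky.get(client_ip)
        if (entry is not None and entry[0] in self.backends
                and now - entry[1] <= self.timeout_seconds):
            backend = entry[0]  # Still sticky.
        else:
            backend = self._rng.choice(self.backends)  # New, expired, or gone.
        self._sticky[client_ip] = (backend, now)  # Refresh the timer.
        return backend
```

Because the state is per client IP, it lives wherever packets are load balanced, which is one reason this may need more than a plain TCPRoute to express.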


When modeling ClusterIP service networking, the simplest recommendation might be to keep Gateway and Routes
within the same namespace. While cross namespace routing would work and allow for evolved functionality,
it may make supporting certain cases tricky. One specific example for this case is the pod DNS resolution
(Contributor)

Given we are making a new DNS name, do we actually care to support this POD-IP DNS name?

Note that Gateway API allows flexibility and clear separation of concerns so that one would not need to
configure cluster-ip and node-port when configuring a load-balancer.

But for completeness, the case shown below demonstrates how load balancer functionality analogous to
(Contributor)

All of this proposal makes sense as a logical way to solve "If you had to implement Service using Gateway API primitives, how would you do it".

What doesn't make sense to me is the why and the how this becomes something practically useful from a proposal to a thing in the real world.

The diagram below shows 1 object becoming 8. Do we expect users to actually create these 8 objects?

Which projects are expected to, and which are committed to, supporting these? Kube-proxy? CoreDNS? Various 3p CNIs (Cilium, Calico, etc.)? Service meshes? All gateway implementations?

## Goals

* Define Gateway API usage to accomplish ClusterIP Service style behavior
* Propose DNS layout and record format for ClusterIP Gateway
(Contributor)

It doesn't seem like we have fleshed this out. Compared to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ we have just 1-2 sentences with a lot of ambiguity here.

@youngnick (Contributor) left a comment

Initial pass over the first half or so, still thinking through some of the later half. Will be back with more in the next few days.

# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address

* Issue: [#3539](https://github.com/kubernetes-sigs/gateway-api/issues/3539)
* Status: Memorandum
(Contributor)

This should currently be Provisional, as it's the first iteration and we are still deciding on the approach here.

The Memorandum status is for registering general agreement about things, not for features that will require actual code changes to the Gateway API specification (which this definitely will).

This also needs to be changed in the corresponding metadata.yaml file - the YAML file is actually the canonical place for the status, this is just to remind everyone. I'll suggest the same change there.


Comment on lines +8 to +12
Gateway API enables advanced traffic routing and can be used to expose a
logical set of pods on a single IP address within a cluster. It can be seen
as the next generation ClusterIP providing more flexibility and composability
than Service API. This comes at the expense of some additional configuration
and manageability burden.
(Contributor)

Suggested change
Gateway API enables advanced traffic routing and can be used to expose a
logical set of pods on a single IP address within a cluster. It can be seen
as the next generation ClusterIP providing more flexibility and composability
than Service API. This comes at the expense of some additional configuration
and manageability burden.
Gateway API enables advanced traffic routing and can be used to expose a
logical set of pods on a single IP address within a cluster. With some changes,
it could be used as a next generation ClusterIP Service replacement,
providing more flexibility and composability than the existing Service API.
This comes at the expense of some additional configuration
and manageability burden, but we believe that the additional value
gained is worth the cost.

This one is just a suggestion to make this read a little bit more clearly to me. Feel free to disregard if it doesn't match your intent here.

Comment on lines +27 to +30
## API Changes

* EndpointSelector is recognized as a backend
* DNS record format for ClusterIP Gateways
(Contributor)

Suggested change
## API Changes
* EndpointSelector is recognized as a backend
* DNS record format for ClusterIP Gateways
## API Changes Summary
* EndpointSelector is recognized as a backend
* DNS record format for ClusterIP Gateways

We haven't done this before in GEPs, but I really like this quick summary of the API changes as part of this GEP. @robscott @shaneutt @mlavacca should we consider adding this to the template?

(Gateway resource), implementation specifics and common configuration (GatewayClass
resource), and routing traffic to backends (Route resource).

### Limitations of Service API
(Contributor)

I think we need to be realistic here and acknowledge the benefits of the Service API from a user's POV, which I think we could summarize as: for simple use cases, it's very simple. It's only one object, as opposed to (at minimum) four in the simplest case here (GatewayClass, Gateway, Route, and EndpointSelector).

I completely agree that breaking Service apart for more advanced use cases is useful, but we should acknowledge the reason why it's stuck around for so long - the level of simplicity and flexibility it has allows folks to get started much more easily. Additionally, Service is a GA API that's not going anywhere, so we need to be very clear that we're not talking about deprecating or replacing Service with this. As with Gateway API north/south and Ingress, the GA core resource is going to stick around, but this proposal is about giving us a better base to look at adding features to rather than trying to fit them into the existing, overloaded Service construct.

Speaking from experience, putting a section outlining this into this document now will save a lot of discussion later.

Labels
cncf-cla: yes, do-not-merge/hold, kind/gep, needs-ok-to-test, release-note-none, size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GEP: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP)
6 participants