diff --git a/keps/sig-network/2610-allports-services/README.md b/keps/sig-network/2610-allports-services/README.md
new file mode 100644
index 00000000000..9adac03b6d9
--- /dev/null
+++ b/keps/sig-network/2610-allports-services/README.md
@@ -0,0 +1,570 @@
+# KEP-2610: AllPorts Services
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Supported service types](#supported-service-types)
+  - [Usage](#usage)
+  - [Service Transitions](#service-transitions)
+  - [Life of a request](#life-of-a-request)
+  - [User Stories](#user-stories)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [Open Questions to be resolved:](#open-questions-to-be-resolved)
+  - [Test Plan](#test-plan)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Alpha -> Beta Graduation](#alpha---beta-graduation)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [ ] (R) Graduation criteria is in place
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+Today, a Kubernetes Service accepts a list of ports to be exposed by it.
+It is possible to specify any number of ports (as long as the service is within the max object size limit) by listing them in the service spec.
+This can be tedious if the service needs a large number of ports.
+This KEP proposes to add a new field to the Service spec to allow exposing the entire port range (1 to 65535).
+
+## Motivation
+
+There are several applications like SIP/RTP/Gaming servers that need a large number (1000+) of ports to run multiple calls or media streams.
+Currently, the only option is to specify every port in the Service spec. A request for port ranges in Services has been open in https://github.com/kubernetes/kubernetes/issues/23864.
+Implementing port ranges is challenging since iptables/ipvs do not support remapping port ranges. Also, a user who needs several non-contiguous port ranges would effectively have to expose the entire valid port range. Hence this proposal: a single field that exposes the entire port range, with the service clients and endpoints implemented accordingly.
+[A survey](https://docs.google.com/forms/d/1FOOG2ZoQsnJLYAjnhEtSPYmUULWFNe88iXR7gtFcP7g/edit) was sent out to collect the use-cases for AllPorts support - [results.](http://tiny.cc/allportsslides)
+
+### Goals
+
+* Allow users to optionally expose the entire port range via a Service (of Type LoadBalancer or ClusterIP).
+
+### Non-Goals
+
+* Supporting Port Ranges in a Service.
+* Changing the default behavior of Service ports.
+
+## Proposal
+
+The proposal here is to introduce an `allPorts` boolean (`*bool`) field to the [service API.](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#servicespec-v1-core)
+A service that sets this field to true will be reachable on any valid port (1 to 65535). The backend pods have to be configured accordingly. The backend pods will receive requests with the same port number - in other words, port remapping will not be possible.
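+
+A minimal sketch of how the proposed field could look on the ServiceSpec Go type. The field name, `*bool` type, and JSON tag follow this KEP; the package name and doc comment are illustrative, and all existing ServiceSpec fields are elided:
+
+```go
+package v1sketch
+
+// ServiceSpec shows only the proposed addition; every existing
+// core/v1 ServiceSpec field is elided for brevity.
+type ServiceSpec struct {
+    // allPorts, when set to true, exposes the entire valid port range
+    // (1 to 65535) on this Service. The ports list must be empty, and no
+    // port remapping takes place: backends receive traffic on the same
+    // port the client dialed. Honored only for ClusterIP (non-headless)
+    // and LoadBalancer Services.
+    // +optional
+    AllPorts *bool `json:"allPorts,omitempty"`
+}
+```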
+
+The [APIServer validation](https://github.com/kubernetes/kubernetes/blob/b9ce4ac212d150212485fa29d62a2fbd783a57b0/pkg/apis/core/validation/validation.go#L4162) that disallows empty `ports` will be relaxed when `allPorts` is set.
+
+The value of the `allPorts` field can be toggled on supported service types.
+
+### Supported service types
+
+Setting this field will be supported for:
+
+* ServiceType=ClusterIP
+* ServiceType=LoadBalancer
+
+This field is not applicable to ExternalName services.
+
+NodePort services are not supported.
+A NodePort service accepts traffic on a given port on the Node, redirecting it to the specified targetPort of the service endpoints. Supporting AllPorts for a NodePort service would mean that traffic to any port on the node is forwarded to its endpoints on the same port. This could potentially break networking on the node, if traffic for, say, port 22 got forwarded to a backend pod on port 22.
+
+Headless services are not supported either. Headless services do not have a ClusterIP, nor can they be LoadBalancer services. SRV records are not created for endpoints with an empty port array. Hence, supporting AllPorts on Headless services has little value.
+
+Setting `allPorts` to true is not supported on services that specify [ExternalIPs in the spec.](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec)
+
+### Usage
+
+In order to expose the entire port range on a supported service, a user needs to:
+1) Create a LoadBalancer/ClusterIP Service and set `allPorts` to true.
+2) Leave the `ports` field empty. Populating the ports array is invalid when setting `allPorts` to true.
+
+There will be no NodePort allocation for the service in this case. The only exception is the health-check NodePort for LoadBalancers using `externalTrafficPolicy: Local`.
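+
+To make the usage rules concrete, a self-contained Go sketch that builds a conforming spec using stand-in types; the real definitions live in `k8s.io/api/core/v1` and do not carry this field yet, and the `sip-server` name and selector are made up:
+
+```go
+package main
+
+import "fmt"
+
+// Stand-in types for illustration only.
+type ServicePort struct{ Port, TargetPort int32 }
+
+type ServiceSpec struct {
+    Type     string
+    Selector map[string]string
+    Ports    []ServicePort // must stay empty when AllPorts is true
+    AllPorts *bool         // the field proposed by this KEP
+}
+
+func main() {
+    allPorts := true
+    spec := ServiceSpec{
+        Type:     "LoadBalancer",
+        Selector: map[string]string{"app": "sip-server"},
+        AllPorts: &allPorts, // reachable on ports 1-65535, no remapping
+        // Ports is deliberately left empty; populating it would fail validation.
+    }
+    fmt.Printf("type=%s allPorts=%t ports=%d\n", spec.Type, *spec.AllPorts, len(spec.Ports))
+}
+```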
+
+### Service Transitions
+
+Consider the following transitions:
+
+* Changing from ClusterIP to LoadBalancer type, or vice versa
+  Preserving the `allPorts` value as well as toggling it is supported.
+
+* Setting an ExternalIP on a ClusterIP service that has `allPorts` true
+  The `allPorts` value has to be unset first.
+
+* Changing from ClusterIP/LoadBalancer to NodePort type
+  The `allPorts` value has to be unset first.
+
+* Changing from NodePort to ClusterIP/LoadBalancer type
+  Once the service has been changed to ClusterIP/LoadBalancer type, the `allPorts` field can be set.
+
+Transitioning from non-headless to headless and vice versa is not permitted by the API today.
+
+### Life of a request
+
+1) A service with ClusterIP a.b.c.d is configured with `allPorts` set to true. This service has 2 endpoints - pod1, pod2.
+
+2) A client (1.2.3.4) sends a request to a.b.c.d:8888 - it is received on a cluster node where:
+
+   a) firewall rules allow it
+
+   b) kube-proxy iptables rules DNAT it to one of the backend pod IPs - p.q.r.s (pod1)
+
+   c) the request is received on pod1 with source IP 1.2.3.4 and destination p.q.r.s:8888
+
+   d) pod1 responds directly to 1.2.3.4.
+
+The path taken by the request is similar in case of a LoadBalancer service. If the LoadBalancer implementation uses a proxy (instead of Direct Server Return), the proxy should be able to receive requests on all ports as well.
+
+### User Stories
+
+* A user wants to expose ports 20,000 to 50,000 for a web-conferencing application that is exposed as a LoadBalancer service.
+
+The user can now create a LoadBalancer service with `allPorts` set to true. This will enable clients to connect on `<LoadBalancerIP>:<port>`, where port is any value between 20,000 and 50,000.
+
+### Risks and Mitigations
+
+* Allowing an empty `ports` array in the Service object could break clients that watch Services and expect a valid port value.
+These include kube-proxy, kube-dns/CoreDNS, and other controllers. To mitigate this risk, the API validation change that allows empty `ports`
+will be feature-gated and soaked in the Alpha stage for 2-3 releases. That way, when this feature is Beta,
+the supported node versions (up to 2 releases behind) will have the kube-proxy changes that handle empty ports.
+
+* A user could accidentally expose the entire port range on their cluster nodes, by enabling `allPorts` for a Service.
+To avoid this, users should make sure that their firewall implementation only permits traffic to `LoadBalancerIP:<port>` and not `NodeIP:<port>`.
+This is mostly applicable to LoadBalancer services, which are typically
+accessible from outside the cluster.
+Currently, kube-proxy adds the right rules to only allow traffic to the ServiceIP/Port combination specified in the service.
+When using AllPorts, kube-proxy will allow all traffic for the given ServiceIP/LoadBalancerIP. Kube-proxy rules alone cannot drop traffic to `NodeIP:<port>`.
+
+* Load balancing at the IP level could regress in behavior.
+For example, IPVS supports load balancing without specifying ports, for TCP and UDP services.
+However, traffic to the same 3-tuple (destination IP, destination port, protocol) will be sent to the same backend.
+In other words, this is similar to setting `sessionAffinity: ClientIP`, but it will be the default (and only) behavior with AllPorts + IPVS.
+In contrast, when service ports are specified, backend pods are selected at random, unless `sessionAffinity: ClientIP` is specified.
+This could be mitigated by using iptables to implement the AllPorts logic.
+
+* A known issue with [host services being accessible via ClusterIP in ipvs mode](https://github.com/kubernetes/kubernetes/issues/72236) could be mitigated by AllPorts support.
+If a ClusterIP service is created with `allPorts` set to true and sshd on the host listens on 0.0.0.0:22, traffic to `<clusterIP>:22` will only go to backend pods.
+If the service were not using `allPorts`, and did not specify port 22 in the Port list, sshd would be exposed by connecting to `<clusterIP>:22`.
+This is because the clusterIP is assigned to an ipvs interface in the host namespace.
+
+## Design Details
+
+Changes are required to APIServer validation, kube-proxy, and controllers that use the ServicePort field.
+
+1) New Validation checks (a hedged sketch follows this list):
+
+   * `allPorts` can be set to true only for ClusterIP (non-headless) and LoadBalancer services.
+   * The `ports` array should be empty when `allPorts` is set to true.
+
+2) Kube-proxy should configure iptables/ipvs rules by skipping the port/protocol filter, if `allPorts` is true.
+
+3) LoadBalancer controllers should create LoadBalancer resources with the appropriate port values.
+
+4) The Endpoints and EndpointSlices controllers should create Endpoints with empty port values.
+
+5) DNS providers should handle empty ports in Services and Endpoints.
+   There will be no DNS SRV records for Services with `allPorts` set.
+   CoreDNS handles empty ports in [services](https://github.com/coredns/coredns/blob/09b63df9c1584bb5389d1b681698631bcd7c19e1/plugin/kubernetes/kubernetes.go#L577) and [endpoints.](https://github.com/coredns/coredns/blob/09b63df9c1584bb5389d1b681698631bcd7c19e1/plugin/kubernetes/kubernetes.go#L559)
+   kube-dns also handles empty ports in [services](https://github.com/kubernetes/dns/blob/077a43e83e648ba5f04bae18ffcb824edc9db967/pkg/dns/dns.go#L506) and [endpoints.](https://github.com/kubernetes/dns/blob/077a43e83e648ba5f04bae18ffcb824edc9db967/pkg/dns/dns.go#L541)
+   There is a warning in kube-dns for [empty port services](https://github.com/kubernetes/dns/blob/077a43e83e648ba5f04bae18ffcb824edc9db967/pkg/dns/dns.go#L320) that could be removed after GA of AllPorts.
+
+6) There will be no environment variable of the form `<SERVICE_NAME>_SERVICE_PORT` for services with `allPorts` set. This [codepath](https://github.com/kubernetes/kubernetes/blob/e1f971d5c2a1002c4e90471d064f87f297740aba/pkg/kubelet/envvars/envvars.go#L48) currently assumes non-zero ports and that will be updated.
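+
+A hedged sketch of what the new validation checks could look like. It loosely follows the style of `pkg/apis/core/validation`, but the `serviceSpec` type, `validateAllPorts` helper, and plain-error handling below are stand-ins, not the actual kubernetes/kubernetes code:
+
+```go
+package validation
+
+import "fmt"
+
+// serviceSpec is a stand-in for core.ServiceSpec, carrying only the
+// fields the new checks need.
+type serviceSpec struct {
+    Type        string // "ClusterIP", "NodePort", "LoadBalancer", "ExternalName"
+    ClusterIP   string // "None" for headless services
+    ExternalIPs []string
+    NumPorts    int // len(spec.Ports)
+    AllPorts    *bool
+}
+
+// validateAllPorts sketches the checks this KEP adds; when allPorts is
+// unset or false, the existing "ports must not be empty" rule applies.
+func validateAllPorts(spec serviceSpec) []error {
+    var errs []error
+    if spec.AllPorts == nil || !*spec.AllPorts {
+        return errs
+    }
+    if spec.Type != "ClusterIP" && spec.Type != "LoadBalancer" {
+        errs = append(errs, fmt.Errorf("allPorts is not supported for type %s", spec.Type))
+    }
+    if spec.ClusterIP == "None" {
+        errs = append(errs, fmt.Errorf("allPorts is not supported for headless services"))
+    }
+    if len(spec.ExternalIPs) > 0 {
+        errs = append(errs, fmt.Errorf("allPorts cannot be combined with externalIPs"))
+    }
+    if spec.NumPorts > 0 {
+        errs = append(errs, fmt.Errorf("ports must be empty when allPorts is true"))
+    }
+    return errs
+}
+```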
+
+#### Open Questions to be resolved:
+
+1) How should the IPVS implementation be handled?
+   Options are to a) use iptables for DNAT, b) use ipvs rules that result in same 5-tuple requests being sent to the same backend pod, or c) create allPorts services as "fwmark-service" and assign a unique mark for each of them.
+
+2) Identify how Service Meshes (Istio)/Calico/MetalLB can support AllPorts.
+
+### Test Plan
+
+Unit tests:
+
+* To verify API validation of the `allPorts` and `ports` fields (a table-driven sketch follows this list).
+* To verify that all users (kube-proxy, kubelet, various controllers) of Service/Endpoints can handle empty ports.
+* To check the default value of `allPorts` on each type of Service.
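+
+A table-driven sketch of how the first bullet might be covered; it exercises the hypothetical `validateAllPorts` helper and `serviceSpec` stand-in from the Design Details sketch, not the real validation package:
+
+```go
+package validation
+
+import "testing"
+
+func TestValidateAllPorts(t *testing.T) {
+    on := true
+    cases := []struct {
+        name    string
+        spec    serviceSpec
+        wantErr bool
+    }{
+        {"clusterip, allPorts, empty ports", serviceSpec{Type: "ClusterIP", AllPorts: &on}, false},
+        {"loadbalancer, allPorts, empty ports", serviceSpec{Type: "LoadBalancer", AllPorts: &on}, false},
+        {"allPorts with ports listed", serviceSpec{Type: "ClusterIP", AllPorts: &on, NumPorts: 2}, true},
+        {"allPorts on nodeport", serviceSpec{Type: "NodePort", AllPorts: &on}, true},
+        {"allPorts on headless", serviceSpec{Type: "ClusterIP", ClusterIP: "None", AllPorts: &on}, true},
+        {"allPorts with externalIPs", serviceSpec{Type: "ClusterIP", ExternalIPs: []string{"1.2.3.4"}, AllPorts: &on}, true},
+    }
+    for _, tc := range cases {
+        if gotErr := len(validateAllPorts(tc.spec)) > 0; gotErr != tc.wantErr {
+            t.Errorf("%s: gotErr=%v, wantErr=%v", tc.name, gotErr, tc.wantErr)
+        }
+    }
+}
+```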
+
+E2E tests (a connectivity-check sketch follows this list):
+
+* To verify that the default behavior of `allPorts` does not break any existing e2e tests.
+* Test setting `allPorts` on a new Service and connecting to the service VIP on a few different ports.
+* Test setting `allPorts` on an existing Service and connecting to the service VIP on a few different ports.
+* Test that unsetting `allPorts` on a service and specifying a single port allows traffic only for that port.
+* Test setting `allPorts` explicitly to false.
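+
+As a rough illustration of the connectivity assertion shared by these tests, a standard-library Go sketch; the VIP and port values are placeholders for what a real e2e run would discover after creating the Service:
+
+```go
+package main
+
+import (
+    "fmt"
+    "net"
+    "time"
+)
+
+// checkPorts sketches the core assertion: with allPorts set, the service
+// VIP should accept TCP connections on arbitrary ports.
+func checkPorts(vip string, ports []int) error {
+    for _, p := range ports {
+        addr := fmt.Sprintf("%s:%d", vip, p)
+        conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
+        if err != nil {
+            return fmt.Errorf("dial %s: %w", addr, err)
+        }
+        conn.Close()
+    }
+    return nil
+}
+
+func main() {
+    // Hypothetical VIP and sample ports; a real test would create the
+    // Service with allPorts: true and read its ClusterIP/LoadBalancer IP.
+    if err := checkPorts("10.0.0.10", []int{80, 8888, 30123}); err != nil {
+        fmt.Println("connectivity check failed:", err)
+    }
+}
+```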
+
+### Graduation Criteria
+
+#### Alpha
+
+- Add a new field `allPorts` to Service, but it can only be set when the feature gate is on.
+
+#### Alpha -> Beta Graduation
+
+- Ensure that the main clients of the Service/Endpoints API - kube-proxy, kubelet, loadbalancer controllers - have added support to handle empty ports. All supported node versions for the given master version (that graduates `allPorts` to Beta) should handle this case correctly.
+- Tests are in Testgrid and linked in KEP
+- Demonstrated community adoption of this feature.
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [ ] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name:
+  - Components depending on the feature gate:
+- [ ] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+###### Does enabling the feature change any default behavior?
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+###### Are there any tests for feature enablement/disablement?
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout fail? Can it impact already running workloads?
+
+###### What specific metrics should inform a rollback?
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [ ] Metrics
+  - Metric name:
+  - [Optional] Aggregation method:
+  - Components exposing the metric:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+###### Will enabling / using this feature result in introducing new API types?
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+## Implementation History
+
+## Drawbacks
+
+## Alternatives
+
+* Create a new service type - IPOnly - which does not accept any port entries.
+  Traffic redirection to endpoints from the service VIP will be IP-based, without a port or protocol filter.
+
+  This requires modifying existing LoadBalancer controllers to also provision resources for this type of Service.
+  Adding a new service type is quite a bit of API overhead.
+  This would be a better fit in the [Gateway API.](https://gateway-api.sigs.k8s.io/)
+
+* Restrict AllPorts support to LoadBalancer services only. Allow LoadBalancer services to be headless [by relaxing the validation check.](https://github.com/kubernetes/kubernetes/blob/036cab71a6faefa84b10a199a61bcdc38e3572c3/pkg/apis/core/validation/validation.go#L4177)
+  The default behavior is to allow traffic for all protocols. A list of allowed protocols can be specified in a Protocols list (a new field) in the ServiceSpec.
+
+  This approach breaks the assumption that all LoadBalancer services are valid ClusterIP services. It also does not provide AllPorts support for ClusterIP services, which was desirable based on the [survey results.](https://docs.google.com/presentation/d/1FO9H55-gnDh2RIqOZMDoP4OPVbaIaPhKl9C5to4uNNE/edit#slide=id.gdf6ff10943_0_30)
+
+## Infrastructure Needed (Optional)
diff --git a/keps/sig-network/2610-allports-services/kep.yaml b/keps/sig-network/2610-allports-services/kep.yaml
new file mode 100644
index 00000000000..82545dc5b54
--- /dev/null
+++ b/keps/sig-network/2610-allports-services/kep.yaml
@@ -0,0 +1,44 @@
+title: AllPorts Services
+kep-number: 2610
+authors:
+  - "@prameshj"
+owning-sig: sig-network
+participating-sigs:
+  - sig-cloud-provider
+status: provisional
+creation-date: 2021-04-08
+reviewers:
+  - "@thockin"
+  - "@aojea"
+  - "@danwinship"
+  - "@uablrek"
+  - "@bowei"
+  - "@freehan"
+approvers:
+  - "@thockin"
+prr-approvers:
+  - "@wojtek-t"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.23"
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: AllPortsService
+    components:
+      - kube-apiserver
+      - kube-controller-manager
+      - kube-proxy
+      - kubelet
+      - loadbalancer controllers
+disable-supported: true