-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a KEP for supporting AllPorts services. #2611
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: prameshj The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc |
|
||
### Goals | ||
|
||
* Allow users to optionally expose the entire port range via a Service (of Type LoadBalancer or ClusterIP). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uablrek ~ can you comment on the kproxy supportability for this ? re: ipvs/iptables/nftables/...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That iptables/ipvs doesn't support port-ranges is not entirely true, but they can not re-map ports in a range, like with an offset. The ports are unchanged.
From ipvsadm(8) man page;
-A, --add-service
Add a virtual service. A service address is uniquely defined by
a triplet: IP address, port number, and protocol. Alternatively,
a virtual service may be defined by a firewall-mark.
As you can see you must use firewall-mark
to load-balance a range or an entire ip. The rule to set the fwmark is done by ip(6)tables and can specify port-ranges and protocol (or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case a single fwmark can be used to cover all services. If we allowed arbitrary ranges, we'd quickly run out of marks. Unless I misunderstand how they are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I may be wildly, totally wrong. Looking at docs and playing around with ipvsadm
it's more restrictive than I thought.
As far as I can see, there is no way to say "forward all ports" without making the vserver "persistent" (which seems to be roughly about client affinity). I don't think we want that, but I am not an IPVS expert. It has a timeout arg which has a floor of 1 second. Does that make it better?
If IPVS can't implement whole-IP, what do we do with this? Is it dead? Can we spin it as strictly optional?
@andrewsykim ICYMI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about defining allPorts service as fwmark-service? That doesn't appear to make it a persistent service. iptables rules will set the packet mark.
If we used the --tcp-service/--udp-service option with the persistent behavior, this might be desirable based on use-case? If the clients are going to use a random port to reach the service, it is probably ok to send packets to the same service port to the same backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true.. managing the mask can be messy, since we would also need to deal with conflicting mark values used by other components (Calico, other networking addons).
I missed that the persistent feature is 3-tuple, not 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the port forwarding behavior might be do-able with ipvs using ipsets, but I would need to play around with it a bit to be sure.
I think a larger problem with IPVS could be regarding it's "bind-all" behavior (kubernetes/kubernetes#72236), where a all ports Service will conflict with a real host port. Long standing issue that we need to address, but have not come up with a good path forward yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Services of type allPorts
don't have node ports. So it is primarily
ClusterIP:port
which will be dnated topod:port
ExternalIP:port
which will be dnated topod:port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadbalancers with no nodeports? those are a thing with #1864
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should just be a few allPorts services in a cluster, say max 16 (a small confugurable number?), and if fwmark is used only for those it should be managable. It is also a per-node resource so kube-proxy can manage them locally, e.g re-read after re-start. See also #2611 (comment).
I would like to see a valid use-case for I have not developed SIP apps for many years and never RTP but I from contacts with SIP app developers it seem they always use headless services because they need control over the clients. If all-ip services can be limited to The stake-holders should not be asked "do you want all-ip services for ClusterIP?" since the answer would be "yes" (why not), but something like; "are you prepared to delay all-ip service services for ~1y at least, most likely pushed to kube-proxy v2 (which is under way), to get ClusterIP support?" |
For Setup; ipvsadm -A -f 1 -s rr
ipvsadm -A -f 1 -6 -s rr
ipvsadm -a -f 1 -r 192.168.1.1:0 -m
ipvsadm -a -f 1 -6 -r [1000::1:192.168.1.1]:0 -m
# (repeat for all real servers)
iptables -t mangle -A PREROUTING -i eth2 -d 10.0.0.0/24 -j MARK --set-mark 1
ip6tables -t mangle -A PREROUTING -i eth2 -d 1000::/120 -j MARK --set-mark 1 Port ranges, protocol etc. are defined with Status after 100 connects;
Fwmark becomes a resource that must be handled by |
The proposal here is to introduce an “allPorts” boolean field to the [servicePort API.](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#serviceport-v1-core) | ||
This field is introduced within ServicePort rather than ServiceSpec, because a ServicePort entry is needed to infer the protocol. | ||
|
||
This field will be supported for Service Type=ClusterIP and ServiceType=LoadBalancer. This is not applicable to ExternalName services. NodePort services are not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of weird given that LoadBalancer services are normally a superset of NodePort... and in particular, with "AWS-style" LoadBalancers, the LB rewrites packets from lb-ip:port
to node-ip:port
. I guess that means you have to do "GCP-style" LoadBalancers (where the LB delivers packets to the node still addressed to the LB IP) for allPorts services, even if you do "AWS-style" for normal services... (This may mean that some clouds can't support allPorts?)
At any rate, saying that allPorts LoadBalancer services are not also NodePort services may be an API break...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At any rate, saying that allPorts LoadBalancer services are not also NodePort services may be an API break...
I think we burnt that bridge here https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1864-disable-lb-node-ports
Thanks @uablrek . This would still require an API validation change to allow LoadBalancer + ClusterIP: None, but no need for any change to ports validation. After this, allPorts LoadBalancer services will not necessarily be valid ClusterIP services. @danwinship Would that be a breaking change? |
Even more narrow; type:LoadBalancer + clusterIP:None + no ports. |
In some cases in the API, it's OK to loosen validation and allow cases that were previously forbidden. But even that can run into problems if there are clients that need to be able to understand all objects of a given type. So, eg, in the case of Services, any change to LoadBalancer behavior that might cause existing external cloud provider implementations to do the wrong thing (or crash!) should probably be considered breaking. It might work better to add a new It's possible that we won't be able to add all-ports/whole-IP to |
Then to limit allIP services to headless services would be safe since cloud providers would sort them out for now anyway? |
I will add this topic to the next sig-network meeting for discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to double-check this, but there could be an issue with IPVS regarding its current "bind-all" behavior (kubernetes/kubernetes#72236), where an all ports Service will conflict with a real host port bindings. Long-standing issue that we need to address, but have not come up with a good path forward yet.
|
||
### Goals | ||
|
||
* Allow users to optionally expose the entire port range via a Service (of Type LoadBalancer or ClusterIP). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the port forwarding behavior might be do-able with ipvs using ipsets, but I would need to play around with it a bit to be sure.
I think a larger problem with IPVS could be regarding it's "bind-all" behavior (kubernetes/kubernetes#72236), where a all ports Service will conflict with a real host port. Long standing issue that we need to address, but have not come up with a good path forward yet.
FTR: This is exceedingly unlikely to make 1.22 - too many open issues |
Items marked with (R) are required *prior to targeting to a milestone / release*. | ||
|
||
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) | ||
- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we look into how implementable this is by common networking implementations? CNIs (calico, cilium, etc), service meshes (linkerd, istio), and load balancer implementations (GKE/AWS, metallb, etc) are all things that could plausible have a very hard time supporting this.
As an Istio maintainer, I can tell you this would be a challenge to implement for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising this @howardjohn. I would like to get more clarity on the use-cases for the AllPorts feature to figure out the actual proposal here and revisit this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have a somewhat fleshed out proposal - I have marked this as an "Open question" so we can iterate after merging this initial PR.
approvers: | ||
- "@thockin" | ||
prr-approvers: | ||
- "@johnbelamaric" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please reassign to me - John seem to be overloaded this cycle.
Also - you will need to fill in the necessary PRR questions for alpha (not sure if you want to do that in this PRR or in a once that will move it move implementable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Wojciech. I have reassigned to you.. This will not be targeting alpha in 1.23, most likely the next release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - then I will get back to it after feature freeze.
Addressed review comments. Refactored to include new allPorts field as the proposal. Added an "open questions" section. Updated note about ipvs clusterip.
I have used the simplified setup proposed by @aojea in #2611 (comment) as example in a proposed blog post about kpng; kubernetes/website#29783 |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
I suggest to close this KEP and implement all-ip services off-tree as a custom load-balancer with kpng as indicated in https://kubernetes.io/blog/2021/10/18/use-kpng-to-write-specialized-kube-proxiers/ The ClusterIP thing complicated things as I pointed out in #2611 (comment);
And here we are almost a year later and the progress on this feature is basically halted. Considering the interrest and 👍 's in kubernetes/kubernetes#23864 this is a highly wanted feature. However to please everybody, like Istio and nodePort-only clusters (AWS) will stall implementation forever. IMHO it's better to write a custom proxier that would be useful for the wast majority of users. Perhaps as a project in K8s network plumbing wg. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts
Setting this field will be supported for: | ||
|
||
* ServiceType=ClusterIP | ||
* ServiceType=LoadBalancer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this work with cloud load balancers that only forward to an assigned port and / or only listen on an assigned port?
b) kube-proxy iptables rules DNAT it to one of the backend pod IPs - p.q.r.s (pod1) | ||
|
||
c) request is received on pod1 with source IP - 1.2.3.4 and destination p.q.r.s:8888 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming TCP, but we should be explicit.
1) How should IPVS implementation be handled? | ||
Options are to a) use iptables for DNAT b) use ipvs rules that results in same 5-tuple requests being sent to the same backend pod. c) create allPorts services as "fwmark-service" and assign a unique mark for each of them. | ||
|
||
2) Identfy how Service Mesh(Istio)/Calico/MetalLB can support AllPorts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I have a UDP all-ports Service and a TCP all-ports Service on the same ClusterIP?
* To verify that default behavior of `allPorts` does not break any existing e2e tests. | ||
* Test setting `allPorts` on a new Service and connecting to the service VIP on a few different ports. | ||
* Test setting `allPorts` on an existing Service and connecting to the service VIP on a few different ports. | ||
* Test unsetting `allPorts` on a service and specifying a single port allows traffic only for that port. | ||
* Test setting `allPorts` explicitly to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test with multiple protocol families, not just TCP.
## Motivation | ||
|
||
There are several applications like SIP/RTP/Gaming servers that need a lot(1000+) of ports to run multiple calls or media streams. | ||
Currently the only option is to specify every port in the Service Spec. A request for port ranges in Services has been open in https://github.com/kubernetes/kubernetes/issues/23864. Implementing port ranges are challenging since iptables/ipvs do not support remapping port ranges. Also, in order to specify several non-contiguous port ranges, the user will have to expose the entire valid port range. Hence, this proposal to set a single field in order to expose the entire port range and implement the service clients and endpoints accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think nftables can do this remapping. If that's right, another way to view the restriction is that kube-proxy doesn't come with a way to access that remapping (due to accessing netfilter via the older iptables API).
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@briantopping: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
||
There are several applications like SIP/RTP/Gaming servers that need a lot(1000+) of ports to run multiple calls or media streams. | ||
Currently the only option is to specify every port in the Service Spec. A request for port ranges in Services has been open in https://github.com/kubernetes/kubernetes/issues/23864. Implementing port ranges are challenging since iptables/ipvs do not support remapping port ranges. Also, in order to specify several non-contiguous port ranges, the user will have to expose the entire valid port range. Hence, this proposal to set a single field in order to expose the entire port range and implement the service clients and endpoints accordingly. | ||
[A survey](https://docs.google.com/forms/d/1FOOG2ZoQsnJLYAjnhEtSPYmUULWFNe88iXR7gtFcP7g/edit) was sent out to collect the use-cases for AllPorts support - [results.](http://tiny.cc/allportsslides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link is not reachable anymore.
Is there any progress on the topic? Will Services ever allow to open a range of ports at once? |
There has not been progress. IIRC the main sticking points for "all ports" are:
It may be better to define this in the L4 Gateway model. |
No description provided.