Multus service abstraction enhancement #900
Conversation
edits to service abstraction (force-pushed from 88fb857 to 500c047)
/assign @trozet
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
- Provide all types of Kubernetes service functionality for secondary network interfaces.
- Provide the exact same semantics for Kubernetes service mechanisms.
- LoadBalancer features for pods' secondary network interface (this requires more discussion in the upstream community).
- NodePort feature (still in discussion among the upstream community).
Does it make sense to support externalIP too?
"LoadBalancer features for pods' secondary network interface" means the expose service to outside cluster with "external IP'. This is NOT 'not supported forever', just 'not supported in this release and will be discussed in community and try to find how to design/implement'.
So 'not supported for now', but of course in TODO (future feature) list. Does it make sense?
#### Cluster IP for Multus
When the service is created with `type: ClusterIP`, Kubernetes assigns a cluster virtual IP for the service. User
Would it make sense to be able to choose a different CIDR for "secondary" services?
In ovnk, for example, there are some places where the routing is based on the service CIDR IIRC, so I think avoiding overlap could save some headaches.
That's a much more complicated problem. If a user is creating a service that is not in the default CIDR, that's almost "not a service" (according to kube's definitions). Is ovnk substantially disadvantaged by reusing the CIDR, or is this an optimization?
There are some routes/rules that are expressed in terms of service CIDRs just because it's easier to express them that way.
It's probably traffic that does not intersect the path we are adding (through secondary interfaces, from pods to pods), so it may not be affected.
I thought the service CIDR(s) were not enforced strictly, so I called this out.
@fedepaol Multus service uses the same IP address range as the Kubernetes cluster IP. In addition, multus service also adds routes for the CIDR (i.e. /32) in the pod network namespaces (see *1), hence packets to a multus cluster IP go to the multus network, not the ovn network. Furthermore, ovnk currently should not install flow rules for multus services because of the label `service.kubernetes.io/service-proxy-name`, as in https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/ovn.go#L1253-L1264
*1) Here is the `ip route` output in the pod network namespace. The outgoing interface for the multus service cluster IP 10.108.162.172 is pinned to net1, hence packets to the service never go through eth0.
```
[root@fedora-net1 /]# ip route
default via 10.244.1.1 dev eth0
10.1.1.0/24 dev net1 proto kernel scope link src 10.1.1.13
10.108.162.172 dev net1 proto kernel src 10.1.1.13
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.8
```
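To illustrate what such a per-service route looks like programmatically, here is a minimal Go sketch of adding the /32 route for the multus service VIP via the secondary interface. This is not the multus-proxy implementation, just an illustration; it assumes the vishvananda/netlink library, that the code already runs inside the pod's network namespace, and it reuses the interface name and addresses from the example output above.

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Assumes we are already inside the pod's network namespace.
	link, err := netlink.LinkByName("net1") // secondary (multus) interface
	if err != nil {
		log.Fatalf("lookup net1: %v", err)
	}

	// Route the multus service VIP (a /32) out of net1 rather than eth0,
	// mirroring the "10.108.162.172 dev net1 ..." entry shown above.
	_, vip, err := net.ParseCIDR("10.108.162.172/32")
	if err != nil {
		log.Fatal(err)
	}

	route := &netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       vip,
		Src:       net.ParseIP("10.1.1.13"), // pod's address on net1
		Scope:     netlink.SCOPE_LINK,
	}
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatalf("add route: %v", err)
	}
}
```

With such an entry in place, the kernel selects net1 for traffic to the VIP before any cluster-network routing applies, which is what keeps the service traffic off eth0.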
`k8s.v1.cni.cncf.io/service-network` specifies which network (i.e. net-attach-def) is the target for exposing the service. This annotation specifies only one net-attach-def in the same namespace as the Service/Pods, hence the Service, Network Attachment Definition, and Pods should be in the same namespace. This annotation only allows users to specify one network, hence each Service VIP corresponds to exactly one service/network mapping.
It's not entirely clear to me how the VIP is chosen and who assigns it. As mentioned before, I think it would make sense not to intersect the SDN service IP range.
Multus service uses the Kubernetes service controller to choose the VIP from the Kubernetes service cluster IP CIDR range, so the VIP is chosen by the Kubernetes service controller just as for a usual Kubernetes service. But as I mentioned, the user specifies the label `service.kubernetes.io/service-proxy-name` to explicitly mark that the service is for multus, hence the service is not handled by the default Kubernetes service proxy.
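To make that division of labor concrete, here is a hedged Go sketch (using k8s.io/api types) of what such a Service object might look like: the ClusterIP is still allocated by the Kubernetes service controller, while the label and annotation steer the data path. The service name, network name, proxy-name value, and ports are illustrative, not taken from the proposal.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newMultusService builds a ClusterIP Service targeted at a secondary
// network. The VIP itself is still assigned by the Kubernetes service
// controller from the regular service CIDR.
func newMultusService() *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-multus-svc", // hypothetical name
			Namespace: "default",
			Labels: map[string]string{
				// Well-known label: tells kube-proxy (and SDNs that honor it)
				// to skip this service; an alternate proxy picks it up instead.
				"service.kubernetes.io/service-proxy-name": "multus-proxy", // illustrative value
			},
			Annotations: map[string]string{
				// Points at one net-attach-def in the same namespace.
				"k8s.v1.cni.cncf.io/service-network": "macvlan-net", // hypothetical net-attach-def
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: map[string]string{"app": "example"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(8080),
				Protocol:   corev1.ProtocolTCP,
			}},
		},
	}
}
```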
- Ingress and other functionalities related to Kubernetes service (this will be addressed in another enhancement).
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods which use the Linux network stack (i.e. iptables).
- Headless service (needs more discussion in the community for its design)
to do* more
Can you describe why this needs more discussion? This seems like the easiest service type on the surface (since it simply exposes those endpoints).
@smarterclayton It seems to be the easiest from an implementation point of view (because we don't need multus-proxy), but actually it is not.
The headless service has a big assumption: all clients of the service use the Kubernetes cluster DNS (e.g. coredns). For a Kubernetes cluster service, of course, all pods use coredns; however, a multus network, which may be isolated from the Kubernetes network, may have other clients (e.g. PCs and some servers), and these clients may use a different DNS. Hence, for a headless service for multus, we may need a way to reach coredns from outside the cluster. That is the reason why we don't support it immediately.
@martinkennelly fixed.
- The SDN solution provides the forwarding plane for Kubernetes services (i.e. Cilium, ovn-kubernetes).
- The SDN solution does not recognize the well-known label, `service.kubernetes.io/service-proxy-name`.

Hence we should mention in the documents which SDN solutions support this enhancement.
and hopefully file issues against networking plugins that do not respect the label. :)
Agreed, but that could be future work (i.e. not for the 4.10 release), because we also add a routing table entry for the multus service VIP in the pod network namespace, hence the packets must go to the multus network, not the cluster network.
Without that support, the cluster network (e.g. openshift-sdn) may contain unused flows for multus services.
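For SDN plugins that do honor the label, the check is small: skip any Service that carries a non-empty service-proxy-name. The following is a hedged Go sketch of that idea, not the actual ovn-kubernetes code linked above; the function name is made up for the example.

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// skipForAlternateProxy reports whether an SDN's built-in service
// handling should ignore this Service because another proxy (for
// example multus-proxy) has claimed it via the well-known label.
func skipForAlternateProxy(svc *corev1.Service) bool {
	const proxyNameLabel = "service.kubernetes.io/service-proxy-name"
	name, ok := svc.Labels[proxyNameLabel]
	return ok && name != ""
}
```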
### Non-Goals
Due to the variety of service types, this must be handled in a phased approach. Therefore this covers the initial
I'd specify that the intention is to provide the service abstraction only to traffic that is originated by pods (i.e. hitting the service from outside the cluster won't work).
I suppose that it is mentioned in the 'Service Reachability' section. In addition, "Cluster IP" is cluster-only in the Kubernetes context, hence I didn't explicitly mention that it is only for the cluster. It is the same here (though, of course, the multus service VIP is accessible from the pods which have the target net-attach-def, as I noted in the 'Service Reachability' section).
- NodePort feature (still in discussion among the upstream community).
- Ingress and other functionalities related to Kubernetes service (this will be addressed in another enhancement).
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods which use the Linux network stack (i.e. iptables).
For each of the non-goal topics here, I'd like a brief description of why it wouldn't be supported (one or two sentences, perhaps down below).
Or even a sub bullet, but each of the non-goals needs a clearly articulated reason for not being a goal ("this is hard because of X which is a fundamental open design question" or "there are multiple possible ways to do Y and it is not clear which one is best") along with the overall "we believe the core of this use case is valuable on its own, and as it is opt in and using an existing API, we can articulate why it works"
Thanks, here's a first pass at expanding upon the non-goals, @s1061123
- Provide all types of Kubernetes service functionality for secondary network interfaces.
  - In order to focus on the quality of an initial subset of service functionality.
- Provide the exact same semantics for Kubernetes service mechanisms.
  - Which would increase the scope substantially for an initial feature.
- LoadBalancer features for pods' secondary network interface
  - The approach for which requires more discussion in the upstream community, and may use other external components.
  - It may also handle external traffic; this enhancement only provides service functionality for traffic originating from Kubernetes pods.
- NodePort feature
  - Still in discussion among the upstream community, and looking for input on use cases.
- Ingress and other functionalities related to Kubernetes service
  - This will be addressed in another enhancement.
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
  - This functionality would be separate from this enhancement and the scope is somewhat orthogonal.
  - Additionally, there is ongoing upstream discussion regarding secondary network hierarchy which may also address which networks are reachable.
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods which use the Linux network stack (i.e. iptables).
  - This keeps the scope for this feature reasonable, and we may allow for extensibility via pluggable modules which handle other forwarding mechanisms.
- Headless service
  - Due to the assumption of reachability to coredns, which may not be available to clients on isolated networks.
@smarterclayton A 'non-goal' is not a non-goal forever, just for the currently planned release. The items currently listed as non-goals are 'not discussed yet / not designed yet / not implemented yet', not 'not supported forever'. As you know, Kubernetes service functionality is diverse, hence it takes a long time to discuss each item in the community. I believe that 'cluster VIP / load-balancing for the VIP' is the minimum viable functionality for the service, and even this functionality is required by multus users, I believe.
If you think we should have a clear design/implementation for all Kubernetes service functions (e.g. dual stack, external load-balancing, and so on) and bring it in one shot (as in the old-fashioned waterfall model), then I am okay to close this PR, bring it back to the community, and try to discuss it there (but let me note that it might take a pretty long time, more than a year).
@dougbtv thank you for the draft! Changed.
is updated/removed, then the multus service controller changes/removes the corresponding endpointslices to track the changes.

3. Multus proxy generates iptables rules for the service in the Pod's network namespace.
This really seems like a gap in kube-proxy. Why can't another controller create EndpointSlices (the multus-controller) but let the default kube-proxy implement that behavior? Having to duplicate the proxy seems like a poor structure in Kube.
If there is a long term limitation that justifies the proxy duplication, it should be called out here (this is roughly a design tradeoff between "kube should let you inject endpoint slices via a custom controller that the service-proxy handles" and "when service-proxy can't handle the job of defining connectivity it must be duplicated"). We should always call out those tradeoffs explicitly.
@smarterclayton I'm not clear on your comment. In multus service, the multus-service controller creates EndpointSlices and multus-proxy (i.e. not the default kube-proxy) implements the behavior. The EndpointSlices may have a label (copied from the service), `service.kubernetes.io/service-proxy-name` (defined in https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/apis/well_known_labels.go#L22), and the label specifies which proxy takes care of the endpoints.
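As a rough illustration of that controller side, here is a hedged client-go sketch of creating an EndpointSlice that carries the service-proxy-name label so the default kube-proxy ignores it and multus-proxy consumes it. The function name, proxy-name value, and the hard-coded address/port are illustrative; the real multus-service controller derives endpoints from the pods' secondary-network status.

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createMultusEndpointSlice publishes one endpoint (a pod's secondary
// network IP) for the given service, labeled so that kube-proxy ignores
// it and an alternate proxy picks it up.
func createMultusEndpointSlice(ctx context.Context, cs kubernetes.Interface, namespace, serviceName string) error {
	port := int32(8080)
	proto := corev1.ProtocolTCP
	slice := &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: serviceName + "-",
			Namespace:    namespace,
			Labels: map[string]string{
				// Ties the slice to its Service.
				discoveryv1.LabelServiceName: serviceName,
				// Copied from the Service; selects which proxy handles it.
				"service.kubernetes.io/service-proxy-name": "multus-proxy", // illustrative value
			},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints: []discoveryv1.Endpoint{{
			Addresses: []string{"10.1.1.13"}, // pod address on the secondary network
		}},
		Ports: []discoveryv1.EndpointPort{{
			Port:     &port,
			Protocol: &proto,
		}},
	}
	_, err := cs.DiscoveryV1().EndpointSlices(namespace).Create(ctx, slice, metav1.CreateOptions{})
	return err
}
```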
Force-pushed from dda1bb0 to 6cdfd7f
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting. If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting. If this proposal is safe to close now please do so with /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@s1061123: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@s1061123: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Please update the doc to match the new template, including adding the missing sections and a link to the Jira ticket or other tracking artifact in the header.
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
No description provided.