
Multus service abstraction enhancement #900

Closed
wants to merge 19 commits

Conversation

s1061123
Contributor

No description provided.

@s1061123
Contributor Author

s1061123 commented Oct 6, 2021

/assign @trozet

@openshift-ci
Contributor

openshift-ci bot commented Oct 12, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from trozet after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Provide all types of Kubernetes service functionality for secondary network interfaces.
- Provide the exact same semantics as the Kubernetes service mechanisms.
- LoadBalancer features for pods' secondary network interfaces (this requires more discussion in the upstream community).
- NodePort feature (still under discussion in the upstream community).
Member

Does it make sense to support externalIP too?

Contributor Author
@s1061123 s1061123 Oct 20, 2021

"LoadBalancer features for pods' secondary network interface" means the expose service to outside cluster with "external IP'. This is NOT 'not supported forever', just 'not supported in this release and will be discussed in community and try to find how to design/implement'.

So 'not supported for now', but of course in TODO (future feature) list. Does it make sense?


#### Cluster IP for Multus

When the service is created with `type: ClusterIP`, Kubernetes assigns a cluster virtual IP for the service. User
Member

Would it make sense to be able to choose a different CIDR for "secondary" services?

In ovnk for example, there are some places where the routing is based on the service CIDR IIRC, so I think avoiding overlap could save some headaches.

Contributor

That's a much more complicated problem. If a user is creating a service that is not in the default CIDR, that's almost "not a service" (according to kube's definitions). Is ovnk substantially disadvantaged by reusing the CIDR, or is this an optimization?

Member

There are some routes / rules that are expressed in terms of the service CIDRs just because it's easier to express them that way.
It's probably traffic that does not intersect the path we are adding (through secondary interfaces, from pods to pods), so it may not be affected.
I thought the service CIDR(s) were not enforced strictly, so I called this out.

Contributor Author

@fedepaol The multus service uses the same IP address range as the Kubernetes cluster IP. In addition, the multus service also adds routes for the CIDR (i.e. a /32) in the pod network namespaces (see *1), hence packets to a multus cluster IP should go to the multus network, not the ovn network. Also, ovnk currently should not have flow rules for multus services because of the label service.kubernetes.io/service-proxy-name; see https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/ovn.go#L1253-L1264

*1) Here is the ip route output in a pod network namespace. The outgoing interface for the multus service cluster IP 10.108.162.172 is net1, hence packets to the service never go out via eth0.

[root@fedora-net1 /]# ip route
default via 10.244.1.1 dev eth0 
10.1.1.0/24 dev net1 proto kernel scope link src 10.1.1.13 
10.108.162.172 dev net1 proto kernel src 10.1.1.13 
10.244.0.0/16 via 10.244.1.1 dev eth0 
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.8 
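
For illustration, a minimal sketch (not part of this PR; it assumes the vishvananda/netlink library and the example addresses and interface above) of how such a /32 route toward the service VIP could be installed on the secondary interface from Go:

```go
// Sketch only: install a /32 route for the multus service VIP via the
// secondary interface (net1) inside the pod's network namespace, so traffic
// to the VIP never leaves through eth0. Addresses are the example values above.
package example

import (
	"net"

	"github.com/vishvananda/netlink"
)

func addServiceVIPRoute() error {
	link, err := netlink.LinkByName("net1") // secondary (multus) interface
	if err != nil {
		return err
	}
	vip := &net.IPNet{IP: net.ParseIP("10.108.162.172"), Mask: net.CIDRMask(32, 32)}
	return netlink.RouteReplace(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       vip,
		Src:       net.ParseIP("10.1.1.13"), // pod's address on net1
		Scope:     netlink.SCOPE_LINK,
	})
}
```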

`k8s.v1.cni.cncf.io/service-network` specifies which network (i.e. net-attach-def) is the target for exposing the
service. The annotation specifies only one net-attach-def, in the same namespace as the Service/Pods, hence the Service,
Network Attachment Definition, and Pods should all be in the same namespace. The annotation only allows users to specify
one network, hence one Service VIP corresponds to exactly one service/network mapping.
Member

It's not entirely clear to me how the VIP is chosen and who assigns it. As mentioned before, I think it would make sense not to overlap with the SDN service IP range.

Contributor Author

The multus service uses the Kubernetes service controller to choose the VIP from the Kubernetes service cluster IP CIDR range, so the VIP is chosen by the Kubernetes service controller just as for a usual Kubernetes service. But as I mentioned, the user specifies the label service-proxy-name to explicitly mark the service as being for multus, hence the service is not handled as a regular Kubernetes service.
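
For illustration, a hedged sketch (the proxy name "multus-proxy", the net-attach-def name, and the ports are assumed example values) of the Service object described above, carrying the well-known service-proxy-name label and the service-network annotation:

```go
// Sketch only: a ClusterIP Service claimed by a non-default proxy via the
// well-known service-proxy-name label and targeted at a net-attach-def via
// the k8s.v1.cni.cncf.io/service-network annotation. The VIP itself is still
// assigned by the Kubernetes service controller as usual.
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func multusService() *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "sample-service",
			Namespace: "default",
			Labels: map[string]string{
				// handled by multus-proxy; skipped by proxies that honor the label
				"service.kubernetes.io/service-proxy-name": "multus-proxy",
			},
			Annotations: map[string]string{
				// net-attach-def (in the same namespace) that exposes the service
				"k8s.v1.cni.cncf.io/service-network": "macvlan-net1",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: map[string]string{"app": "sample"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(8080),
			}},
		},
	}
}
```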

- Ingress and other functionality related to Kubernetes services (this will be addressed in another enhancement).
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods that use the Linux network stack (i.e. iptables).
- Headless service (needs more discussion in the community on its design)
Contributor

to do* more

Contributor

Can you describe why this needs more discussion? This seems like the easiest service type on the surface (since it simply exposes those endpoints).

Contributor Author

@smarterclayton It seems to be the easiest from an implementation point of view (because we don't need multus-proxy), but actually it is not.

A headless service has a big assumption: all clients of the service use the Kubernetes cluster DNS (e.g. coredns). For a Kubernetes cluster service, of course, all pods use coredns; however, a multus network, which may be isolated from the Kubernetes network, may have other clients (e.g. PCs and other servers), and those clients may use a different DNS. Hence, for a headless service for multus, we may need a way to reach coredns from outside the cluster. That is the reason we don't support it immediately.

- The SDN solution provides the forwarding plane for Kubernetes services (e.g. Cilium, ovn-kubernetes).
- The SDN solution does not recognize the well-known label, `service.kubernetes.io/service-proxy-name`.

Hence we should mention in the documentation which SDN solutions support this enhancement.
Contributor

and hopefully file issues against networking plugins that do not respect the label. :)

Contributor Author
@s1061123 s1061123 Oct 20, 2021

Agree, but it could be future work (i.e. not for the 4.10 release) because we also add a routing table entry for the multus service VIP in the pod network namespace, hence the packets must go to the multus network, not the cluster network.

Without support for the label, the cluster network (e.g. openshift-sdn) may contain unused flows for multus services.
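
For illustration, here is a minimal sketch (assumed code, not taken from any plugin) of the kind of check an SDN service proxy could apply so that it skips services claimed by another proxy via the well-known label:

```go
// Sketch only: a proxy that honors the well-known service-proxy-name label
// skips any Service carrying it, leaving such services to the named proxy
// (e.g. multus-proxy), and so avoids installing unused flows for them.
package example

import corev1 "k8s.io/api/core/v1"

func handledByDefaultProxy(svc *corev1.Service) bool {
	_, claimedByOtherProxy := svc.Labels["service.kubernetes.io/service-proxy-name"]
	return !claimedByOtherProxy
}
```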


### Non-Goals

Due to the variety of service types, this must be handled in a phased approach. Therefore this covers the initial
Member

I'd specify that the intention is to provide the service abstraction only to traffic that is originated by pods (i.e. hitting the service from outside the cluster won't work).

Contributor Author

I suppose that is mentioned in the 'Service Reachability' section. In addition, a 'Cluster IP' is cluster-only in the Kubernetes context, hence I didn't explicitly mention that it is only for the cluster; it is the same here (though of course the multus service VIP is accessible from the pods which have the target net-attach-def, as I noted in the 'Service Reachability' section).

- NodePort feature (still under discussion in the upstream community).
- Ingress and other functionality related to Kubernetes services (this will be addressed in another enhancement).
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods that use the Linux network stack (i.e. iptables).
Contributor

For each of the non-goal topics here, I'd like a brief description of why it wouldn't be supported (one or two sentences, perhaps down below).

Contributor

Or even a sub bullet, but each of the non-goals needs a clearly articulated reason for not being a goal ("this is hard because of X which is a fundamental open design question" or "there are multiple possible ways to do Y and it is not clear which one is best") along with the overall "we believe the core of this use case is valuable on its own, and as it is opt in and using an existing API, we can articulate why it works"

Member
@dougbtv dougbtv Oct 20, 2021

Thanks, here's a first pass at expanding upon the non-goals, @s1061123

- Provide all types of Kubernetes service functionality for secondary network interfaces.
  - In order to focus on the quality of an initial subset of service functionality.
- Provide the exact same semantics as the Kubernetes service mechanisms.
  - Which would increase the scope substantially for an initial feature.
- LoadBalancer features for pods' secondary network interfaces
  - The approach requires more discussion in the upstream community and may use other external components.
  - It may also handle external traffic; this enhancement is about providing service functionality for traffic originating from Kubernetes pods.
- NodePort feature
  - Still under discussion in the upstream community, and looking for input on use cases.
- Ingress and other functionality related to Kubernetes services
  - This will be addressed in another enhancement.
- Network verification for accessing services (i.e. users need to ensure network targets are reachable).
  - This functionality would be separate from this enhancement and its scope is somewhat orthogonal.
  - Additionally, there is ongoing upstream discussion regarding a secondary-network hierarchy, which may also address which networks are reachable.
- Services for universal forwarding mechanisms (e.g. DPDK pods are not supported); we only focus on pods that use the Linux network stack (i.e. iptables).
  - This keeps the scope of the feature reasonable, and we may later add extensibility via pluggable modules that handle other forwarding mechanisms.
- Headless service
  - Due to the assumption of reachability to CoreDNS, which may not be available to clients on isolated networks.

Contributor Author
@s1061123 s1061123 Oct 20, 2021

@smarterclayton A 'non-goal' is not a non-goal forever, just for the currently planned release. The items currently listed as non-goals are 'not discussed yet / not designed yet / not implemented yet', not 'never to be supported'. As you know, Kubernetes service functionality is diverse, so it takes a long time to discuss each item in the community. I believe that 'cluster VIP / load-balancing for the VIP' is the minimum viable functionality for the service, and I believe even this functionality is needed by multus users.

If you think we should have a clear design/implementation for all Kubernetes service functions (e.g. dual stack, external load-balancing, and so on) and deliver it in one shot (as in the old-fashioned waterfall model), then I am okay with closing this PR and bringing it back to the community to discuss (but be aware it might take a pretty long time, more than a year).

Contributor Author

@dougbtv thank you for the draft! Changed.

is updated/removed, then the multus service controller changes/removes the corresponding EndpointSlices to track the
changes.

3. The multus proxy generates iptables rules for the service in the Pod's network namespace.
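
For illustration, a minimal sketch (assumed example addresses and rule layout, not the actual multus-proxy implementation) of the kind of iptables NAT rule that could be installed inside the pod's network namespace for such a service, using the coreos/go-iptables library:

```go
// Sketch only: DNAT traffic the pod sends to the service VIP toward one
// endpoint on the secondary network, from inside the pod's network namespace.
// VIP, port, and endpoint are example values; spreading load across several
// endpoints would need additional per-endpoint statistic/probability rules.
package example

import "github.com/coreos/go-iptables/iptables"

func installVIPDNAT() error {
	ipt, err := iptables.New()
	if err != nil {
		return err
	}
	// Traffic originated by the pod itself traverses the nat OUTPUT chain.
	return ipt.AppendUnique("nat", "OUTPUT",
		"-d", "10.108.162.172/32",
		"-p", "tcp", "--dport", "80",
		"-j", "DNAT", "--to-destination", "10.1.1.13:8080",
	)
}
```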
Contributor

This really seems like a gap in kube-proxy. Why can't another controller create EndpointSlices (the multus-controller) but let the default kube-proxy implement that behavior? Having to duplicate the proxy seems like a poor structure in Kube.

If there is a long term limitation that justifies the proxy duplication, it should be called out here (this is roughly a design tradeoff between "kube should let you inject endpoint slices via a custom controller that the service-proxy handles" and "when service-proxy can't handle the job of defining connectivity it must be duplicated"). We should always call out those tradeoffs explicitly.

Contributor Author

@smarterclayton I'm not clear on your comment. In the multus service, the multus-service controller creates the EndpointSlices and multus-proxy (i.e. not the default kube-proxy) implements the behavior. The EndpointSlices carry a label (copied from the service), service.kubernetes.io/service-proxy-name (defined in https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/apis/well_known_labels.go#L22), and that label specifies which proxy takes care of the endpoints.
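
For illustration, a sketch (the proxy name passed in is an assumed example, e.g. "multus-proxy") of the label selector a proxy could use to watch only the Services/EndpointSlices it is responsible for:

```go
// Sketch only: build a label selector that matches objects whose well-known
// service-proxy-name label equals this proxy's name.
package example

import (
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
)

func proxySelector(proxyName string) (labels.Selector, error) {
	req, err := labels.NewRequirement(
		"service.kubernetes.io/service-proxy-name",
		selection.Equals,
		[]string{proxyName}, // e.g. "multus-proxy"
	)
	if err != nil {
		return nil, err
	}
	return labels.NewSelector().Add(*req), nil
}
```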

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 17, 2021
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 24, 2021
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Dec 1, 2021
@s1061123
Contributor Author

/reopen

@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2022

@s1061123: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot reopened this Feb 13, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2022

@s1061123: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 4a7e01c | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dhellmann
Contributor

@s1061123: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 4a7e01c | link | true | /test markdownlint |
Full PR test history. Your PR dashboard.

Please update the doc to match the new template, including adding the missing sections and a link to the Jira ticket or other tracking artifact in the header.

@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Feb 22, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 22, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
