
Kubernetes CNI plugin for EVE #3795

Merged
merged 4 commits into lf-edge:master from cni on Mar 15, 2024
Conversation

milan-zededa
Contributor

@milan-zededa commented Mar 5, 2024

eve-bridge is a CNI plugin for Kubernetes which allows connecting
Pods to EVE Network Instances. This CNI plugin establishes
Pod connectivity in cooperation with the zedrouter microservice.
A CNI request received from Kubernetes to connect a Pod to a given
network instance is delegated by eve-bridge to zedrouter via RPC
calls over an AF-UNIX socket. More details can be found in the
documentation submitted under pkg/kube/eve-bridge/README.md.
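
As an illustration of this RPC delegation, here is a minimal, hypothetical sketch of the client side in Go, using net/rpc over an AF-UNIX socket. The socket path, service and method names, and argument types are stand-ins for illustration, not the actual cnirpc API; see the README for the real contract.

```go
package main

import (
	"fmt"
	"net/rpc"
)

// ConnectPodArgs describes a request to attach a Pod interface to an EVE
// Network Instance (illustrative fields, not the real cnirpc types).
type ConnectPodArgs struct {
	PodName       string // name of the Pod being connected
	NetnsPath     string // path to the Pod's network namespace
	InterfaceName string // interface to create inside the Pod
}

// ConnectPodResult carries back the IP configuration chosen by zedrouter.
type ConnectPodResult struct {
	IPAddress string
	Gateway   string
}

// connectPod performs a synchronous RPC to zedrouter and returns only once
// the Pod interface has been plumbed (or the attempt failed).
func connectPod(args ConnectPodArgs) (*ConnectPodResult, error) {
	// Dial the AF-UNIX socket exposed by zedrouter (path is an assumption).
	client, err := rpc.Dial("unix", "/run/zedrouter/rpc.sock")
	if err != nil {
		return nil, fmt.Errorf("dial zedrouter: %w", err)
	}
	defer client.Close()

	var result ConnectPodResult
	if err := client.Call("RPCServer.ConnectPod", args, &result); err != nil {
		return nil, fmt.Errorf("ConnectPod RPC: %w", err)
	}
	return &result, nil
}

func main() {
	result, err := connectPod(ConnectPodArgs{
		PodName:       "demo-pod",
		NetnsPath:     "/var/run/netns/cni-1234", // hypothetical netns link
		InterfaceName: "net1",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("assigned IP:", result.IPAddress)
}
```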

Note that the majority of changes in this PR are vendored dependencies for eve-bridge & pillar.
I have put them into separate commits (the last two),
so I recommend reviewing this PR commit by commit.

codecov bot commented Mar 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 17.51%. Comparing base (af6661c) to head (6528118).

Additional details and impacted files
```
@@           Coverage Diff           @@
##           master    #3795   +/-   ##
=======================================
  Coverage   17.51%   17.51%           
=======================================
  Files           3        3           
  Lines         805      805           
=======================================
  Hits          141      141           
  Misses        629      629           
  Partials       35       35
```


@milan-zededa force-pushed the cni branch 4 times, most recently from 7388226 to e67cb50 on March 6, 2024 09:59
@milan-zededa
Contributor Author

Some Yetus issues remain, but they are either expected or do not make sense.

@milan-zededa milan-zededa marked this pull request as ready for review March 6, 2024 10:12
@eriknordmark
Contributor

What's the impact on the EVE image size?

@milan-zededa
Contributor Author

milan-zededa commented Mar 7, 2024

What's the impact on the EVE image size?

It seems that rootfs.img increased in size by 17MB. However, the installed eve-bridge binary is "only" 4MB; it seems the other CNI plugins (all written in Go) add the extra megabytes. I will remove those that we do not use.

```
ls -al /opt/cni/bin/
total 47596
drwxr-xr-x    2 root     root           380 Mar  7 12:00 .
drwxr-xr-x    3 root     root            60 Mar  7 12:00 ..
-rwxr-xr-x    1 root     root       2568696 Mar  7 12:00 bandwidth
-rwxr-xr-x    1 root     root       2873496 Mar  7 12:00 bridge
-rwxr-xr-x    1 root     root       7079416 Mar  7 12:00 dhcp
-rwxr-xr-x    1 root     root       4034560 Mar  7 12:00 eve-bridge
-rwxr-xr-x    1 root     root       2963256 Mar  7 12:00 firewall
-rwxr-xr-x    1 root     root       2568760 Mar  7 12:00 host-device
-rwxr-xr-x    1 root     root       2174584 Mar  7 12:00 host-local
-rwxr-xr-x    1 root     root       2654616 Mar  7 12:00 ipvlan
-rwxr-xr-x    1 root     root       2221432 Mar  7 12:00 loopback
-rwxr-xr-x    1 root     root       2679192 Mar  7 12:00 macvlan
-rwxr-xr-x    1 root     root       2493912 Mar  7 12:00 portmap
-rwxr-xr-x    1 root     root       2778904 Mar  7 12:00 ptp
-rwxr-xr-x    1 root     root       2356792 Mar  7 12:00 sbr
-rwxr-xr-x    1 root     root       1936696 Mar  7 12:00 static
-rwxr-xr-x    1 root     root       2278776 Mar  7 12:00 tuning
-rwxr-xr-x    1 root     root       2654616 Mar  7 12:00 vlan
-rwxr-xr-x    1 root     root       2381176 Mar  7 12:00 vrf
```

But note that this is the "kubevirt" EVE image; the kvm/xen EVE variants do not build the kube container.

@milan-zededa
Contributor Author

Update: I have removed the unused CNI plugins from the image and the rootfs size increase dropped to 7MB.
Again, this applies only to the kubevirt EVE image.

@github-actions github-actions bot requested a review from eriknordmark March 7, 2024 15:22
@eriknordmark
Contributor

But note that this is the "kubevirt" EVE image; the kvm/xen EVE variants do not build the kube container.

Understood. But picking only the ones you need makes sense, since we don't want to drive the kubevirt image size too quickly towards the 512MB partition size limit.

@eriknordmark eriknordmark requested a review from deitch March 12, 2024 17:49
@deitch
Contributor

deitch commented Mar 13, 2024

From an architectural and documentation perspective, I have a few questions:

  • Where is the source for that diagram? Is it a graphviz file or some other textual basis that can be included in the docs? This would make it easier for someone to modify it in the future.
  • Some of the arrows are missing numbers, so it isn't clear where they fit. It does become clear as you read the documentation and go back and forth, but it would help to have every step numbered.
  • The documentation helpfully highlights that "the first set of steps is the same as with any hypervisor", etc. If we can highlight this in the diagram as well, perhaps with a box or a different colour, it would make it easy to see instantly: "this is not unique" versus "this is special".
  • Do we have a similar diagram for containers and VMs in kvm eve? Seeing these side by side would really help people understand. It would be even better in a single diagram, but that may be too busy.
  • I think (not positive) that at least part of this documentation should be moved out of eve-bridge. It describes the whole kubevirt deployment process and is much more valuable than a doc on eve-bridge alone; only the eve-bridge details belong in this directory.
  • Is it correct to sum up eve-bridge as "a CNI plugin that delegates network interface management to zedrouter"?
  • I have not delved too deeply into the interaction between CNI and kubevirt before. Normal CNI is: "kubelet executes the CNI binary, passing it a network namespace; the CNI plumbs networking directly into that namespace." Is this roughly the same? What actually gets passed to the CNI, given that it is not a "normal" network namespace?
  • Do we need the "default bridge" interface? In kvm eve, each VM gets exactly the network interfaces that are configured for it based on NetworkInstance. Is this somehow different such that it needs one?
  • Do we actually need multus at all? The whole purpose of multus is to call multiple CNIs (it is a multiplexer). If you have one CNI that can handle everything, including multiple network interfaces, you don't need multus. If eve-bridge implements the CNI (which it must, to be called by multus) and is passed all of the network interfaces, why do we need multus at all? Why not just call the eve-bridge CNI directly?
  • RPC: Is this the first time we are using RPC here? Not that I think pubsub is the best solution for everything, but if zedrouter already subscribes to pubsub events, would that be more consistent here? An alternative might be for eve-bridge not to talk to zedrouter at all: could zedrouter publish the information, and eve-bridge just pick it up and plumb it into the VMs directly? I guess zedrouter already has "plumb network interfaces into VMs" logic anyway?

@milan-zededa
Contributor Author

milan-zededa commented Mar 14, 2024

@deitch

From an architectural and documentation perspective, I have a few questions:

* Where is the source for that diagram? Is it a graphviz file or some other textual basis, that can be included in the docs? This will make it easier for someone to modify it in the future.

It is from draw.io. I have added an .xml version of the diagram next to the image; it can be imported into draw.io and edited.

* Some of the arrows are missing numbers, so it isn't clear where they fit. It does become clear as you read the documentation and go back and forth, but it would help to have every step numbered

Fixed, now all arrows have numbers.

* The documentation [highlights](https://github.com/lf-edge/eve/pull/3795/files#diff-9ebd219ad1ac605851f7ef51420e5db7d1a088f28db91f78b731934d819b54efR50-R56) (this is quite helpful), that "the first set of steps is the same as with any hypervisor", etc. If we can highlight this in the diagram, it would be helpful, perhaps in a box or a different colour? It makes it easy to see instantly, "this is not unique", and "this is special".

Done, kubevirt-specific steps are highlighted with a green color.

* Do we have a similar diagram for containers and VMs in kvm eve? Seeing these side-by-side would really help people understand. It would be even better in a single diagram, but that may be too busy.

Not yet, I will add something in a separate PR.

* I think (not positive) that at least part of this documentation should be moved out of eve-bridge. This describes the whole kubevirt deployment process, and is so much more valuable than a doc on eve-bridge. Just the eve-bridge details belong in the directory.

Currently, we do not have "kubevirt-EVE" documentation since everything is still a work in progress (and many changes are not even committed to the main EVE repo yet). Eventually, there should be documentation about this new EVE variant, and this eve-bridge doc will be linked from and partially moved there.

* Is it correct to sum up eve-bridge as, "CNI that delegates network interface management to zedrouter"?

Yes

* I have not delved too deeply into the interaction between CNI and kubevirt before. Normal CNI is, "kubelet executes CNI binary, passing it network namespace; CNI plumbs directly into the namespace any networking." Is this roughly the same? What actually gets passed to the CNI, given that it is not a "normal" network namespace?

Yes, correct. Actually, the pod is wrapped inside a "sandbox" container, which implements the network isolation using a "normal" network namespace. This network namespace is even "named", meaning that there is a link to it from /var/run/netns/ (so it can be used with ip netns exec <name> <cmd> <args>...). The CNI plugin receives the filename of the namespace link inside /var/run/netns/. Additionally, the plugin gets some info about the pod interface that it should create.
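
To make that hand-off concrete, below is a minimal sketch of a CNI plugin's entry points using the upstream skel helper from github.com/containernetworking/cni. It only illustrates what the runtime passes in (the netns path, interface name, and JSON config on stdin); a real plugin such as eve-bridge would also plumb the interface and print a CNI result on stdout.

```go
package main

import (
	"fmt"
	"os"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/version"
)

func cmdAdd(args *skel.CmdArgs) error {
	// args.Netns is the file that /var/run/netns/<name> links to,
	// args.IfName the interface to create inside it, and args.StdinData
	// the JSON network configuration from the CNI conflist.
	fmt.Fprintf(os.Stderr, "connect %s inside netns %s\n", args.IfName, args.Netns)
	// A real plugin would plumb the interface here and then print the CNI
	// result (assigned IPs, routes, ...) as JSON on stdout.
	return nil
}

func cmdDel(args *skel.CmdArgs) error {
	// Tear down whatever cmdAdd created for this container.
	return nil
}

func cmdCheck(args *skel.CmdArgs) error { return nil }

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "eve-bridge sketch")
}
```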

* Do we need the "default bridge" interface? In kvm eve, each VM gets exactly the network interfaces that are configured for it based on NetworkInstance. Is this different somehow that it needs it?

Kubernetes checks for the presence of this "eth0" interface and will not consider the Pod properly connected if it cannot find it in the pod's network namespace.
But note that in the case of a VMI (a VM running inside a Pod by kubevirt), we do not have to further propagate (bridge) this eth0 interface from the Pod into the VM, so we attach only EVE's interfaces to the VM.
Only apps running as native containers will see this eth0 interface connected to the Kubernetes network.

* Do we actually need multus at all? The whole purpose of multus is to call multiple CNIs (multiplexer). If you have one CNI that can handle everything - including multiple network interfaces - you don't need multus. If eve-bridge implements the CNI (which it must to be called by multus), and passes it all of the network interfaces, why do we need multus at all? Why not just call the eve-bridge CNI directly?

We could in theory replicate this multus logic in eve-bridge. For multiple interfaces, eve-bridge would need a more complicated config file, would have to fetch pod annotations from the Kubernetes API server, delegate the "Kubernetes interface" to Flannel, etc. I wanted to keep our CNI plugin as simple as possible and use the composition of CNI plugins instead; later we could rethink this and avoid the extra dependencies if it is worth it.
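
To illustrate the composition, here is a hypothetical NetworkAttachmentDefinition that multus would resolve from a Pod's k8s.v1.cni.cncf.io/networks annotation, delegating the extra interface to eve-bridge while the cluster-default CNI (e.g. Flannel) keeps providing eth0. The eve-bridge config keys and the network-instance value are assumptions for illustration only.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: eve-ni-blue            # referenced from the Pod's networks annotation
spec:
  config: |
    {
      "cniVersion": "1.0.0",
      "type": "eve-bridge",
      "network-instance": "<UUID-of-the-EVE-network-instance>"
    }
```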

* RPC: Is this the first time we are using RPC here? Not that I think pubsub is the best solution for everything, but if zedrouter already subscribes to pubsub events, would that be more consistent here? An alternative might be for eve-bridge not to talk to zedrouter at all. Can zedrouter publish the information, and eve-bridge just gets that and plumbs it into the VMs directly? I guess zedrouter already has "plumb network interfaces into VMs" logic anyways?

My primary goal was to avoid the hell of asynchronous programming, which we already have enough of in pillar. A CNI plugin is supposed to create the connection immediately and return only when it is done (successfully or with an error), and pubsub is not practical for this kind of request-response communication. Also, I wanted to limit the number of imports and dependencies in eve-bridge (to keep it small). Lastly, I wanted to avoid having multiple agents managing the network configuration: we already have the nim-zedrouter separation, and it is not easy to avoid race conditions between them. It is just so much easier to have zedrouter completely in charge of network instances and the application interfaces connected to them.
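
For completeness, the zedrouter side of the same (assumed) contract could look like the sketch below: a synchronous request/response server on an AF-UNIX socket built with Go's net/rpc. The type, method, and path names mirror the client sketch earlier in this thread and are illustrative, not the actual pillar code.

```go
package main

import (
	"log"
	"net"
	"net/rpc"
	"os"
)

// Illustrative request/response types (same shape as in the client sketch).
type ConnectPodArgs struct {
	PodName       string
	NetnsPath     string
	InterfaceName string
}

type ConnectPodResult struct {
	IPAddress string
	Gateway   string
}

type RPCServer struct{}

// ConnectPod blocks until the Pod interface is plumbed into the requested
// network instance, then reports the resulting IP configuration.
func (s *RPCServer) ConnectPod(args ConnectPodArgs, result *ConnectPodResult) error {
	// ... create the veth pair, attach it to the NI bridge, allocate an IP ...
	result.IPAddress = "10.1.0.5/24" // placeholder value
	return nil
}

func main() {
	const sockPath = "/run/zedrouter/rpc.sock" // assumed path
	_ = os.Remove(sockPath)                    // remove a stale socket, if any
	listener, err := net.Listen("unix", sockPath)
	if err != nil {
		log.Fatal(err)
	}
	server := rpc.NewServer()
	if err := server.Register(&RPCServer{}); err != nil {
		log.Fatal(err)
	}
	server.Accept(listener) // serve RPC requests until the process exits
}
```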

@deitch
Contributor

deitch commented Mar 14, 2024

It is from draw.io. I have added an .xml version of the diagram next to the image; it can be imported into draw.io and edited.

Thanks. That should make it a lot easier to update going forward

Fixed, now all arrows have numbers.

Thanks

Done, kubevirt-specific steps are highlighted with a green color.

That makes it easier. Even better if the diagram or README says it.

Currently, we do not have "kubevirt-EVE" documentation since everything is still a work in progress (and many changes are not even committed to the main EVE repo yet). Eventually, there should be documentation about this new EVE variant, and this eve-bridge doc will be linked from and partially moved there.

Makes sense.

Actually, the pod is wrapped inside a "sandbox" container, which implements the network isolation using a "normal" network namespace. This network namespace is even "named", meaning that there is a link to it from /var/run/netns/ (so it can be used with ip netns exec ...). The CNI plugin receives the filename of the namespace link inside /var/run/netns/. Additionally, the plugin gets some info about the pod interface that it should create.

Which makes it even easier. Thanks for explaining

Kubernetes checks for the presence of this "eth0" interface and will not consider the Pod properly connected if it cannot find it in the pod's network namespace.

🤦‍♂️

Not much of a choice, then, is there.

We could in theory replicate this multus logic in eve-bridge. For multiple interfaces, eve-bridge would need a more complicated config file, would have to fetch pod annotations from the Kubernetes API server, delegate the "Kubernetes interface" to Flannel, etc. I wanted to keep our CNI plugin as simple as possible and use the composition of CNI plugins instead; later we could rethink this and avoid the extra dependencies if it is worth it.

No, then it isn't. If the only thing we were doing was having multus delegate to eve-bridge, which is what I thought, then we could just use eve-bridge directly; even multiple interfaces are just multiple NetworkInstances, all managed by eve-bridge anyway. But as soon as we need that eth0, we need something else doing the wiring, and with multiple kinds of interfaces it makes sense to leverage multus. Your approach makes sense.

My primary goal was to avoid the hell of asynchronous programming, which we already have enough of in pillar.

We have async hell? Really? 🤣

A CNI plugin is supposed to create the connection immediately and return only when it is done (successfully or with an error), and pubsub is not practical for this kind of request-response communication. Also, I wanted to limit the number of imports and dependencies in eve-bridge (to keep it small). Lastly, I wanted to avoid having multiple agents managing the network configuration: we already have the nim-zedrouter separation, and it is not easy to avoid race conditions between them. It is just so much easier to have zedrouter completely in charge of network instances and the application interfaces connected to them.

These all make sense. My concern is the added complexity of zedrouter, which now has both pubsub and RPC, and we have to know which is which. If this is simpler than sticking with the current approach, so be it.

@eriknordmark
Contributor

But note that in the case of a VMI (a VM running inside a Pod by kubevirt), we do not have to further propagate (bridge) this eth0 interface from the Pod into the VM, so we attach only EVE's interfaces to the VM.

Where is this "filtering" taking place?

@eriknordmark
Contributor

My primary goal was to avoid the hell of asynchronous programming, which we already have enough of in pillar.

We have async hell? Really? 🤣

And I've been (naively) thinking it was a lifestyle choice ;-)

Contributor

@eriknordmark left a comment

LGTM

```
golang.org/x/sys v0.13.0 // indirect
)

// TODO: Remove the replace directive once cnirpc is merged.
```
Contributor

When will this take place? Now or after merging this PR?

Contributor Author

After merging this PR. Then I can properly reference the EVE commit that includes this package.
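
For context, a hedged sketch of what this means for eve-bridge's go.mod; the module path and relative directory are assumptions, and only the abbreviated merge commit 0f6d1f1 comes from this PR:

```
// While cnirpc exists only in this PR, the import is redirected to the
// in-tree copy:
replace github.com/lf-edge/eve/pkg/pillar => ../pillar

// After the merge, the replace directive can be dropped and the dependency
// pinned with an ordinary require using a pseudo-version that references
// the merged EVE commit, e.g.:
//   require github.com/lf-edge/eve/pkg/pillar v0.0.0-20240315... (commit 0f6d1f1)
```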

@milan-zededa
Contributor Author

But note that in the case of a VMI (a VM running inside a Pod by kubevirt), we do not have to further propagate (bridge) this eth0 interface from the Pod into the VM, so we attach only EVE's interfaces to the VM.

Where is this "filtering" taking place?

It is not yet submitted to the main EVE repo. For now, you can see it in Naiming's fork: https://github.com/naiming-zededa/eve/blob/poc-dec3/pkg/pillar/hypervisor/kubevirt.go#L235-L247

From the standard set of CNI plugins, EVE will use the bridge plugin.

Signed-off-by: Milan Lenco <milan@zededa.com>
@milan-zededa
Contributor Author

Done, kubevirt-specific steps are highlighted with a green color.

That makes it easier. Even better if the diagram or README says it.

Done (mentioned arrow coloring in the readme).

@eriknordmark eriknordmark merged commit 0f6d1f1 into lf-edge:master Mar 15, 2024
16 of 17 checks passed