
Deploy Kubernetes-nmstate with openshift #161

Closed
wants to merge 7 commits into from

Conversation

@SchSeba (Contributor) commented Dec 18, 2019

Signed-off-by: Sebastian Sch <sebassch@gmail.com>



@openshift-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Dec 18, 2019

#### Bond creation

Be able to create bond interfaces on OpenShift nodes.
Contributor:

I'd like to see some examples where MCO + Ignition isn't sufficient. I know there are some, but I'd like to see them spelled out.

Member:

Added one example. Let me know if you'd like more.

@squeed (Contributor) commented Dec 18, 2019

This is deeply bound with node-level configuration and management. Hence:
/cc @sdodson
/cc @abhinavdahiya
/cc @crawford

There needs to be something that couples node-level network configuration and MachineSets. MachineSets are how we've chosen to model non-homogenous host-level configuration in OpenShift. Anything that diverges from that model makes me worried about increased support burden and general complexity.

As much as possible, we do not want to have one-off "special" machines. When we do, we need to model that state correctly. I understand the reality of bare-metal does not always match that ideal. Nevertheless, we should try to conform as much as possible to the existing semantics, and expand those as needed rather than forging a completely new direction.

@squeed (Contributor) commented Dec 18, 2019

My first proposal would be to install nmstate in read-only mode, and to manage interface creation via MCO installing dropins in /etc/NetworkManager/conf.d through MachineSets
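
For readers less familiar with that approach, a rough sketch of how such a drop-in might be delivered (the pool name, file name, keyfile contents, and Ignition spec version are purely illustrative):

```yaml
# Illustrative only: MCO delivering a NetworkManager drop-in to a pool of nodes.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-nm-dropin            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - path: /etc/NetworkManager/conf.d/99-example.conf
          filesystem: root
          mode: 420                     # 0644
          contents:
            # URL-encoded "[main]\nno-auto-default=*\n" -- placeholder keyfile content
            source: data:,%5Bmain%5D%0Ano-auto-default%3D*%0A
```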

@sdodson (Member) commented Dec 18, 2019

/cc @eparis
I know he mentioned some interest and strategy around nmstate during the last planning session.

@@ -0,0 +1,124 @@
---
title: Kubernetes-NMstate
Member:

Please use kubernetes-nmstate

status:
---

# Kubernetes nmstate
Member:

kubernetes-nmstate


## Summary

A proposal to deploy Kubernetes nmstate on OpenShift.
Member:

kubernetes-nmstate

also, please link the U/S project https://github.com/nmstate/kubernetes-nmstate/


### Goals

- Deploy Kubnetes-nmstate as part of openshift
Member:

kubernetes-nmstate


## Proposal

A new Kubernetes-nmstate handler DaemonSet is deployed in the cluster part of the OpenShift installation.
Member:

kubernetes-nmstate

User configures host network changes and apply a policy in `NodeNetworkConfigurationPolicy` custom
resource. Network topology is configured via `desiredState` section in `NodeNetworkConfigurationPolicy`.
Multiple `NodeNetworkConfigurationPolicy` custom resources can be created, Kubenetes-nmstate DaemonSet will merge
all the `NodeNetworkConfigurationPolicy` custom resources according to priority definition and
Member:

Not true anymore. We completely split the responsibilities. NNCP is only for configuration, NNS only for reporting.
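
For orientation, a policy along the lines of the upstream kubernetes-nmstate examples might look roughly like this (API version, selector, and interface names are illustrative, not taken from this proposal):

```yaml
apiVersion: nmstate.io/v1alpha1        # assumed API version of that era
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth1-policy                # illustrative name
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: "" # apply only on worker nodes
  desiredState:
    interfaces:
      - name: br1                      # create a Linux bridge br1 over eth1
        type: linux-bridge
        state: up
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: eth1
```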


### Implementation Details

The proposal introduces Kubernetes-nmstate as Tech Preview.
Member:

kubernetes-nmstate


### Graduation Criteria

Initial support for Kubernetes-nmstate will be Tech Preview
Member:

kubernetes-nmstate


#### Tech Preview

- Kubernetes-nmstate can be installed via container image
Member:

kubernetes-nmstate

Kubenetes-nmstate Daemon verify the correctness of `NodeNetworkState` custom resource and
apply the selected profile to the specific node.

### User Stories
Contributor:

There is a use case where the user doesn't want to assign SR-IOV VFs to an SR-IOV Pod; instead they'd like to use macvlan on top of VFs. Do you think kubernetes-nmstate can be used in such a case to manage these VFs once they are created by the SR-IOV Operator? By managing VFs, I mean configuring network attributes such as vlan, mtu, etc. on that VF device (which, in this case, can be considered a host device).

@phoracek (Member), Dec 19, 2019:

AFAIK nmstate does not support configuring these attributes of a VF. @EdDev? Not sure whether there is support for configuring VF parameters through NetworkManager.

Comment:

A VF will eventually become a regular interface, so you could define whatever you want on it.
In addition, nmstate/nmstate#648 extends PF and VF definition to expose other capabilities.

If the PF or VF is defined by a different tool (CNI?), nmstate will "see" it but will consider it "down". You can take control over them using nmstate and define whatever you want.

@zshi-redhat (Contributor), Dec 23, 2019:

Thanks for the linked PR supporting PF/VF capabilities via nmstate.

The current SR-IOV Operator is responsible for VF provisioning (creating a number of VFs on the host) and manages SR-IOV sub-components such as the SR-IOV CNI and the SR-IOV Device Plugin, but we are adding VF index support to the SR-IOV Operator so it manages only a subset of the VF devices on a PF. Once merged, this will leave the rest of the VFs on the same PF unmanaged, which is where I see kubernetes-nmstate fitting in and taking over their management.

The SR-IOV CNI managed by the SR-IOV Operator can configure VF properties such as spoof check, trusted VF, MAC address, link state, tx_rate, etc., the same as kubernetes-nmstate, but it is only used when a VF is requested and attached to a Pod. I think this is a clear dividing line to keep in mind going forward between kubernetes-nmstate and the SR-IOV Operator for configuring VF properties: if it's for Pod VF configuration, the SR-IOV CNI shall be used; if it's for host-level VF configuration that may be used for other purposes instead of directly in an SR-IOV Pod, kubernetes-nmstate shall be used.

Several questions regarding provisioning VFs (setting the number of VFs):

  1. Is kubernetes-nmstate also responsible for driver binding such as vfio-pci, or will it be?
  2. When provisioning VFs for a Mellanox card, does it support configuring the NIC firmware to set the maximum total_numvfs? Further to this, do you see kubernetes-nmstate being extended to do vendor-specific config, such as changing the link type (ethernet or infiniband) for a Mellanox card?

Comment:

> 1. Is kubernetes-nmstate also responsible for driver binding such as vfio-pci, or will it be?

DPDK and 3rd party interfaces have been discussed as part of OpenStack requirements.
The discussions and requirements have not materialized into any plan yet, mainly because there was not enough demand for it.
If you are interested in exposing and managing this through nmstate, I suggest pushing it through the regular channels to raise its priority.
I hope we will implement a pluggable provider interface in nmstate in the near future (https://nmstate.atlassian.net/browse/NMSTATE-262); then one can develop and expose custom data using plugins (which can later be embedded into the formal support of nmstate).

> 2. When provisioning VFs for a Mellanox card, does it support configuring the NIC firmware to set the maximum total_numvfs?

The current implementation uses NetworkManager as the provider to change these. If NetworkManager supports it, nmstate and knmstate support it.
If it does not support it, a BZ needs to be opened against NM.
(In special cases, nmstate can extend missing parts in NM as well. We do so already.)

> Further to this, do you see kubernetes-nmstate being extended to do vendor-specific config, such as changing the link type (ethernet or infiniband) for a Mellanox card?

This seems to fit the discussions we had with OpenStack on 3rd party interfaces.
If the vendor HW fits an existing type, like ethernet, then nmstate can expose an additional subtree for specific capabilities of the interface.
As with my DPDK answer, I hope the plugin option will allow vendor-specific config to be exposed easily by writing a plugin, which can later be considered for full nmstate support.

@zshi-redhat (Contributor), Dec 30, 2019:

> As nmstate is a library, you can use it from whatever operator fits your need.
> IMO, implementing the "how" again in another code base is wasteful, but I may not see the whole picture.

Thanks for the advice! The reason I'm asking is that the SR-IOV Operator has implemented VF provisioning functions and GA'd the support in 4.3, and I didn't see an equivalent in nmstate at this point. But I see nmstate as a project to converge the implementation of VF configuration/provisioning on in future OpenShift releases. Until we get there, there might be issues we need to pay attention to for the co-existence of VF provisioning functions in both nmstate and the SR-IOV Operator. One example I can think of: if a user configures the number of VFs (with different numbers) from both operators, which both work declaratively, the node enters an infinite loop of being cordoned (the SR-IOV Operator cordons the node before re-provisioning VFs).

> If the declarative nature of the nmstate API fits your needs, I would recommend investing the effort there and gaining the shared support advantage.

> > Btw, what is the use case where a user would use nmstate to provision VFs?

> For example, one can define a RED network on a host and assign it to a pool of VF/s that connect to the same network. Then, a VM comes up and requests to connect to the RED network and the hypervisor will provide a VF from the relevant pool.

Ack, I think we use a different way to define and manage the VF resource pools on Kubernetes. But that doesn't prevent us from using the same nmstate library for host-level VF provisioning and configuration.

> Applications may also take advantage of VF/s for data path acceleration, but these scenarios are less common.

Can we list this as a user story in favor of supporting customer cases?
For example: Manage/Configure host SR-IOV network devices, which may include configuring vlan and mtu on a VF device on the host.

Contributor Author:

Done @zshi-redhat can you please take another look?

Contributor:

Hi @SchSeba, the change looks good to me.
Nit: manage -> managed. vlans, mtu drive <- do you mean driver here?

Contributor Author:

You are right, my bad. Thanks!

Contributor:

@SchSeba @zshi-redhat can you modify the PR to include the VF-related use case? The discussion above is quite long and involves plenty of implementation and calendar concerns, so I ended up not understanding the motivation. Do you want to create the VF? Only set its vlan/mac/mtu/ip? Why do you want to do that rather than use the PF?


#### Bond creation

Be able to create bond interfaces on OpenShift nodes.
Contributor:

Is kubernetes-nmstate also responsible for assigning a static IP address or requesting a DHCP address for the bond interface?
This is useful when PTP with UDPv4 or UDPv6 mode is configured on that bond interface.

Member:

It can set up both static and dynamic IPs.

Contributor:

Cool, I assume it supports both IPv4 and IPv6?

Contributor Author:

yes
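
As a sketch of what the thread above describes, a bond carrying a static IPv4 address plus dynamic IPv6 might be expressed roughly like this in a policy's `desiredState` (interface names, addresses, and exact schema keys are illustrative and follow the upstream nmstate examples of the time):

```yaml
desiredState:
  interfaces:
    - name: bond0                      # illustrative bond over two NICs
      type: bond
      state: up
      link-aggregation:
        mode: active-backup
        slaves:
          - eth1
          - eth2
      ipv4:
        enabled: true
        dhcp: false
        address:
          - ip: 192.0.2.10             # static IPv4 (documentation range)
            prefix-length: 24
      ipv6:
        enabled: true
        dhcp: true                     # dynamic IPv6
        autoconf: true
```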


### Non-Goals

- Replace SRIOV operator
Contributor:

Thanks for spelling this out as a non-goal :-)
When both the SR-IOV Operator and kubernetes-nmstate are deployed, is there a mechanism to guarantee that a device managed by the SR-IOV Operator or kubernetes-nmstate is not misconfigured by the other?

Member:

kubernetes-nmstate will touch interfaces only when you explicitly ask it to. The only SR-IOV related feature in nmstate is setting the number of VFs, AFAIK. So unless somebody creates a Policy changing that, there should be no conflict.

Contributor:

IIUC, kubernetes-nmstate will be able to take over control of VF devices even if the provisioning of VFs is done by the SR-IOV Operator.

Contributor Author:

If the user wants to create a policy that changes the VF configuration, we don't block it right now.
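
For context, the single SR-IOV knob mentioned above (the VF count on a PF) looks roughly like this in nmstate's `desiredState` (the PF name and count are made up; the exact schema may vary across nmstate versions):

```yaml
desiredState:
  interfaces:
    - name: ens1f0                     # hypothetical PF name
      type: ethernet
      state: up
      ethernet:
        sr-iov:
          total-vfs: 4                 # provision 4 VFs on this PF
```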

@squeed (Contributor) commented Dec 19, 2019

Handing this off, as I'll be OOO for a while.
/cc @knobunc

@abhinavdahiya (Contributor):

> As much as possible, we do not want to have one-off "special" machines. When we do, we need to model that state correctly. I understand the reality of bare-metal does not always match that ideal. Nevertheless, we should try to conform as much as possible to the existing semantics, and expand those as needed rather than forging a completely new direction.

machine-configuration applies desired state to a set of machines. While I agree that we have to support special machines, we should think about whether there is always going to be one special machine or a set of special machines... if it's the latter, users will have to duplicate the special configuration for each machine.

machine-configuration AFAIK supports special configuration, but it doesn't come free; it requires the user to take steps to achieve that, like adding a special label to the node and creating a new MachineConfigPool which selects the special machines and special configuration... but forcing special configuration to be applied to a set, even if the set contains one element, seems like a better way to me.

_I think machine-configuration should make adding these new sets easy._


## Proposal

A new kubernetes-nmstate handler DaemonSet is deployed in the cluster part of the OpenShift installation.
Contributor:

It is not clear to me how kubernetes-nmstate is going to be deployed.
Is it an operator deployed via OperatorHub, or during the initial OpenShift installation?
If it runs as a DaemonSet, which component will be responsible for creating manifests such as the CRDs?

Member:

The latest version of the proposal outlines the operator that is being developed and that is in charge of that. The plan is to have it installed during the initial OpenShift installation.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@cgwalters (Member):

> There needs to be something that couples node-level network configuration and MachineSets. MachineSets are how we've chosen to model non-homogenous host-level configuration in OpenShift.

I think you mean the MCO and MachineConfig(Pools) - MachineSets are a (mostly) orthogonal concept from the machineAPI.

So...on that topic. I think it does make sense to have a high level CRD for networking configuration instead of having people write NM keyfiles into MachineConfigs.

Having a custom operator for this also avoids rebooting for network changes.

However - today, our installation path documents having an administrator e.g. configure static IPs via ifcfg files (and by using NM we inherit generic keyfile support).

A high level path forward here is having the MCO own the "primary interface" - the default route; what kubelet uses. Then, any secondary interfaces could be owned by kubernetes-nmstate. These "secondary interfaces" would be used by workload pods - they don't need to be up before kubelet.

Actually in this too we need to back up - are you seeing this be a default cluster operator, or would it be part of Operator Hub?

If it's going to come with the payload it needs to be openshift/nmstate-operator or so - i.e. it needs to register as a clusteroperator etc. Or stated more simply it needs to be an operator and not just a controller.

The CRD schema is a bit odd. I think what I'd propose is something like keeping NodeNetworkConfigurationPolicy but rather than having a NodeNetworkState per node, the controller adds an annotation to the node with the version of the matching state.

Also - it would feel very natural to have a MachineConfig object include something like:

networkConfiguration: br1-eth1-policy

That way, rollout of the changes is naturally done via the MCO; we only affect one node at a time by default (to avoid breaking everything), and further we already expect admins to create and manage MachineConfigPools.
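
Spelled out, that idea might look something like the following. This is purely hypothetical: `networkConfiguration` is not an existing MachineConfig field; the sketch only shows where such a reference could sit.

```yaml
# Hypothetical sketch only -- this field does not exist in the MachineConfig API.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-network
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  networkConfiguration: br1-eth1-policy   # reference to a NodeNetworkConfigurationPolicy
```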

This is a potential path forward too for "non-rebooting host changes" - have another daemonset that piggybacks off the MCO's currentConfig. Or of course, the MCD could call out to nmstate as a library.

Highest level feedback: What you've done in having a "generic Kubernetes" controller for nmstate has some advantages (usable with upstream kube, may have a larger upstream) but since it's a privileged operator I think to be deployed with OpenShift it really needs to integrate coherently with the rest of the platform. You could try to keep the controller code generic and vendor it into openshift/nmstate-operator or so?

@cgwalters (Member):

One thing I forgot to mention too - the MCO today (and MachineConfigs) are necessary for "things before kubelet".

It'd be really nice if components that are making changes to the host that can happen after kubelet starts only do so transiently. See also coreos/fedora-coreos-tracker#354 which is about binaries.

In this case, since the "source of truth" for state is the CRDs (which are retrieved from the apiserver/etcd), nmstate should write them to e.g. /run/NetworkManager. This ensures that there can't be any "physical" conflicts with the MCO (which should generally only write files to /etc/ and not /run). It avoids having any state "leak" across reboots.

bcrochet and others added 4 commits April 22, 2020 09:01
Reconcile the document with the current implementation details.

Signed-off-by: Brad P. Crochet <brad@redhat.com>
Explain why MCO+Ignition may not work for bond/vlan
@bcrochet (Member) commented May 6, 2020

Would y'all mind taking a fresh look at this? Thanks.

/cc @squeed
/cc @sdodson
/cc @abhinavdahiya
/cc @crawford
/cc @cgwalters


## Release Signoff Checklist

- [ ] Enhancement is `implementable`
Comment:

I guess we can check the `implementable` box.

Then, the namespace, RBAC, and finally the DaemonSet are applied. Custom resources of kind `NMstate`
with other names will be ignored.

A new kubernetes-nmstate handler DaemonSet is deployed in the cluster by the operator.
Comment:

Maybe it is worth mentioning that the operator CR spec supports filtering which nodes the DaemonSet will be deployed on.
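
A rough sketch of that operator CR, including the node-filtering field mentioned here (API version and field names are assumptions based on the upstream project, not confirmed by the proposal text):

```yaml
apiVersion: nmstate.io/v1beta1         # assumed API version
kind: NMState
metadata:
  name: nmstate                        # per the proposal, CRs with other names are ignored
spec:
  nodeSelector:                        # assumed field: limits where the handler DaemonSet runs
    node-role.kubernetes.io/worker: ""
```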

Member:

please explain how this component relates to #344

@bcrochet (Member):

/cc @squeed


* Be able to create bond interfaces on OpenShift nodes.

Typically, MCO + Ignition would be utilized. However, in a Day 1 install, that may not be possible.
Contributor:

Hmm, this example doesn't really make sense to me - ignition is literally the first thing, ever, run on the box. If Ignition doesn't run, then kubelet doesn't run. If kubelet doesn't run, then nobody's going to run nmstate ;-).

Member:

Yes, exactly. It is completely possible to use MachineConfig "day 1", see e.g.:
https://github.com/openshift/installer/blob/master/docs/user/customization.md#install-time-customization-for-machine-configuration
This will be rendered into Ignition and happen on the first boot.

Member:

This has been removed.

Member:

It's also worth calling out that the MCO itself provides a mechanism for a controlled roll-out of day 2 configuration updates by allowing an admin to define configuration node pools. All the concepts needed to understand whether a rollout is configured and safe would need to be recreated by an alternative mechanism for rolling out node pool changes. This proposal should recognize that need.

An upstream API group of nmstate.io is currently used.

### User Stories

@squeed (Contributor), May 19, 2020:

Meta-criticism - these aren't user stories. User stories look like this:

"As a (person), I'd like to do (high level goal)." Start with a goal, then show how your proposed solution fulfills this story. "With (feature x, y, and z), I can accomplish this."

It's important that user stories are written without the proposed solution in mind. Ideally they come "first" in the design process. Their role is to identify real needs, rather than extant features. That way you can be sure you're designing a solution for problems, not the other way around.

An example:

I am a cluster administrator with a dedicated storage network, because performance for me is critical. I need to make sure that all storage traffic uses the dedicated network, and I need to be able to schedule my Gluster pods on nodes with 10Gb interfaces.
I would configure my routing tables with k8s-nmstate and (filling in the details here). The existing functionality doesn't work for me because (of all these reasons).

Make sense?

Comment:

The format is just a guideline, not a fixed spec of what a user story is.
The content is what matters, and it does indeed need to express the need/goal.

The needs below are focused on network-specific points, like "Be able to create a bond". It is not the intent here to explain why bonds are needed or how an admin should use them. We are at a level below that, where the need for these things should be obvious.

Member:

I strongly agree with @squeed.

The most important argument this enhancement needs to make is a clear case for why this capability is required on all OpenShift clusters by default. Nothing in its current form explains why this is needed universally in OpenShift rather than as an advanced optional add-on component delivered via OLM.


#### Assign ip address

* Assign static and/or dynamic ip address on interfaces
Contributor:

Why do I want to do this?

Comment:

Are you asking why we want to control the IP on day 2?
Would something like this answer it? It may be necessary to add an IPv6 address to an existing IPv4 one, or secondary interfaces may be defined as day 2 configuration.
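
As an illustration of that answer, adding an IPv6 address alongside an existing DHCP-managed IPv4 address could be expressed roughly like this (the interface name and address are made up):

```yaml
desiredState:
  interfaces:
    - name: eth1                       # illustrative secondary interface
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: true                     # keep the existing dynamic IPv4
      ipv6:
        enabled: true
        dhcp: false
        autoconf: false
        address:
          - ip: 2001:db8::10           # add a static IPv6 (documentation prefix)
            prefix-length: 64
```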


#### Create/Update/Remove network routes

* Be able to Create/Update/Remove network routes for different interfaces like (bond,ethernet,sriov vf and sriov pf)
Contributor:

What goal does this accomplish?

Comment:

Control routes for custom needs.
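
For illustration, a static route attached to an interface is expressed in nmstate's `desiredState` roughly as follows (destination, gateway, and interface are made up):

```yaml
desiredState:
  routes:
    config:
      - destination: 198.51.100.0/24   # illustrative destination network
        next-hop-address: 192.0.2.1    # illustrative gateway
        next-hop-interface: bond0      # route via the bond created earlier
```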


#### Manage/Configure host SR-IOV network device

* Be able to change host Virtual functions configuration (not managed by the sriov-operator) like vlans mtu driver etc..
Contributor:

Why would I want to do this?

Comment:

Currently the reason is a bit vague because there is a CNI plugin in the picture.
SR-IOV in general is partially a node-level resource, so it makes sense to provide the means for controlling the hardware at this level. One application could be to run other services on the node (daemonsets?) that need special network access, or cases where one wants to use VF/s as regular secondary interfaces. Another option would be to control a NIC-level setting that the CNI does not.

Maybe there are more interesting scenarios.


#### Rollback

* Be able to rollback network configuration
Contributor:

This would be part of a meta-user-story that applies to all of them. Something like "I want to be able to roll out changes in a staged, safe manner without reboots. I am concerned that nodes will go unreachable."

Member:

This is strongly related to the "integrate with MCO" path - if nmstate does things like roll out network configuration to multiple nodes at once that makes it wayyy too easy to take down a cluster. Plus even if it doesn't, and e.g the MCO is starting to drain+reboot one master node while nmstate applies a broken networking change to a different master that's also very problematic.

Member:

kubernetes-nmstate has an automatic rollback. If it detects that the API is no longer reachable, it will roll back to a good state. This is already implemented.

Member:

@bcrochet where can we go to learn more about this rollback? How do we know it will not conflict with the MCO itself?

Member:

Automatic rollbacks get tricky. Have you thought about scenarios where e.g. a new network configuration lands, and then something else causes the API server to become temporarily unreachable, and then the problem becomes compounded by a rollback to a previous network configuration?

Contributor:

@derekwaynecarr we haven't documented rollback yet; the u/s documentation usually lives at https://github.com/nmstate/kubernetes-nmstate, and I will add a PR there to include the info related to rollback.

About the rollback details: kubernetes-nmstate tries to apply a configuration with a timeout, so if knmstate dies, NetworkManager on the node will roll it back. It then does a pair of "probes": one pings the external world (we do that by pinging the default gateway) and the other checks that the API server works fine (access to a mandatory namespace is done). If none of this works for some time, a rollback to the old config is done.

@cgwalters since something "else" has broken the API server, you are as badly off with the old or the new network config, I suppose; at least with the old one you are back at the starting point. But I am not sure; this will be quite different depending on what was being changed in the network and what happened to the API server.


## Design Details

### Test Plan
Contributor:

And write some tests :-)

- kubernetes-nmstate can be installed via container image
- Host network topology can be configured via CRDs

#### Tech Preview -> GA
Contributor:

Questions you need to answer when adding a rich feature such as this:

  • How will CEE be trained?
  • How will this be documented?
  • How will it be monitored? What data should be provided to Telemetry?


### Upgrade / Downgrade Strategy

### Version Skew Strategy
Contributor:

This needs to be filled out as well. How will the configuration change? What are the components affected here? What are the APIs between them? Are these versioned? This goes far beyond defining a CRD.

How tightly is this coupled to the version of NetworkManager? To the kernel? Will this work on RHEL7? Will users need to care? What if NetworkManager deprecates a configuration directive? Adds a new one?

How tightly is this coupled to the advanced CNI plugins? SR-IOV, OVN-Kubernetes, etc? Remember, component upgrades are not strictly ordered.

Member:

There is a single strong dependency, and that is between kubernetes-nmstate and the NetworkManager running on the host. The communication happens through D-Bus. The NetworkManager API is backwards compatible, which makes kubernetes-nmstate forward compatible. When we release, we support a certain version of NetworkManager and everything after it. The nmstate release cycle (not kubernetes-nmstate) is tightly bound to the RHEL and NetworkManager versions, so the only thing we need to ensure in kubernetes-nmstate here is that we don't push a version that would be newer than the NetworkManager currently available on RHCOS.

There should be no coupling with kernel version apart from that done by NetworkManager.

This will not work on RHEL7 since RHEL7 has old versions of NetworkManager. Support for RHEL7 is not on nmstate roadmap.

Users don't need to care. We just need to make sure to release only nmstate that is compatible with RHCOS/RHEL supported by the current version of OpenShift (with the exception of RHEL7). Our operator will not deploy kubernetes-nmstate on RHEL7 nodes.

In case there is a breaking change on NetworkManager side, we can tackle it in nmstate or kubernetes-nmstate with a workaround. However, since there is guaranteed backward compatibility, it would be a NetworkManager bug.

It is not coupled with any CNI. kubernetes-nmstate is just controlling configuration making sure to reach desired state. It won't intervene into any CNI configuration unless it is asked to.


Tech Preview

## Infrastructure Needed
Contributor:

What testing infrastructure will you need? Special hardware? Special kernels? Will this have CI coverage, or will it have to be entirely manual?


### Goals

- Deploy kubernetes-nmstate as part of openshift
Contributor:

This isn't a goal. This is a solution. What's the high-level goal?

Comment:

This document's goal is to deploy k-nmstate as part of OpenShift, with the motivation expressed above. It describes the solution below (use operators, etc.).
I guess it is possible to phrase the goal as something like "Manage day 2 node networking in an OpenShift cluster", which summarizes the motivation, then say that installing k-nmstate is the solution and the rest is implementation.

IMO it is borderline and both seem OK to me.

Member:

I agree that this defines a solution and not a goal. For those less familiar with nmstate, it would be good to explain why nmstate is the right solution for addressing the goal and whether others were evaluated.

Contributor:

@bcrochet we could add a section that gives background about nmstate being a declarative interface to Enterprise Linux's network interface configuration tool of choice (NetworkManager).

* Be able to create bond interfaces on OpenShift nodes.

Typically, MCO + Ignition would be utilized. However, in a Day 1 install, that may not be possible.
It may be necessary for the bond/vlan configuration be done before Ignition can even be reached.
Member:

A whole lot of discussion around this in #210 and #291 and coreos/ignition#979

The TL;DR on this is that the RHCOS Live ISO will fully support automated and interactive arbitrarily complex networking configurations.

I don't understand how adding nmstate as is would help with any of this.

Member:

I've since removed that blurb. We are not targeting day 1 ops for this anymore.

@cgwalters (Member):

> Would y'all mind taking a fresh look at this? Thanks.

I'm not seeing much change that addresses my comments in #161 (comment)?

Happy to do a real-time meeting on this if it helps.

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command) on Jun 1, 2020
@derekwaynecarr (Member) left a comment:

A few observations prior to merge:

  • Why must this feature exist on every OpenShift installation by default?
  • Why is OLM not a suitable vehicle to deliver this function?
  • How is utilization of the feature opt-in, as all TechPreview functions must be?
  • A cluster that has used a TechPreview function should gate upgrades across minor versions; how does this proposal handle that?
  • Why is a more granular unit of atomic change required and how is an admin supposed to understand it relative to the MCO? What parity will exist with MCO to understand if a configuration has been rolled out across a pool of hosts?

I am not convinced that this is a required capability intrinsic to OpenShift rather than an advanced add-on. Please explain why this should be intrinsic to OpenShift.

/hold


## Proposal

A new kubernetes-nmstate operator Deployment is deployed to the cluster as part of the the OpenShift
Member:

As a new network operator, or as part of the existing network operator? Why is this required on every OpenShift cluster versus opt-in for the clusters where the advanced capability is required? This document should explain why 100% of OpenShift clusters require this capability.

Member:

I think the core problem here is that, like a number of other things, nmstate is mostly irrelevant in cloud environments but really useful for bare metal. Another example is the extension system, where usbguard is silly in public cloud but highly relevant in a world in which plugging a malicious USB stick into a physical Linux machine can gain arbitrary code execution.

Member:

Agree that this doesn't seem to be required on every cluster.

If we had a way to say "install this by default on bare metal", I'd recommend adding it to that list. An operator that's a no-op on cloud clusters doesn't seem ideal, either. It seems like an OLM-managed operator is the only choice we have, unless in some future we can have a bare metal CVO profile or something like that.

### Implementation Details

The proposal introduces kubernetes-nmstate as Tech Preview.
Member:

All tech preview features must have opt-in. This includes opt-in to installation of any operator that provides the capability. How is opt-in handled in this enhancement if, from what I can gather, it makes the capability universal across all OpenShift installs? Having a dev/tech preview feature delivered via opt-in through OLM is much simpler.

Contributor:

I think it is reasonable to do OLM so that it would be opt in.

@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SchSeba
To complete the pull request process, please assign joelanford
You can assign the PR to them by writing /assign @joelanford in a comment when ready.


@celebdor (Contributor) commented Jun 2, 2020

> A high level path forward here is having the MCO own the "primary interface" - the default route; what kubelet uses. Then, any secondary interfaces could be owned by kubernetes-nmstate. These "secondary interfaces" would be used by workload pods - they don't need to be up before kubelet.

It is common for users to want to modify things on the primary interface, and we should be able to reconfigure it without significant breakage. For example, one could want to add vlans on the same interface for other kinds of traffic that go over the same interface, or bind it to another interface for failover purposes.

I see the possibility for conflict between the static MCO iface configuration and knmstate dynamic configuration. To tackle this, rather than limiting by interface, we should have a handover mechanism. One example could be that knmstate, when told to configure an interface that is presently configured by a MachineConfig, could just refuse to apply the changes and point that out in the status. Another alternative would be for knmstate to interact with the MCO to undo the MachineConfig, preventing a pivot.

> Actually in this too we need to back up - are you seeing this be a default cluster operator, or would it be part of Operator Hub?

I think this would be generally useful as a default cluster operator, especially for platforms that support multiple interfaces.

> If it's going to come with the payload it needs to be openshift/nmstate-operator or so - i.e. it needs to register as a clusteroperator etc. Or stated more simply it needs to be an operator and not just a controller.

https://github.com/openshift/kubernetes-nmstate hosts both the controller and the operator. The operator handles deploying the daemonset that watches for the Custom Resources.

> The CRD schema is a bit odd. I think what I'd propose is something like keeping NodeNetworkConfigurationPolicy but rather than having a NodeNetworkState per node, the controller adds an annotation to the node with the version of the matching state.

> Also - it would feel very natural to have a MachineConfig object include something like:
>
> networkConfiguration: br1-eth1-policy
>
> That way, rollout of the changes is naturally done via the MCO; we only affect one node at a time by default (to avoid breaking everything), and further we already expect admins to create and manage MachineConfigPools.

Kubernetes-NMState has its own rollback mechanism that undoes the networking changes if it loses communication with the API.

Do I understand it right that the above proposes that the Kubernetes-NMState component on a given node would see the changes in NodeNetworkConfigurationPolicy but only trigger the change once it saw the node's MachineConfig point to it? Or is it that, in order to reuse the rollout mechanism, we'd modify the MCO's local node components to fire the network change just as they do the pivot (but dynamically)?

> This is a potential path forward too for "non-rebooting host changes" - have another daemonset that piggybacks off the MCO's currentConfig. Or of course, the MCD could call out to nmstate as a library.

So it's the second option from what I wrote above, MCD triggering nmstate.

> Highest level feedback: What you've done in having a "generic Kubernetes" controller for nmstate has some advantages (usable with upstream kube, may have a larger upstream) but since it's a privileged operator I think to be deployed with OpenShift it really needs to integrate coherently with the rest of the platform.

Kubernetes NMState is being used by upstream KubeVirt, so upstream Kubernetes usage is very important and we've shaped the proposal trying to keep the divergence minimal.

> You could try to keep the controller code generic and vendor it into openshift/nmstate-operator or so?

That's what we have done. The controller code is generic and lives in openshift/kubernetes-nmstate. The operator code is there as well and can make changes that are OpenShift-specific (the operator lives upstream in its generic, unadulterated form).

> One thing I forgot to mention too - the MCO today (and MachineConfigs) are necessary for "things before kubelet".

We are well aware of those. It's what we currently use for configuring networking on a few platforms (OpenStack, bare metal, vSphere, oVirt). A follow-up to this enhancement proposal would be the ability to define a NodeNetworkConfigurationPolicy that is set in a MachineConfig and passed, in nmstate-consumable shape, all the way to the first Ignition run. This would mean bringing a new network section to Ignition, but I think it is totally worth it to have coherent network configuration. The obvious problem with that is that there may be configuration people need to do for interfaces that is not supported by nmstate, but I guess there could be some inhibition mechanism.

> It'd be really nice if components that are making changes to the host that can happen after kubelet starts only do so transiently. See also coreos/fedora-coreos-tracker#354 which is about binaries.

That would probably be doable and IMHO, it is acceptable until we have some of that machineconfig/ignition integration I described above.

> In this case, since the "source of truth" for state is the CRDs (which are retrieved from the apiserver/etcd), nmstate should write them to e.g. /run/NetworkManager. This ensures that there can't be any "physical" conflicts with the MCO (which should generally only write files to /etc/ and not /run). It avoids having any state "leak" across reboots.

I would tackle the conflicts via the "API convergence" I talked about above, or by preventing the MachineConfig ifaces from being touched. Having connection changes be ephemeral is not really supported by NetworkManager, and enabling that via nmstate and Kubernetes-NMState is a significant effort probably best employed in the convergence.

@crawford (Contributor) left a comment:

Overall, I'm unswayed by this proposal. This appears to have been written with the solution already in mind and leaks far too much of the underlying implementation in the API (e.g. using the term NMstate). It's unclear to me what specific problems this is going to solve and it's even more unclear how this fits into the larger story of machine configuration. In an effort to avoid just tacking on more complexity, I'm putting a hold on this effort until we can answer these (and others') questions.

/hold


### Goals

- Dynamic network interface configuration for OpenShift on Day 2 via an API
Contributor:

This already exists today, so you're going to need to be more specific about the problem that you are trying to solve.

Contributor:

What is providing dynamic (not taking the node down) network interface configuration after installation finishes?

Contributor:

The Machine Config Operator (specifically the Machine Config Daemon) is the thing that applies configuration changes today. From the cluster's perspective, that's dynamic reconfiguration. It sounds like you are concerned about the individual node, so you'll have to help me understand why that's important. If it's a question of performance, then I'd argue that the MCD is the place to work on those optimizations (that design has started in #159).

Member:

To me one of the biggest pieces of value this can offer is a domain specific API for something that can be pretty complex. Managing it with direct config files via MachineConfig isn't ideal, so I think it's worth thinking about what a Network config management API might look like.

I was concerned about how this would fit well into OCP 4 since it was originally developed completely disconnected from MCO, but it seems there are some ideas forming to help address that.

@danwinship (Contributor):

> Overall, I'm unswayed by this proposal. This appears to have been written with the solution already in mind and leaks far too much of the underlying implementation in the API (e.g. using the term NMstate). It's unclear to me what specific problems this is going to solve and it's even more unclear how this fits into the larger story of machine configuration.

Yeah, so I think some of the background is:

  • many bare-metal customers want complex networking setups
  • we don't want to go back to the OCP 3.x-era fun where admins can do arbitrarily insane things to their network config and everything else is supposed to somehow cope with it
  • so, we need a system that lets people configure their network in more complicated, but still constrained ways, and which also lets support understand at a glance exactly what they've done

@russellb (Member):

That's a pretty good summary, @danwinship . I think this is about exploring a host networking domain specific API that makes things easier to understand and manage instead of going straight to MachineConfig.

It doesn't seem ideal to have this completely disconnected from MCO / MCD. I know that's been covered in some discussion already and @celebdor told me an update would be coming with a proposal for closer integration here. I look forward to that.

On a housekeeping note, since this enhancement has been up for a while, it would be helpful to mark resolved all past conversations that have been addressed so it's more clear which topics are still under discussion.

@russellb (Member):

/close

AFAICT, the functionality pursued in this enhancement is now covered by #399, instead. Ongoing discussion should happen there.

@openshift-ci-robot:

@russellb: Closed this PR.

In response to this:

/close

AFAICT, the functionality pursued in this enhancement is now covered by #399, instead. Ongoing discussion should happen there.


Labels
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.