---
title: baremetal-ipi-network-configuration
authors:
- "@cybertron"
- "@hardys"
- "@zaneb"
reviewers:
- "@kirankt"
- "@dtantsur"
- "@zaneb"
approvers:
- "@trozet"
- "@staebler"
creation-date: 2021-05-21
last-updated: 2021-10-27
status: implementable

see-also:
- "/enhancements/host-network-configuration.md"
- "/enhancements/machine-config/mco-network-configuration.md"
- "/enhancements/machine-config/rhcos/static-networking-enhancements.md"
---

# Baremetal IPI Network Configuration

This enhancement describes a user-facing API for day-1 network customizations in
the IPI workflow, with a particular focus on baremetal, where such configuration
is a common requirement.

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [x] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Currently in the IPI flow, there is no way to provide day-1 network configuration,
which is a common requirement, particularly for baremetal users. We can build
on the [UPI static networking enhancements](https://github.com/openshift/enhancements/blob/master/enhancements/rhcos/static-networking-enhancements.md)
to enable such configuration in the IPI flow.

## Motivation

Since the introduction of baremetal IPI, a very common user request has been how
to configure day-1 networking, in particular for the following cases, which are not currently possible:

* Deploy with OpenShift Machine network on a tagged (non-default) VLAN
* Deploy with OpenShift Machine network using static IPs (no DHCP)

In both cases, this configuration cannot be achieved via DHCP, so some
means of providing the configuration to the OS is required.

In the UPI flow this is achieved by consuming user-provided NetworkManager
keyfiles, as an input to `coreos-installer install --copy-network`, but there is
no corresponding user interface at the openshift-install level.

Additionally, there are other networking configurations that would be useful
to configure via the same mechanism, even though it may be possible to
accomplish them in another way. For example:

* Deploy with OpenShift Machine network on a bond
* Deploy with OpenShift Machine network on a bridge
* Configure attributes of network interfaces such as bonding policies and MTUs

The proposed solutions should all be flexible enough to support these use
cases, but they are worth noting in case an alternative with a narrower scope
is put forward.

### Goals

* Define API for day-1 network customizations
* Enable common on-premise network configurations (bond+vlan, static ips) via IPI

Initially these configurations will be one per host. If there is time, an
additional goal would be to provide a mechanism to apply a single config to all
nodes of a particular type. For example, one config for all masters and another
config for all workers.

### Non-Goals

* Platforms other than `baremetal`, although the aim is a solution which could be applied to other platforms in future if needed.
* Enabling kubernetes-nmstate by default for day-2 networking is discussed via
[another proposal](https://github.com/openshift/enhancements/pull/747)
* Provide a consistent (ideally common) user API for deployment and post-deployment configuration. Getting agreement on a common API for day-1 and day-2 has stalled due to a lack of consensus around enabling kubernetes-nmstate APIs (which are the only API for day-2 currently) by default.
* Configuration of the provisioning network. Users who don't want DHCP in their
deployment can use virtual media, and users who want explicit control over the
addresses used for provisioning can make the provisioning network unmanaged and
deploy their own DHCP infrastructure.

## Proposal

### User Stories

#### Story 1

As a baremetal IPI user, I want to deploy via PXE and achieve a highly
available Machine Network configuration in the most cost/space effective
way possible.

This means using two top-of-rack switches and two NICs per host, with the
default VLAN used for provisioning traffic; a bond+VLAN configuration
is then required for the controlplane network.

Currently this [is not possible](https://bugzilla.redhat.com/show_bug.cgi?id=1824331)
via the IPI flow, and existing ignition/MachineConfig APIs are not sufficient
due to the chicken/egg problem with accessing the MCS (Machine Config Server).
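
As an illustration of the kind of day-1 configuration this story calls for, a minimal nmstate sketch for a bond carrying a tagged VLAN is shown below. The interface names, bond mode, and VLAN ID are assumptions chosen for the example, not values defined by this proposal.

```yaml
# Hypothetical example only: a bond over two NICs with the Machine Network
# on tagged VLAN 300. Names and IDs are placeholders.
interfaces:
- name: bond0
  type: bond
  state: up
  link-aggregation:
    mode: active-backup
    port:
    - ens1f0
    - ens1f1
- name: bond0.300
  type: vlan
  state: up
  vlan:
    base-iface: bond0
    id: 300
  ipv4:
    enabled: true
    dhcp: true
```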

#### Story 2

As an on-premise IPI user, I wish to use static IPs for my controlplane network.
For reasons of network ownership or concerns over reliability I cannot use DHCP,
and therefore need to provide a static configuration for my primary network.

There is no way to provide [machine-specific configuration in OpenShift](https://github.com/openshift/machine-config-operator/issues/1720), so I am
forced to use the UPI flow, which is less automated and more error-prone.
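
A minimal sketch of the static configuration implied by this story follows; the interface name, addresses, and gateway are illustrative assumptions only.

```yaml
# Hypothetical example only: a single NIC with static addressing, no DHCP.
interfaces:
- name: eno1
  type: ethernet
  state: up
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.0.2.20
      prefix-length: 24
dns-resolver:
  config:
    server:
    - 192.0.2.1
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 192.0.2.1
    next-hop-interface: eno1
```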

### API Extensions

This does not modify the API of the cluster.

### Risks and Mitigations

In some existing workflows, kubernetes-nmstate is used to do network configuration on day-2. Using a different interface for day-1 introduces the potential for mismatches and configuration errors when making day-2 changes.
However, this is mitigated by the fact that the exact same configuration data can be used for both interfaces. The nmstate configuration provided to the installer can be copied directly into a NodeNetworkConfigurationPolicy for kubernetes-nmstate.
While there's still the potential for user error, the process is much simpler and less error-prone than if completely different formats were used.
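
For example, an nmstate document supplied at install time could later be wrapped in a NodeNetworkConfigurationPolicy more or less verbatim. The sketch below is illustrative only; the policy name, node selector, and interface details are assumptions, and the NNCP API version may vary by kubernetes-nmstate release.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-vlan300            # hypothetical policy name
spec:
  nodeSelector:
    kubernetes.io/hostname: openshift-master-0
  desiredState:
    # The same nmstate document that was provided to the installer.
    interfaces:
    - name: bond0.300
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 300
```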

## Design Details

In the IPI flow, day-1 network configuration is required in two different cases:

* Deployment of the controlplane hosts via terraform, using input provided to openshift-install
* Deployment of compute/worker hosts (during initial deployment and scale-out), via Machine API providers for each platform

In the sections below we will describe the user-facing API that contains network configuration, and the proposed integration for each of these cases.

### User-facing API

RHCOS already provides a mechanism to specify NetworkManager keyfiles during deployment of a new node. We need to expose that functionality during the IPI install process, but preferably using [nmstate](https://nmstate.io) files as the interface for a more user-friendly experience. There are a couple of options on how to do that:

* A new section in install-config.
* A secret that contains base64-encoded content for the keyfiles.

These are not mutually exclusive. If we implement the install-config option, we will still need to persist the configuration in a secret so it can be used for day-2.

The data provided by the user will need to have the following structure:
```yaml
<hostname>: <nmstate configuration>
<hostname 2>: <nmstate configuration>
etc...
```

For example:
```yaml
openshift-master-0:
  interfaces:
  - name: eth0
    type: ethernet
    etc...
openshift-master-1:
  interfaces:
  - name: eth0
    etc...
```
In install-config this would look like:
```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      networkConfig:
        interfaces:
        - name: eth0
          type: ethernet
          etc...
```
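
To make the shape concrete, a more complete host entry might look like the sketch below; the role, MAC address, interface name, and addressing are illustrative assumptions, and the BMC details are omitted.

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      role: master
      bootMACAddress: "52:54:00:00:00:01"   # placeholder value
      networkConfig:
        interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
            - ip: 192.0.2.10
              prefix-length: 24
```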

Because the initial implementation will be baremetal-specific, we can put the
network configuration data into the baremetal host field, which will allow easy
mapping to the machine in question.

### Processing user configuration

#### Deployment of the controlplane hosts via terraform

We will map the keyfiles to their appropriate BareMetalHost using the host field
of the baremetal install-config. The keyfiles will then be added to custom
images for each host built by Terraform and Ironic.

Since different configuration may be needed for each host (for example, when
deploying with static IPs), a Secret per host will be created. A possible
future optimization is to use a single secret for scenarios such as VLANs
where multiple hosts can consume the same configuration, but the initial
implementation will have a 1:1 Secret:BareMetalHost mapping.
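
A sketch of such a per-host Secret is shown below; the name, namespace, and data key are illustrative assumptions rather than a final naming scheme.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openshift-master-0-network-config   # hypothetical name
  namespace: openshift-machine-api
type: Opaque
stringData:
  nmstate: |
    # Raw nmstate document for this host, as described above.
    interfaces:
    - name: eno1
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 192.0.2.10
          prefix-length: 24
```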

#### Deployment of compute/worker hosts

BareMetalHost resources for workers will be created with the Secret containing the network data referenced in the `preprovisioningNetworkData` field defined in the Metal³ [image builder integration design](https://github.com/metal3-io/metal3-docs/blob/master/design/baremetal-operator/image-builder-integration.md#custom-agent-image-controller).
This will cause the baremetal-operator to create a PreprovisioningImage resource and wait for it to become available before booting the IPA image.
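
A sketch of a worker BareMetalHost referencing such a Secret follows; the host name, namespace, and MAC are placeholders, and the field name shown (`preprovisioningNetworkDataName`) follows the Metal³ BareMetalHost API described in the linked design.

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: "52:54:00:00:00:02"   # placeholder value
  # Reference to the Secret containing the nmstate network data for this host.
  preprovisioningNetworkDataName: openshift-worker-0-network-config
```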

An OpenShift-specific PreprovisioningImage controller will use the provided network data to build a CoreOS IPA image with the correct ignition configuration in place. This will be accomplished by [converting the nmstate data](https://nmstate.io/features/gen_conf.html)
from the Secret into NetworkManager keyfiles using `nmstatectl gc`. The baremetal-operator will then use this customised image to boot the Host into IPA. The network configuration will be retained when CoreOS is installed to disk during provisioning.

If not overridden, the contents of the same network data secret will be passed to the Ironic custom deploy step for CoreOS that installs the node, though this will be ignored at least initially.

### Test Plan

Support will be added to [dev-scripts](https://github.com/openshift-metal3/dev-scripts) for deploying the baremetal network without DHCP enabled. A CI job will populate install-config with the appropriate network configuration and verify that deployment works properly.

### Graduation Criteria

We expect to support this immediately on the baremetal IPI platform.

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

There should be little impact on upgrades and downgrades. Nodes are deployed with network configuration baked into the image, which means it will persist across upgrades and downgrades. NetworkManager keyfiles are considered a stable interface, so any version of NetworkManager should be able to parse them equally. The same is true of nmstate files.

Any additions or deprecations in the keyfile interface would need to be handled per the NetworkManager policy.

### Version Skew Strategy

As this feature targets day-1 configuration there should be no version skew. Day-2 operation will be handled by other components which are outside the scope of this document.

### Operational Aspects of API Extensions

N/A

#### Failure Modes

Improper network configuration may cause deployment failures for some or all nodes in the cluster, depending on the nature of the misconfiguration.

#### Support Procedures

Because a networking failure is likely to make a node inaccessible, it may be necessary to access the failed node via its BMC (iDRAC, iLO, etc.) to determine why the network config failed.

## Implementation History

4.9: Initial implementation

## Drawbacks

Adds a dependency on NMState. However, NMState provides a strong backward compatibility
promise (much like NetworkManager itself), so this should be a stable interface.

## Alternatives

### Use Kubernetes-NMState NodeNetworkConfigurationPolicy custom resources

If we were able to install the [NNCP CRD](https://nmstate.io/kubernetes-nmstate/user-guide/102-configuration)
at day-1 then we could use that as the configuration interface. This has the advantage of matching the configuration syntax and objects used for day-2 network configuration via the operator.

This is currently blocked on a resolution to [Enable Kubernetes NMstate by default for selected platforms](https://github.com/openshift/enhancements/pull/747). Without NMState content available at day-1 we do not have any way to process the NMState configuration to a format usable in initial deployment.
While we hope to eventually come up with a mechanism to make NMState available on day-1, we needed another option that did not make use of NMState in order to deliver the feature on time.

In any event, we inevitably need to store the network config data for each node in a (separate) Secret to satisfy the PreprovisioningImage interface in Metal³, so the existence of the same data in an NNCP CRD is irrelevant. In the future, once this is available, we could provide a controller to keep them in sync.

#### Create a net-new NMState Wrapper CR

The [assisted-service](https://github.com/openshift/assisted-service/blob/0b0e3677ae83799151d11f1267cbfa39bb0c6f2e/docs/hive-integration/crds/nmstate.yaml) has created a new NMState wrapper CR.
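
For reference, the assisted-service wrapper looks roughly like the sketch below; the API group/version, labels, and field names are recalled from the assisted-service CRDs and may not match the linked definition exactly.

```yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
  name: openshift-worker-0               # hypothetical name
  labels:
    some-infraenv-selector: "example"    # matched by an InfraEnv label selector; illustrative
spec:
  config:
    # Raw nmstate document, as elsewhere in this proposal.
    interfaces:
    - name: eno1
      type: ethernet
      state: up
  interfaces:
  - name: eno1
    macAddress: "52:54:00:00:00:03"      # placeholder value
```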

We probably want to avoid a proliferation of different CR wrappers for nmstate
data, but one option would be to convert that (or something similar) into a common
OpenShift API. Could such an API be a superset of NNCP, e.g. one also used for day-2?
This would mean we could at least use a common configuration format based on nmstate with minimal changes (or none if we make it _the_ way OpenShift users interact with nmstate), but unless the new API replaces NNCP there is still the risk of configuration drift between the day-1 and day-2 APIs. And we still need a Secret to generate the PreprovisioningImage.

### Pass NetworkManager keyfiles directly

NetworkManager keyfiles are already used (directly with CoreOS) for UPI and when doing network configuration in the Machine Config Operator (MCO). However, they are harder to read, harder to produce, and they don't store well in JSON.

In addition, we hope to eventually base Day-2 networking configuration on nmstate NodeNetworkConfigurationPolicy (as KubeVirt already does). So using the nmstate format provides a path to a more consistent interface in the future than do keyfiles.

If we were to eventually decide never to use NNCP and instead configure Day-2 networking through the Machine Config Operator, then it might be better to use keyfiles, as that is what the MCO uses.