Baremetal IPI Network Configuration for Day-1 #817
#### API Open Questions

* How do we map an individual machine's keyfiles to the machine? Hostname was previously suggested because it was already an available label in NNCP, but if we're not using NNCP then there might be a better way.
I think the current plan is to initially enable this interface only for the baremetal platform, so there may be concerns about adding a new top-level install-config section. In that case, we could instead add the config to the platform-specific `hosts` list (which already contains other host-specific data needed to automate deployment). This also solves the mapping to a host, since we're already consuming this per-host data in the baremetal flow, e.g. BMC credentials.
I agree with Steve. This should belong in the `hosts` section of the baremetal platform, e.g.:

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      networkData:
        eth0: |
          [connection]
          id=eth0
          uuid=18e0cec7-041c-4fb3-957c-a60c80dd9b85
          type=ethernet
          etc...
```

Please note `networkData` is a placeholder name I'm using as an example.
Okay, this sounds reasonable. I'd be more concerned about using a baremetal-specific structure if we had the common nmstate API, but since this implementation isn't going to be useful to other groups like OVNK anyway I'm good with it.
I guess that raises the question (already mentioned in the alternatives) of whether we want this to be nmstate format or just inlined keyfiles? I guess we need the interface to enable future nmstate support even if we start with keyfiles.
Also, for those familiar with OpenStack, `networkData` implies a different format - maybe we should call this something else, e.g. `networkConfig` or `networkFiles` (then adding e.g. `nmstateConfig` later would be simple)?
If we use nmstate yaml for install-config then we need nmstate available on the provisioning host so we can convert it to keyfiles. I'm fine with doing that (it's how we'd have to process NNCP records too), but it does add a dependency. I'm not sure how controversial that will be. It's also a bit more complex implementation-wise than straight keyfiles.
One option would be to replace the interface name with the filename, i.e. eth0 -> eth0.nmconnection. That's going to have to happen at some point anyway, and then we could potentially add support for eth0.yaml, which would indicate nmstate data. But maybe that's a little too magic?
+1 to not overloading networkData.
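To make the filename-keyed idea concrete, a minimal sketch is shown below (the `networkConfig` field name and the `eth1.yaml` extension are assumptions for illustration, not settled API):

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      networkConfig:
        # NetworkManager keyfile, passed through as-is
        eth0.nmconnection: |
          [connection]
          id=eth0
          type=ethernet
        # hypothetical future extension: nmstate data, converted to keyfiles on the provisioning host
        eth1.yaml: |
          interfaces:
          - name: eth1
            type: ethernet
            state: up
```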
Or to say it more simply another way: today openshift-install generates stub/pointer configs - I'm saying the user can provide a config which is included by that one, not that it replaces the generated pointer config.

@cgwalters this doesn't work for controlplane network configuration, as previously discussed via coreos/ignition#979 - there is a chicken/egg problem, e.g. networking needs to be up before the node can retrieve the merged config from the MCS.
So instead we're adopting the same process as UPI here: generate an ignition config for the coreos-installer live-iso which then contains the network configs for the live-iso and `coreos-installer --copy-network` - the issue is there's no corresponding `openshift-install` interface that allows the user to provide network configs.
> @cgwalters this doesn't work for controlplane network configuration as previously discussed via coreos/ignition#979 - there is a chicken/egg problem e.g networking to be up before it can retrieve the merged config from the MCS.

I'm talking about injecting into the pointer configuration, not the MCS config. But I think that's not relevant because:

> generate an ignition config for the coreos installer live-iso which then contains the network configs for the live-iso and coreos-installer --copy-network

Right, OK! So I think a similar case applies though - this "inject arbitrary network configuration into the live ISO" really generalizes into "inject arbitrary Ignition into the live ISO". I know some people in the past have disagreed with exposing this "arbitrary Ignition config" functionality, but in practice we already expose completely arbitrary things via MachineConfig, so having this generic functionality to me does not cost much in terms of support. I think some prior objections were rooted in us not having opinionated high-level sugar for Ignition, which is partially being addressed by shipping Butane.
Now, we don't need to gate this enhancement on having an official way to inject extra Ignition into the installer's generated configs. I'm more arguing that that's the way to think of it - this is just "syntactic sugar" for an Ignition config which writes the provided data to `/etc/NetworkManager`.
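To illustrate that "syntactic sugar" framing, a minimal Butane sketch (not part of the proposal; the path and keyfile content are just examples) of an Ignition config that writes a keyfile might look like:

```yaml
variant: fcos
version: 1.4.0
storage:
  files:
  - path: /etc/NetworkManager/system-connections/eth0.nmconnection
    mode: 0600
    contents:
      inline: |
        [connection]
        id=eth0
        type=ethernet
```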
> I'm talking about injecting into the pointer configuration

As I said, this doesn't work, because Ignition tries to resolve the `merge` directive before applying any other config in the pointer ignition, so you can't use pointer customizations to influence controlplane network config.
UPI works around this limitation by allowing users to embed network configuration into the live-iso, then apply that to the deployed OS via `coreos-installer --copy-network` (so ignition is not involved other than as a means to drop the files into the live-iso) - we're doing the same for baremetal IPI.

> this "inject arbitrary network configuration into live ISO" really generalizes into "inject arbitrary Ignition into live ISO"

This is true, but we're trying to reduce the friction around a really common baremetal case here, not expose a way to arbitrarily customize the live-iso process.
Use of the live-iso here is arguably an implementation detail, particularly given that no other IPI platforms currently use it, so exposing a generic cross-platform installer interface would not be possible?
> (so ignition is not involved other than as a means to drop the files into the live-iso)

I think it's misleading to say it that way. Ignition is the mechanism to configure the Live ISO - including networking for the Live ISO. The whole design is intended to enable exactly this.
The way I'd say it instead is that configuring the Live ISO via Ignition allows handling effectively arbitrary network/install requirements.

> not expose a way to arbitrarily customize the live-iso process.

We already do expose this to be clear; `coreos-installer iso ignition embed` is stable public API. I'm just arguing to make it more ergonomic to use for baremetal IPI.

> Use of the live-iso here is arguably an implementation detail, particularly given that no other IPI platforms currently use it, so exposing a generic cross-platform installer interface would not be possible?

Well, I think it would boil down to having a "configure the pointer config" hook, and a "configure the live ISO config", where the latter would only apply on baremetal IPI.
Anyways...hmm, I guess my bottom line is I am not opposed to the proposal of a `networkData`-type stanza in the install config only for baremetal. I'm just saying we should think of that as sugar for a generalized `liveIgnitionConfig` that we may want in the future.
> I'm just saying we should think of that as sugar for a generalized liveIgnitionConfig that we may want in the future.
Ack, thanks @cgwalters - I think we're basically in agreement - I don't think anything here precludes potentially adding a more generalized live-iso-ignition "escape hatch" in future.
#### Deployment of the controlplane hosts via terraform

TODO - we'll have to match the keyfiles to the master Machines, then generate image/images for deployment via terraform/ironic
I think we can do this by consuming the install-config data, then generating some files to be consumed on the bootstrap VM by the ironic startup script - this can then generate the images for each controlplane host prior to starting any ironic services (terraform waits for ironic-api to start before attempting to deploy).
This raises the question of whether we can detect when the configuration is common for all hosts - it'll be simpler to just generate 3 images (with predictable filenames so we can pass the necessary URL to terraform/Ironic), but obviously that wastes some disk space in the bond+vlan case (where the same config will be applied to all 3 master hosts).
@kirankt I think you've been looking at this area, any thoughts?
Right now we consume one rhcos image for all control plane nodes from the rhcos-stream data. I don't think it will be necessary to send separate copies of the images for each control plane node. All we have to do is send separate network data to the ironic agent and the agent should perform the copy network action independently.
Although, I'm not sure why we're using metadata here rather than just passing networkdata directly. Perhaps @dtantsur can clarify.
Sounds like a reasonable optimization, but if we don't get it for the initial implementation it will still work so I'd consider this a stretch goal if we manage to finish this feature early. :-)
Ok, sounds like we may have a few possible optimizations here - maybe we could add them to the PR as possible future work so we don't lose track?
* How do we map an individual machine's keyfiles to the machine? Hostname was previously suggested because it was already an available label in NNCP, but if we're not using NNCP then there might be a better way.
* If we're using RHCOS functionality, do we still need the custom image build discussed below? Or can we just pass the keyfiles as part of the deployment process?
We need it for the IPA image, where there's no other way to pass it (we can't rely on the network working without the network config).
### Goals

* Define API for day-1 network customizations
* Enable common on-premise network configurations (bond+vlan, static ips) via IPI
Perhaps we should mention whether it's a goal to provide different bond/vlan config per host (not just static IPs, which obviously have to be per-host).
Yeah, I guess bond configuration could differ between hosts or groups of hosts (different NIC names, etc.) - I think the API described here is basically always config per-host, but we can then perhaps optimize by detecting common config (to avoid duplicate images).
I also wonder, if we have per-host config in the install-config.yaml, whether we can make YAML anchors/aliases work so that, e.g. in the case where you have a single bond/vlan config for every host, you don't have to copy/paste it for each host.
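A rough sketch of how YAML anchors/aliases might look for that (purely illustrative; the `networkConfig` field name follows the placeholder naming discussed above, and whether the installer's YAML parser handles anchors this way is an open question):

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      networkConfig: &commonBondVlan
        interfaces:
        - name: bond0
          type: bond
          state: up
    - name: openshift-master-1
      networkConfig: *commonBondVlan
    - name: openshift-master-2
      networkConfig: *commonBondVlan
```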
Yeah, this kind of relates to the de-duplication conversation below. I'll add something about that as a stretch goal.
We could expose an interface to the user which accepts raw nmstate data (not a CR wrapper). This would mean a common DSL to the NodeNetworkConfigurationPolicy resource, but not directly reusing that API, so there is risk of drift between the two APIs.

The user would have to pass the nmstate data via the install-config, and also as a Secret for use on scale-up, then also define a NNCP resource if day-2 management of the configuration is required (which could be risky if the configuration isn't exactly the same...)
Couldn't the installer set up all of these?
Or the CAPBM could create the Secret to use for day-1 automatically from a NNCP resource that is to be used for day-2.
The problem (ref #747) is that the NNCP CRD provided via kubernetes-nmstate doesn't exist at the point of install - it's currently only applied if/when you install the operator on day-2 from OLM.
We could look at making CAPBM look for such a resource in future though, then fall back to using secrets (which might resolve some of the concerns around day-1/day-2 interface overlap, if we can make it clear to users which API they should interact with).
Ah, I was missing that context.
I wonder if the objections to #747 could be overcome by saying we'll enable the operator at install time whenever the user specifically provides network configuration in the installer, rather than always on particular platforms.
@zaneb perhaps - the impression I got from discussion with @dhellmann and @derekwaynecarr is that we don't want knmstate in the release payload at all, because it's not relevant to all platforms (although arguably it could be, e.g. to support secondary interfaces for OCS); there are also some unresolved questions around support status and ownership to deal with.
But if we're going to block core-platform functions like network configuration from the release payload long term, it probably does mean we need some install-time way to enable optional dependencies from OLM or other sources.
@hardys I would argue that we need network configuration on all platforms. OVN requires OVS network configuration on every single platform, which today is done by the MCO ovs-configuration bash script (creating network manager config) and would be much better served by nmstate or something more robust.
We could expose an interface to the user which accepts raw nmstate data (not a CR wrapper). This would mean a common DSL to the NodeNetworkConfigurationPolicy resource, but not directly reusing that API, so there is risk of drift between the two APIs.

The user would have to pass the nmstate data via the install-config, and also as a Secret for use on scale-up, then also define a NNCP resource if day-2 management of the configuration is required (which could be risky if the configuration isn't exactly the same...)
I think that this problem, of having 3 different copies of the same thing at
- install-config
- secret
- day-2 NNCP
exists also in the proposed solution; it's not a special downside of this alternative.
The install-config data doesn't exist after installation, but yes I agree - we'll potentially have two copies of network configuration which isn't ideal.
Until we get a fully supported kubernetes-nmstate and an API derived from that included by default in OCP, I don't see how this can be avoided though. We had a previous draft proposal which aimed to use NNCP for both day-1 and day-2 - but unless that API is available on day-1 (which seems unlikely given #747 is stalled) that doesn't seem to be a viable option.
We agree about this. But it is wrong to attach this to this alternative; this text should move up.
Yeah, this entry was largely referring to a design we've abandoned for now. I'll update it to compare/contrast with the current proposal. I think we've already discussed the drawbacks in terms of duplicate configuration in the Risks and Mitigations section.
### Provide raw nmstate data

We could expose an interface to the user which accepts raw nmstate data (not a CR wrapper). This would mean a common DSL to the NodeNetworkConfigurationPolicy resource, but not directly reusing that API, so there is risk of drift between the two APIs.
@tyll & @cathay4t may correct me, but I think that NetworkManager keyfiles are similarly susceptible to drift. For example, I'm not sure that the NM team would be happy to learn that secrets created for RHEL-8 NM are being used with RHEL-9 workers. NMState exists in order to provide a stable, human-readable representation of node network state, and in that it is more committed to backward and forward compatibility.
This is very interesting. One of the objections to enabling NMState was that keyfiles provide the same functionality and the same backward compatibility. If that's not true then it eliminates one of the major arguments against NMState.
Perhaps @cgwalters may have some comments here - we did discuss this recently and he mentioned that the keyfile interface supported for UPI has proven stable, but I suspect if/when any major-rhel-version rebase happens for RHCOS we may be faced with some larger incompatibilities.
I'm not clear what our strategy would be if/when such a major-version bump happens, would we anticipate a path for in-place upgrades or is it more likely to be a reinstall?
Bias: I am a developer of nmstate.
- The keyfile format is not standardized like YAML/JSON. The only specification-like document I found is http://www.gtkbook.com/gtkbook/keyfile.html. Using keyfiles directly will require your project to share a mutual understanding of the keyfile spec with NetworkManager.
- Besides creating NM connection files, nmstate has done a lot of workarounds for complex network layouts: activation order, link dependency, preserving links in the up state on a best-effort basis, and rollback on verification failure. Eventually, you will find yourself facing the old problems nmstate had.
- IMHO, the NetworkManager keyfile has never been treated as an API.
@thom311 As NetworkManager developer, any suggestions?
@zaneb ++
@phoracek I don't think Python was ever the objection, but I'll keep that in mind as we continue the discussions.
@dankenigsberg I'm not sure whether adding a dependency to the installer is necessarily trivial.
Just to add some context on the constraints here, the installer is written in golang and it is delivered as a single binary with no dependencies. So anything that requires a Python, or even Rust, program at install time is a non-starter.
> the installer is written in golang and it is delivered as a single binary with no dependencies.
Thanks @zaneb, I did not know. This constraint defies the ancient Unix principle of multiple applications interacting with each other over well-defined interfaces. It is sad that we need to re-learn it.
@dankenigsberg well, note that the installer isn't delivered as an RPM, nor is it restricted to platforms we control, nor even to Unix-like platforms. If potential customers wanting to try out OpenShift in their cloud account are spending their time wrestling with getting the right version of Python on their Windows laptop then we are losing a good proportion of them as customers.
That said, I didn't make the rules :)
I wrote up this change and proposed it as a pull request to this pull request (cybertron#1).
I'll push my updated local copy of the doc later when we've had a chance to wrap up some of the existing conversations here.
Okay, I've pushed an updated version of this based on the comments so far. The two remaining points I'm aware of that need to be addressed are:
However, we have been unable to get agreement on a method to deliver the operator on day-1, which makes this problematic.
It's possible we could implement some way to pull just the NNCP CRD from the operator payload and install it on day-1, but at this point we likely don't have time to do that and it's not clear we even want to if the operator can't be installed on day-1 too.
This would essentially orphan the CRs after the initial deployment completes.
What we're doing in the PAO is having its render mode create the CR needed to have OLM install the operator when the cluster comes up. Does that not work in this case?
Because a lot of baremetal deployments are done disconnected, we can't rely on the operator being available to OLM. In many environments the deployer will need to take additional steps to mirror the content into their environment before it can be deployed. In addition, because of the sheer amount of content that needs to be mirrored for OLM there may be deployers who don't want to do that at all. I know there are discussions happening that may improve the mirroring situation, but nothing that will happen soon enough to help us here.
Also, part of the lack of agreement on NMState for day-1 was whether we should use it at all. I hope we can eventually get past that since it's already in use for day-2, but we just ran out of time to have that discussion if we wanted to deliver this in 4.9.
OK, it would be good to capture that level of detail here in this document.
I'm not sure I buy in to the "mirroring is hard" argument. We're certainly expecting them to mirror the PAO to get those features if they want it. There's also work going on to make the process for pruning the catalog easier.
I think the main point is that OpenShift today lacks any API for day-1 host network configuration (other than the keyfiles passthrough provided by `coreos-installer`).
IMO that is a core function of the platform, and it's not ideal to ask users to enable optional add-on components to achieve it, but since there isn't consensus on that (ref #747) we've decided to go with the interface that does exist in the meantime, e.g. keyfiles - that doesn't preclude adding some higher-level API in future, but it also means we don't have to block on it.
In the interest of not reiterating the entire discussion in #747 here, can we just link to that PR, explain that we did not have time to resolve it, and note that we want to revisit it in the future once we have the base keyfile functionality implemented?
Okay, I've done what I said in my previous comment. Let me know what you think.
Added the missing sections and tried to address the comments so far.
BareMetalHost resources for workers will be created with the Secret containing the network data referenced in the `preprovisioningNetworkData` field defined in the Metal³ [image builder integration design](https://github.com/metal3-io/metal3-docs/blob/master/design/baremetal-operator/image-builder-integration.md#custom-agent-image-controller).
This will cause the baremetal-operator to create a PreprovisioningImage CRD and wait for it to become available before booting the IPA image.

An OpenShift-specific PreprovisioningImage controller will use the provided network data to build a CoreOS IPA image with the correct network data in place, and the baremetal-operator will then use this to boot the Host.
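For orientation, a worker BareMetalHost wired up this way might look roughly like the sketch below (the host name, MAC, and Secret name are invented, and the exact field name should be taken from the Metal³ design linked above rather than from this sketch):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"   # illustrative
  # Secret holding the network data used to build the PreprovisioningImage
  preprovisioningNetworkDataName: openshift-worker-0-network-config
```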
nit: s/network data/ignition configuration/ (for clarity)
Ack, done.
### Upgrade / Downgrade Strategy

There should be little impact on upgrades and downgrades. Nodes are deployed with network configuration baked into the image, which means it will remain over upgrades or downgrades. NetworkManager keyfiles are considered a stable interface so any version of NetworkManager should be able to parse them equally.
Should we talk about the whole CoreOS IPA business here? It does have upgrade impact.
I think that needs another enhancement proposal.
Proposed as #879.
#### API Open Questions

* How do we map an individual machine's keyfiles to the machine? Hostname was previously suggested because it was already an available label in NNCP, but if we're not using NNCP then there might be a better way.
We can probably simplify the config a bit and also use the suggestion to include file names like so:

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      networkConfig:
      - eth0.nmconnection: |
          [connection]
          id=eth0
          uuid=18e0cec7-041c-4fb3-957c-a60c80dd9b85
          type=ethernet
          etc...
```
Currently in the IPI flow, there is no way to provide day-1 network configuration
which is a common requirement, particularly for baremetal users. We can build
on the [UPI static networking enhancements](rhcos/static-networking-enhancements.md)
NIT. This relative URL doesn't work correctly. Suggestion: We can use: https://github.com/openshift/enhancements/blob/master/enhancements/rhcos/static-networking-enhancements.md
Oops, I missed updating this. Will fix shortly.
#### API Open Questions

* How do we map an individual machine's keyfiles to the machine? Hostname was previously suggested because it was already an available label in NNCP, but if we're not using NNCP then there might be a better way.
I also wanted to clarify which format we're all leaning towards: NM keyfiles or nmstate config. Either way, I hope we're just consuming a blob of network config data. I was just looking at the nmstate schema and it's pretty extensive. I don't think we should include it in the install-config, at least in this proposal.
https://github.com/nmstate/nmstate/blob/base/libnmstate/schema.py
## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
We could probably mark the status as `implementable` and check both of these, I think, given the pretty thorough discussion so far?
Done
/lgtm Not approving since I contributed to an earlier draft of this, but overall this lgtm and I think it outlines an actionable plan which won't impact any platforms outside IPI baremetal (long term, OpenShift should gain a platform-agnostic API for network config IMO, but that's outside the scope of this iteration).
overall looks good. Some comments inline. @fedepaol can you PTAL?
In install-config this would look like:

```yaml
platform:
  baremetal:
```
should we put it under baremetal if there might be plans in the future to use it on other platforms? I'm thinking here about replacing ovs-configuration with this.
The problem with doing anything cross-platform for this release is that we need image customization functionality that is currently only provided by baremetal IPI. As I understand it there is a similar feature in development that will work on all platforms, but it wasn't going to be available soon enough to satisfy our baremetal use cases. I suppose we could put the configuration somewhere more generic, but since it won't work on anything but baremetal right now that could be confusing.
Per our conversation it seems like we could use this for baremetal to replace configure-ovs, and configure multiple NICs to be on OVS bridges. configure-ovs can still exist (we need it anyway for the other platforms), and today it just skips doing anything if br-ex is already up and configured. However, this will allow us to do some more complex types of network deployments on baremetal, so it seems like a solid first step to me.
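For context, the kind of nmstate config being alluded to here - an OVS bridge with an internal interface carrying the host IP - might look something like the sketch below (interface names and the DHCP choice are assumptions for illustration):

```yaml
interfaces:
- name: br-ex
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: eth0      # physical uplink attached to the bridge
    - name: br-ex     # internal OVS interface of the same name
- name: br-ex
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    dhcp: true
```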
## Drawbacks

Using NetworkManager keyfiles as the user interface for network configuration makes day-2 configuration more problematic. The only existing mechanism for modifying keyfiles after initial deployment is machine-configs, which will trigger a reboot of all affected nodes when changes are made. It also provides no validation or rollback functionality.
If the keyfiles are laid down as machine-configs, and we modify them in ovs-configuration, wont that be an issue?
Yeah, that's a good point. Although I suppose every time MCO rewrites them the script would run again to redo the changes? I know we have deployers doing this today and it works, but I don't know for sure that they're using OVNK.
  baremetal:
    hosts:
    - name: openshift-master-0
      networkConfig:
this will support all types of nmstate configuration right?
I think so? Under the covers we'll be using the nmstatectl cli to convert this to nmconnection files that will be baked into the image. I'm not aware of any nmstate configurations that couldn't be handled that way, but that doesn't mean there aren't any.
```yaml
openshift-master-0:
  interfaces:
  - name: eth0
```
so this works if you know the NIC name. What about doing something like TripleO did with introspection... identifying NICs dynamically and mapping them to networks. I get this might not be applicable to the scope of this enhancement, just thinking longer term how that would work with this API, or if that would need a new API?
An example of this is I might have a scenario where I want my kapi traffic to go over the ctlplane network, but I want to define another network for all my default egress OVN traffic.
I think that would need a higher level API. We're using raw nmstate configuration data here, and it's not going to know anything about OCP networks. A higher level api might be something to consider for the cross-platform implementation of this?
The one drawback is that we were trying to stay as close as possible to the kubernetes-nmstate CRD so if someday we have kubernetes-nmstate available on day 1 we could use that CRD and avoid the custom config. A higher level interface could just intelligently populate the CRD though, so those might not be mutually exclusive goals.
This config has to be applied before introspection, because introspection happens... over the network.
In the long term I think we should just think of this as a way to configure enough of the network to be able to communicate with ironic. In the short term people will inevitably use it to put their whole network config in because we're not providing them with another way of setting up the network config for the running cluster nodes at install time, but we shouldn't focus too much on that.
This is a good point that the end solution for cluster networking configuration may not look like just sticking this same nmstate file into k8s-nmstate, but may involve some other new operator that takes in higher-level data about mapping NICs to networks, inspection data from hosts, and the initial nmstate, then combines them to produce the network config for the cluster node that it then passes to k8s-nmstate.
Perhaps this also answers your other question about whether this field should be baremetal-specific: yes, because no other platforms have this pre-boot configuration problem.
Just to mention, I know that the nmstate folks (@qinqon in particular) are working on a dynamic way to generate nmstate configurations depending on the current network layout (i.e. defining the bridge on top of the default gateway, instead of hard-coding the interface name).
Not sure of the state, because we discussed it some time ago, but I think that would solve this issue.
Hi, yes, we are working on a tool named `nmpolicy` that will solve something at least similar to that.
We have a design document with examples; here is the example you are interested in:

```yaml
capture:
  default-gw: routes.running.destination=="0.0.0.0/0"
  base-iface-routes: routes.running.next-hop-interface==capture.default-gw.routes.running[0].next-hop-interface
  base-iface: interfaces.name==capture.default-gw.routes.running[0].next-hop-interface
  bridge-routes: capture.base-iface-routes | routes.running.next-hop-interface:="br1"
  delete-base-iface-routes: capture.base-iface-route | routes.running.state:="absent"
  bridge-routes-takeover: capture.delete-base-iface-routes.routes.running + capture.bridge-routes.routes.running
desiredState:
  interfaces:
  - name: br1
    description: Linux bridge with base interface as a port
    type: linux-bridge
    state: up
    ipv4: {{ capture.base-iface.interfaces[0].ipv4 }}
    bridge:
      options:
        stp:
          enabled: false
      port:
      - name: {{ capture.base-iface.interfaces[0].name }}
  routes:
    config: {{ capture.bridge-routes-takeover.running }}
```

With DHCP activated it is simpler.
I don't think that will help; nmstatectl cannot run on the host, it must run on the cluster and produce keyfiles that are incorporated into the ISO before it even boots on the host. We don't currently have a way of building e.g. a container image into the ISO, so everything but the Ignition (which contains the keyfiles) must come over the network and therefore cannot be required to set up the network.
The nmstate team is reimplementing nmstatectl so it's just a binary without Python dependencies - will that help with that? They are also going to implement a flag to configure the kernel directly with netlink, bypassing NetworkManager.
Unless it ships installed by default in the CoreOS live ISO, no.
> Perhaps this also answers your other question about whether this field should be baremetal-specific: yes, because no other platforms have this pre-boot configuration problem.
@zaneb also UPI can gain from having static ips, right?
https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-iso_installing-bare-metal
This is basically what is already happening in the assisted installer.
@romfreiman UPI can already configure static IPs, just not via nmstate (or automated image building). If we want UPI to support nmstate syntax in future, that should be covered via another proposal; IMO it's outside the scope of this enhancement.
The data provided by the user will need to have the following structure:

```yaml
<hostname>: <nmstate configuration>
```
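As a concrete illustration of that structure (the host name and addresses are invented for the example, not prescribed values):

```yaml
openshift-master-0:
  interfaces:
  - name: eth0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
      - ip: 192.168.111.20
        prefix-length: 24
```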
I know you mentioned at first it will be per host, but I think that will be really annoying for some users. Assume in my scenario all my hosts are homogeneous and I have 120 of them. That's a giant config. Could we just support applying the same config to all nodes at least initially as well?
That sounds reasonable. I'll have to defer to the hardware folks who are doing the implementation, but I'd be in favor of something like that. Maybe we could even provide a default configuration that would automatically be applied to all nodes, unless a given node had a specific configuration provided?
For the implementation we won't care either way. It's just a question of whether we can give it an interface that makes sense.
+1 for default config / templated configs and some sort of node selector mechanism
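One hypothetical shape for that default-plus-override idea (neither field name is part of the proposal; this is purely a sketch of the interface being suggested):

```yaml
platform:
  baremetal:
    # hypothetical cluster-wide default, applied to any host without its own config
    defaultNetworkConfig:
      interfaces:
      - name: bond0
        type: bond
        state: up
    hosts:
    - name: openshift-master-0
      networkConfig:           # a per-host entry would override the default
        interfaces:
        - name: eth0
          type: ethernet
          state: up
```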
Left a couple of comments, looks good!

Did you take into account the existing implementation of the assisted installer and ZTP? Do those collide? Or can they work together?

The ZTP flow currently uses a different flow from IPI (live-iso boot), so it won't be impacted by these changes. It would be good in future to converge the flows, and we have an epic tracking that, but it's outside the scope of this iteration. Also note the existing implementation has been considered and is being reused in this cluster-integrated flow.

@hardys I'm mostly asking about the nmstate. What will happen if `networkConfig` is set in the install-config as proposed here, but the ZTP flow is used? Can they work together? Or will the `networkConfig` be ignored?

@romfreiman until/unless we converge the flows, the `networkConfig` (and new install-config interface) won't be used by the ZTP flow - the ZTP flow builds an ISO with the network configuration embedded already - that process should continue to work regardless of this new flow, but hopefully we can look at ways to converge in future.
means of providing the configuration to the OS is required.

In the UPI flow this is achieved by consuming user-provided NetworkManager
keyfiles, as an input to `coreos-install --copy-network`, but there is
As a reference, assisted installer also uses this mechanism
### User-facing API

RHCOS already provides a mechanism to specify NetworkManager keyfiles during deployment of a new node. We need to expose that functionality during the IPI install process, but preferably using [nmstate](https://nmstate.io) files as the interface for a more user-friendly experience. There are a couple of options on how to do that:
I'd really recommend using nmstate files. It's simpler, and easy-enough to integrate. As a bonus point, it will make the experience between UPI, IPI, and assisted more consistent. Furthermore, this would make the dream of stronger integration between assisted and IPI more realistic.
We probably want to avoid a proliferation of different CR wrappers for nmstate
data, but one option would be to convert that (or something similar) into a common
OpenShift API, could such an API be a superset of NNCP e.g also used for day-2?
I mentioned this in an earlier comment. There are some advantages to this wrapper, whether it uses its own CR or just reads a secret and runs validations/actions on the data.
I'd love for assisted and IPI to have a common interface here, it would help with the integration of both projects and it will make it easier for users as well.
data, but one option would be to convert that (or something similar) into a common
OpenShift API, could such an API be a superset of NNCP e.g also used for day-2?

This would mean we could at least use a common configuration format based on nmstate with minimal changes (or none if we make it _the_ way OpenShift users interact with nmstate), but unless the new API replaces NNCP there is still the risk of configuration drift between the day-1 and day-2 APIs. And we still need a Secret to generate the PreprovisioningImage.
> but unless the new API replaces NNCP there is still the risk of configuration drift between the day-1 and day-2
Agreed, maybe this is a good opportunity to bump this conversation? Try to find some alignment here?
@flaper87 This PR has been up for nearly 4 months, and designing a new API for network config is explicitly stated as a non-goal, so I'd suggest we defer that discussion ;)
Context is - we wanted to use NNCP, but because kubernetes-nmstate (and thus that CRD) isn't deployed via the release payload, we can't consume that API at install-time - discussion on that stalled, ref #747.
The same problem exists for the new AI nmstate CRD; it's only available via AI or when using the CIM layered-product stack.
IMO we definitely need to provide a core-OCP API for network configuration, but given the historical lack of progress on that discussion, we decided instead to focus on the existing lower-level interfaces (e.g. Secrets); any future CRD API can obviously be layered on top if/when we manage to achieve alignment :)
I'd rather not try to boil the ocean and solve all the networking problems at once, particularly because, as noted in this section, "...we still need a Secret to generate the PreprovisioningImage." Any higher level general purpose networking API is going to have to build on this functionality anyway, not replace it. This is step one in a longer journey that has been delayed for years already.
> This PR has been up for nearly 4 months, and designing a new API for network config is explicitly stated as a non-goal, so I'd suggest we defer that discussion ;)
FWIW, I was not proposing to design a new config. That said, I didn't realize this had been up for so long, 😅
#### API Open Questions

* How do we map an individual machine's keyfiles to the machine? Hostname was previously suggested because it was already an available label in NNCP, but if we're not using NNCP then there might be a better way.
Let's be careful here about what the long-term plans are. Once the fields are put in the baremetal platform section, they must stay there, even if we add implementations for other platforms. So if we truly plan to offer this feature for other platforms, we should give serious consideration to whether or not this is the appropriate place for these fields. Can we see an alternative configuration that is not platform dependent?
Also, I am concerned about "consuming a blob of network config data" in the install-config.yaml. The install config should provide concise configuration options, that are well-defined and well-tested. It should not be a way to support any and all configurations. Can we support specific fields and give the user the ability to modify generated manifests with more customizations as needed?
* A new section in install-config.
* A secret that contains base64-encoded content for the keyfiles.

These are not mutually exclusive. If we implement the install-config option, we will still need to persist the configuration in a secret so it can be used for day-2. In fact, the initial implementation should likely be based on secrets, and then install-config integration can be added on top of that to improve the user experience. This way the install-config work won't block implementation of the feature.
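For illustration, such a per-host Secret might look roughly like the sketch below (the name, namespace, and key are assumptions, not settled API; `stringData` is used for readability and is base64-encoded by the API server on write):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openshift-master-0-network-config
  namespace: openshift-machine-api
type: Opaque
stringData:
  eth0.nmconnection: |
    [connection]
    id=eth0
    type=ethernet
```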
I would prefer to keep the install-config integration as a separate enhancement. I have serious concerns over the appropriateness of embedding nmstate files into the install config. I would like to start with secrets in manifests that the user can modify. I do not want to derail the viability of the core of this enhancement with deliberation over whether to include the configuration in the install config.
Providing the data via manifests is probably a viable alternative, but can you expand on the concerns just so we all have relevant context if/when any subsequent proposal gets created?
/cc @cybertron @zaneb @sadasu what are your thoughts here, is it reasonable to fall back to manifests containing the underlying secret data given this latest feedback?
One problem with only exposing the Secret API to users, is they will also need to modify all of the installer-generated BareMetalHost manifests (so they reference the desired secret(s)).
That's likely to be error prone (particularly for environments with a lot of worker hosts and per-host configuration e.g static IPs), obviously it's scriptable but this is why we wanted to enable input of this data via the install-config.
> I would prefer to keep the install-config integration as a separate enhancement.

For all intents and purposes, this is a separate enhancement for the install-config integration.

> I have serious concerns over the appropriateness of embedding nmstate files into the install config. I would like to start with secrets in manifests that the user can modify.

It's just the same data with the added downside that you'd have to specify the Secret wrapper (TBH users should never need to know that it's stored in a Secret) and get the name right so that the installer can guess which BaremetalHost to link it to. I assume you're suggesting this as a temporary workaround until we can agree on the right interface for the installer?
> so that the installer can guess which BaremetalHost to link it to

I'm not sure the installer can do this - after `create manifests` the installer has already created the BMH manifests, so it will be up to the user to create the Secrets, then modify every single BMH manifest generated by the installer, then run `create cluster`.
Then at runtime, we'll have to read those BMH/Secret manifests on the bootstrap VM (hackily, e.g. reading manifests in startironic.sh so we can build custom RHCOS images for the controlplane hosts with the right data).
This seems pretty far from ideal in terms of user experience and implementation IMO.
Good point Steve.
I agree that might work (and perhaps this was @staebler's intention) as a temporary measure to enable us to test the rest of the flow while we agree on what the interface to the installer should be (although it will require a bunch of extra rework), but if editing the manifests is the only interface to it then we haven't really shipped anything usable to customers.
I think for this enhancement we keep the install-config option and defer the alternative option (implementing the manifest creation) until a future release. We will need to flesh out the details of reading/interacting with the user-created network config manifests. For 4.10 we can let users specify the `networkConfig` as part of the install-config only.
Okay, I've updated this with the latest plan based on the discussions this week. Let me know if you find anything else that needs to be changed. Thanks!
Describe user-facing API for day-1 network customizations in the IPI workflow, with particular focus on baremetal where such configuration is a common requirement.

Co-authored-by: Steve Hardy <shardy@redhat.com>
Co-authored-by: Zane Bitter <zbitter@redhat.com>
Co-authored-by: Dan Kenigsberg <danken@gmail.com>
Use nmstate instead of NetworkManager keyfiles as the user interface for configuring pre-deploy baremetal host networking.
I approve of the general approach here. @kirankt will give a final approval on behalf of the installer team once all of the details that have already been discussed out-of-band are captured in this PR. /approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler, trozet
/lgtm