Baremetal IPI Network Configuration for Day-1 #817
As a reference, the assisted installer also uses this mechanism.
Perhaps we should mention whether it's a goal to provide different bond/VLAN config per host (not just static IPs, which obviously have to be per-host).
Yeah, I guess bond configuration could differ between hosts or groups of hosts (different NIC names, etc.) - I think the API described here is basically always config per-host, but we can then perhaps optimize by detecting common config (to avoid duplicate images).
I also wonder, if we have per-host config in the install-config.yaml, whether we can make YAML anchors/aliases work so that, e.g. in the case where you have a single bond/vlan config for every host, you don't have to copy/paste it for each host.
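For illustration, standard YAML anchors/aliases could look something like this - assuming a hypothetical per-host `networkConfig` field and an illustrative nmstate bond body; whether the installer's YAML parser preserves anchors is exactly the open question:

```yaml
platform:
  baremetal:
    hosts:
    - name: worker-0
      networkConfig: &common-bond      # anchor the shared bond config once
        interfaces:
        - name: bond0
          type: bond
          state: up
          link-aggregation:
            mode: active-backup
            port:                      # nmstate field names illustrative (older releases used "slaves")
            - eno1
            - eno2
    - name: worker-1
      networkConfig: *common-bond      # alias re-uses it without copy/paste
```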
Yeah, this kind of relates to the de-duplication conversation below. I'll add something about that as a stretch goal.
What is 'baremetal'? Is that platform=baremetal? Or is a 'baremetal' install on AWS (using a baremetal method; platform=none) acceptable?
I.e.: https://access.redhat.com/articles/4207611
That doc appears to be describing UPI. This is baremetal IPI (i.e. platform=baremetal).
I'd really recommend using nmstate files. It's simpler, and easy enough to integrate. As a bonus, it will make the experience between UPI, IPI, and assisted more consistent. Furthermore, this would make the dream of stronger integration between assisted and IPI more realistic.
I know you mentioned at first it will be per host, but I think that will be really annoying for some users. Assume in my scenario all my hosts are homogeneous and I have 120 of them. That's a giant config. Could we just support applying the same config to all nodes at least initially as well?
That sounds reasonable. I'll have to defer to the hardware folks who are doing the implementation, but I'd be in favor of something like that. Maybe we could even provide a default configuration that would automatically be applied to all nodes, unless a given node had a specific configuration provided?
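A rough sketch of that idea, with entirely hypothetical field names just to show the shape:

```yaml
platform:
  baremetal:
    # hypothetical: applied to every host that has no config of its own
    defaultNetworkConfig:
      interfaces:
      - name: eno1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: true
    hosts:
    - name: worker-0              # no per-host config, so the default applies
    - name: worker-1
      networkConfig:              # hypothetical per-host override
        interfaces:
        - name: eno2
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: true
```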
For the implementation we won't care either way. It's just a question of whether we can give it an interface that makes sense.
+1 for default config / templated configs and some sort of node selector mechanism
So this works if you know the NIC name. What about doing something like TripleO did with introspection... identifying NICs dynamically and mapping them to networks? I realize this might not be applicable to the scope of this enhancement; just thinking longer term about how that would work with this API, or whether that would need a new API.
An example of this is a scenario where I want my kapi traffic to go over the ctlplane network, but I want to define another network for all my default egress OVN traffic.
I think that would need a higher-level API. We're using raw nmstate configuration data here, and it's not going to know anything about OCP networks. A higher-level API might be something to consider for the cross-platform implementation of this?
The one drawback is that we were trying to stay as close as possible to the kubernetes-nmstate CRD, so that if someday we have kubernetes-nmstate available on day 1 we could use that CRD and avoid the custom config. A higher-level interface could just intelligently populate the CRD though, so those might not be mutually exclusive goals.
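For reference, the kubernetes-nmstate CRD we're trying to stay close to looks roughly like this (API version, selector, and interface details are illustrative and may differ by release):

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-on-workers
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:                  # the same kind of nmstate document this proposal embeds per host
    interfaces:
    - name: eno1.20
      type: vlan
      state: up
      vlan:
        base-iface: eno1
        id: 20
```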
This config has to be applied before introspection, because introspection happens... over the network.
In the long term I think we should just think of this as a way to configure enough of the network to be able to communicate with ironic. In the short term people will inevitably use it to put their whole network config in because we're not providing them with another way of setting up the network config for the running cluster nodes at install time, but we shouldn't focus too much on that.
This is a good point that the end solution for cluster networking configuration may not look like just sticking this same nmstate file into k8s-nmstate, but may involve some other new operator that takes in higher-level data about mapping NICs to networks, inspection data from hosts, and the initial nmstate, then combines them to produce the network config for the cluster node that it then passes to k8s-nmstate.
Perhaps this also answers your other question about whether this field should be baremetal-specific: yes, because no other platforms have this pre-boot configuration problem.
Just to mention, I know that the nmstate folks (@qinqon in particular) are working on a dynamic way to generate nmstate configurations depending on the current network layout (i.e. defining the bridge on top of the default gateway, instead of hard-coding the interface name).
Not sure of its current status, since we discussed it some time ago, but I think that would solve this issue.
Hi, yes, we are working on a tool named `nmpolicy` that will solve something at least similar to that. We have a design document with examples; here is the example you are interested in:
With DHCP activated, it's simpler.
I don't think that will help; nmstatectl cannot run on the host - it must run on the cluster and produce keyfiles that are incorporated into the ISO before it even boots on the host. We don't currently have a way of building e.g. a container image into the ISO, so everything but the Ignition (which contains the keyfiles) must come over the network and therefore cannot be required to set up the network.
The nmstate team is reimplementing nmstatectl so it's just a binary without Python dependencies - will that help with that? They are also going to implement a flag to configure the kernel directly via netlink, bypassing NetworkManager.
Unless it ships installed by default in the CoreOS live ISO, no.
@zaneb UPI can also gain from having static IPs, right?
https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-iso_installing-bare-metal
This is basically what is already happening in the assisted installer.
@romfreiman UPI can already configure static IPs, just not via nmstate (or automated image building). If we want UPI to support nmstate syntax in future, that should be covered via another proposal; IMO it's outside the scope of this enhancement.
Should we put it under baremetal if there might be plans in the future to use it on other platforms? I'm thinking here about replacing ovs-configuration with this.
The problem with doing anything cross-platform for this release is that we need image customization functionality that is currently only provided by baremetal IPI. As I understand it there is a similar feature in development that will work on all platforms, but it wasn't going to be available soon enough to satisfy our baremetal use cases. I suppose we could put the configuration somewhere more generic, but since it won't work on anything but baremetal right now that could be confusing.
Per our conversation, it seems like we could use this for baremetal to replace configure-ovs, and configure multiple NICs to be on OVS bridges. configure-ovs can still exist (we need it anyway for the other platforms), and today it just skips doing anything if br-ex is already up and configured. However, this will allow us to do some more complex types of network deployments on baremetal, so it seems like a solid first step to me.
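For example, putting a NIC on an OVS bridge like br-ex could look roughly like the following nmstate snippet (interface names and addressing are illustrative, not a tested configure-ovs replacement):

```yaml
interfaces:
- name: br-ex
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: eno1                 # physical uplink; multiple NICs could be bonded first
    - name: br-ex                # the bridge's internal interface, defined below
- name: br-ex
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    dhcp: true
```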
This will support all types of nmstate configuration, right?
I think so? Under the covers we'll be using the nmstatectl CLI to convert this to nmconnection files that will be baked into the image. I'm not aware of any nmstate configurations that couldn't be handled that way, but that doesn't mean there aren't any.
Baremetal-specific or host-specific?
specific to the baremetal platform
I think the current plan is to initially enable this interface only for the `baremetal` platform, so there may be concerns about adding a new top-level install-config section. In that case, we could instead add the config to the platform-specific `hosts` list (which already contains other host-specific data needed to automate deployment). This also solves the mapping to a host, since we're already consuming this per-host data in the `baremetal` flow (e.g. BMC credentials).
I agree with Steve. This should belong in the `hosts` section of the `baremetal` platform, e.g. something like the sketch below. Please note `networkData` is a placeholder name I'm using as an example.
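A minimal sketch of that placement, assuming `networkData` as the placeholder field name and an illustrative static-IP nmstate body:

```yaml
platform:
  baremetal:
    hosts:
    - name: openshift-master-0
      role: master
      bmc:
        address: ipmi://192.168.111.1:6230    # existing per-host data
        username: admin
        password: password
      bootMACAddress: 52:54:00:aa:bb:cc
      networkData:                            # placeholder field name
        interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
            - ip: 192.168.111.10
              prefix-length: 24
```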
Okay, this sounds reasonable. I'd be more concerned about using a baremetal-specific structure if we had the common nmstate API, but since this implementation isn't going to be useful to other groups like OVNK anyway I'm good with it.
I guess that raises the question (already mentioned in the alternatives) of whether we want this to be nmstate format or just inlined keyfiles? I guess we need the interface to enable future nmstate support even if we start with keyfiles.
Also, for those familiar with OpenStack, `networkData` implies a different format - maybe we should call this something else, e.g. `networkConfig` or `networkFiles` (then adding e.g. `nmstateConfig` later would be simple)?
If we use nmstate YAML for install-config then we need nmstate available on the provisioning host so we can convert it to keyfiles. I'm fine with doing that (it's how we'd have to process NNCP records too), but it does add a dependency. I'm not sure how controversial that will be. It's also a bit more complex implementation-wise than straight keyfiles.
One option would be to replace the interface name with the filename, i.e. eth0 -> eth0.nmconnection. That's going to have to happen at some point anyway, and then we could potentially add support for eth0.yaml, which would indicate nmstate data. But maybe that's a little too magic?
+1 to not overloading networkData.
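As a sketch of the filename-keyed option (the field name, keyfile contents, and addressing are all illustrative, not a proposed schema):

```yaml
platform:
  baremetal:
    hosts:
    - name: worker-0
      networkConfig:                     # hypothetical: keys are the output filenames
        eth0.nmconnection: |             # raw NetworkManager keyfile content
          [connection]
          id=eth0
          type=ethernet
          interface-name=eth0

          [ipv4]
          method=manual
          address1=192.168.111.10/24,192.168.111.1
```

An `eth0.yaml` key could then later be interpreted as nmstate data, per the suggestion above.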
@cgwalters this doesn't work for control-plane network configuration, as previously discussed via coreos/ignition#979 - there is a chicken/egg problem, i.e. networking has to be up before the node can retrieve the merged config from the MCS.
So instead we're adopting the same process as UPI here: generate an ignition config for the coreos-installer live ISO which contains the network configs for the live ISO, then use `coreos-installer --copy-network`. The issue is there's no corresponding `openshift-install` interface that allows the user to provide network configs.
I'm talking about injecting into the pointer configuration, not the MCS config. But I think that's not relevant because:
Right, OK! So I think a similar case applies though - this "inject arbitrary network configuration into the live ISO" really generalizes into "inject arbitrary Ignition into the live ISO". I know some people in the past have disagreed with exposing this "arbitrary Ignition config" functionality, but in practice we already expose completely arbitrary things via MachineConfig, so having this generic functionality does not cost much in terms of support to me. I think some prior objections were rooted in us not having an opinionated high-level sugar for Ignition, which is partially being addressed by shipping Butane.
Now, we don't need to gate this enhancement on having an official way to inject extra Ignition into the installer's generated configs. I'm more arguing that that's the way to think of it - this is just "syntactic sugar" for an Ignition config which writes the provided data to `/etc/NetworkManager`.
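To illustrate that framing, a hand-written equivalent in Butane (paths and keyfile contents illustrative) would be roughly:

```yaml
variant: fcos
version: 1.4.0
storage:
  files:
  - path: /etc/NetworkManager/system-connections/eth0.nmconnection
    mode: 0600
    contents:
      inline: |
        [connection]
        id=eth0
        type=ethernet
        interface-name=eth0

        [ipv4]
        method=manual
        address1=192.168.111.10/24,192.168.111.1
```

The rendered Ignition could then be embedded into the live ISO with `coreos-installer iso ignition embed`, as mentioned below.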
As I said, this doesn't work, because Ignition tries to resolve the `merge` directive before applying any other config in the pointer ignition, so you can't use pointer customizations to influence control-plane network config.
UPI works around this limitation by allowing users to embed network configuration into the live ISO, then apply that to the deployed OS via `coreos-installer --copy-network` (so Ignition is not involved other than as a means to drop the files into the live ISO) - we're doing the same for baremetal IPI.
This is true, but we're trying to reduce the friction around a really common case for baremetal here, not expose a way to arbitrarily customize the live-ISO process.
Use of the live ISO here is arguably an implementation detail, particularly given that no other IPI platforms currently use it, so exposing a generic cross-platform installer interface would not be possible?
I think it's misleading to say it that way. Ignition is the mechanism to configure the Live ISO - including networking for the Live ISO. The whole design is intended to enable exactly this.
The way I'd say it instead is that configuring the Live ISO via Ignition allows handling effectively arbitrary network/install requirements.
We already do expose this, to be clear; `coreos-installer iso ignition embed` is stable public API. I'm just arguing to make it more ergonomic to use for baremetal IPI.
Well, I think it would boil down to having a "configure the pointer config" hook and a "configure the live ISO config" hook, where the latter would only apply on baremetal IPI.
Anyways... hmm, I guess my bottom line is I am not opposed to the proposal of a `networkData`-type stanza in the install config only for baremetal. I'm just saying we should think of that as sugar for a generalized `liveIgnitionConfig` that we may want in the future.
Ack, thanks @cgwalters - I think we're basically in agreement - I don't think anything here precludes potentially adding a more generalized live-iso-ignition "escape hatch" in future.
I mentioned this in an earlier comment. There are some advantages to this wrapper, whether it uses its own CR or just reads a secret and runs validations/actions on the data.
I'd love for assisted and IPI to have a common interface here; it would help with the integration of both projects and it will make things easier for users as well.
Agreed, maybe this is a good opportunity to bump this conversation? Try to find some alignment here?
@flaper87 This PR has been up for nearly 4 months, and designing a new API for network config is explicitly stated as a non-goal, so I'd suggest we defer that discussion ;)
Context is - we wanted to use NNCP, but because kubernetes-nmstate (and thus that CRD) isn't deployed via the release payload, we can't consume that API at install time - discussion on that stalled, ref #747
The same problem exists for the new AI nmstate CRD; it's only available via AI or when using the CIM layered-product stack.
IMO we definitely need to provide a core-OCP API for network configuration, but given the historical lack of progress on that discussion, we decided to focus on the existing lower-level interfaces instead (e.g. Secrets). Any future CRD API can obviously be layered on top if/when we manage to achieve alignment :)
I'd rather not try to boil the ocean and solve all the networking problems at once, particularly because, as noted in this section, "...we still need a Secret to generate the PreprovisioningImage." Any higher level general purpose networking API is going to have to build on this functionality anyway, not replace it. This is step one in a longer journey that has been delayed for years already.
FWIW, I was not proposing to design a new config. That said, I didn't realize this had been up for so long, 😅