
rhcos: add rhcos-inject enhancement #492

Closed
wants to merge 1 commit into from

Conversation

@crawford (Contributor) commented Oct 2, 2020

No description provided.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: crawford
To complete the pull request process, please assign sjenning after the PR has been reviewed.
You can assign the PR to them by writing /assign @sjenning in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


## Motivation

Admins need a way to inject customization into their RHCOS nodes before they are provisioned by the cluster. In most cases, this configuration is require in order for the provisioning process to complete, so the normal facilities (e.g. Machine Configuration Operator) are not yet available. Today, this is a very manual process involving an admin interactively providing that configuration at the point of running the RHCOS installer. This might work in some environments, but many others don't have easy interactive access due to technical or policy constraints.
Member

s/require/required/

Contributor

Nit but I'd say "In some cases" as IME it's a fairly small subset of early-network (and perhaps disk) configuration that we keep running into that can't easily be handled via MachineConfig/Ignition.

Also, IMO the issue isn't that users don't have easy interactive access - as the owner of some hardware I probably do have that access, the issue is that solution isn't something I can automate, so it's not workable in anything other than very small-scale PoC type situations.


## Proposal

Add a new, optional step to the installation process that allows an admin to inject network configuration and any other customization to the RHCOS installer image (and maybe other RHCOS images in the future) before invocation. This customization is performed by a new `rhcos-inject` utility that takes as input an RHCOS installer image (`rhcos-<version>-installer.<architecture>.iso`), network configuration, and a set of Ignition Configs and generates a new RHCOS installer image with all of the configuration and customization included. This new installer image can then be booted in an unsupervised manner and will complete the RHCOS installation along with any customization.
Member

Would it be adding network-related config to the ignition and embedding the result in the ISO? Or would it change the ISO contents in a different way?


It sounds as though this tool does exactly the following:

  • Generate an Ignition config for the live system that embeds NM configuration + the Ignition config for the installed system + a coreos-installer unit that forwards them to the installed system with --ignition-file and --copy-network + any custom Ignition fragments
  • Embed it with coreos-installer iso ignition embed

Is that the idea? If so, it'd be good to spell out why this should be a new tool rather than a subcommand of coreos-installer. If we view this as sugar over coreos-installer iso ignition embed, I don't think the functionality would be too far outside coreos-installer's existing scope, and implementing it as a subcommand would save us having to distribute an additional binary.

Also worth noting that some of the customization functionality would be useful for PXE installs, so we should also be able to produce the live Ignition config without embedding it into an ISO.

Nit: rhcos-<version>-installer.<architecture>.iso is the old installer image; I assume you mean rhcos-<version>-live.<architecture>.iso.
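The two steps above can be sketched roughly as follows. This is a non-authoritative illustration: the unit name, device, and file paths are invented, and the final `coreos-installer iso ignition embed` invocation is shown as a comment since it requires the live ISO locally.

```shell
# 1. Generate a live-system Ignition config whose only job is to run
#    coreos-installer with --ignition-file and --copy-network.
#    (Unit name, disk, and paths are hypothetical.)
cat > live.ign <<'EOF'
{
  "ignition": { "version": "3.0.0" },
  "systemd": {
    "units": [
      {
        "name": "install.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Install RHCOS to /dev/sda\n[Service]\nType=oneshot\nExecStart=/usr/bin/coreos-installer install /dev/sda --ignition-file /run/target.ign --copy-network\n[Install]\nWantedBy=multi-user.target"
      }
    ]
  }
}
EOF

# 2. Embed the result into the live ISO (illustrative; not run here):
#    coreos-installer iso ignition embed -i live.ign -o rhcos-custom.iso rhcos-live.x86_64.iso
```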

Contributor Author

I wasn't aware that the new coreos-installer had so much functionality. It sounds like a lot of the heavy-lifting has already been done and that these new features should just be folded in. That's great news!

Regarding the original question, I had assumed that the network config would happen for two contexts: the pre-pivot Ignition environment (initramfs) and the post-pivot system. This would allow for image-based deployments (e.g. QCOWs or VHDs) to fetch their Ignition Configs from the Machine Config Server running in the cluster. If we were only to configure the networking in the post-pivot system (e.g. by using Ignition to write the configs), the machine wouldn't be able to fetch its config from the MCS and would have to be provided the fully-rendered config up front. This is only easy to do if the live ISO is performing the installation though.


As with all software, it's important to consider coupling and cohesion when looking at this solution and its alternatives. Defining clear API boundaries and building upon layers of abstraction are some of the most effective techniques for avoiding pitfalls. This solution chooses to make a distinction between the networking required for an individual node to operate and the networking required for the cluster itself; respectively, the machine network and the pod network. A functioning machine network is considered a prerequisite to installation, just as power and cooling are. The pod network, on the other hand, is something created and managed by OpenShift. Working backward from this assumption, it's clear that the problem of pre-installation network configuration should not be solved by the cluster.

When thinking about where this functionality should live, `openshift-install` may seem like an obvious choice. This is a poor fit, however. `openshift-install` is only used during installation and destruction of the cluster, whereas this functionality would also be necessary post-installation (e.g. during a scaling event). Additionally, there are a number of existing and future components which would benefit from this functionality, but may not want to carry the full weight of `openshift-install` (368 MiB at the time of writing). Even further, `openshift-install` needs to continue to support macOS, but it wouldn't be feasible to do the necessary Linux file system operations from that environment. It's going to be most flexible to implement this new functionality in a stand-alone, Linux-only utility. Future improvements may include running this utility in the cluster so that an admin can perform customizations from the comfort of the browser and then simply download the result.
Member

How would rhcos-inject fit into the workflow of a day 2 scaling event?


Member

What is your thinking with regards to making a new rhcos-inject tool as opposed to adding this behavior to coreos-installer?

@celebdor (Contributor) left a comment

I think it would be good to explicitly cover how the mechanism is used to deploy worker nodes after installation (or replace masters).


Contributor

Suggested change
Admins need a way to inject customization into their RHCOS nodes before they are provisioned by the cluster. In most cases, this configuration is require in order for the provisioning process to complete, so the normal facilities (e.g. Machine Configuration Operator) are not yet available. Today, this is a very manual process involving an admin interactively providing that configuration at the point of running the RHCOS installer. This might work in some environments, but many others don't have easy interactive access due to technical or policy constraints.
Admins need a way to inject customization into their RHCOS nodes before they are provisioned by the cluster. In most cases, this configuration is required in order for the provisioning process to complete, so the normal facilities (e.g. Machine Configuration Operator) are not yet available. Today, this is a very manual process involving an admin interactively providing that configuration at the point of running the RHCOS installer. This might work in some environments, but many others don't have easy interactive access due to technical or policy constraints.


#### Flag Names

I hate the classic `ip=:::::::::::::` syntax but I didn't want to redesign that in this initial pass. We should rethink the specific representation of these options before implementation. This will also allow for a more expressive invocation that can be used for things like Dot1Q, VLAN-tagging, teaming, etc.
Contributor

I think it would be very nice, since we have an enhancement doing this for day 2 networking, for it to be configurable with the NodeNetworkConfigurationPolicy syntax (either from a CLI parameter or from stdin):

```yaml
desiredState:
    interfaces:
    - name: bond0
      type: bond
      state: up
      ipv4:
        dhcp: true
        enabled: true
      link-aggregation:
        mode: balance-rr
        options:
          miimon: '140'
        slaves:
        - eth1
        - eth2
    - name: bond0.102
      type: vlan
      state: up
      ipv4:
        dhcp: true
        enabled: true
      vlan:
        base-iface: bond0
        id: 102
```


We're already using kernel-argument syntax in the afterburn.initrd.network-kargs VMware guestinfo property, so there is some precedent. But I agree that syntax isn't great, and PM has also received feedback to that effect. Another option is to consume NetworkManager key files or ifcfg files, since those are what need to be written to disk anyway.
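For illustration, a NetworkManager keyfile carrying the kind of static configuration discussed here might look like the following. The connection name, interface, and addresses are invented, and the file is written under a relative `etc/` path so the sketch is self-contained:

```shell
# Hypothetical NetworkManager keyfile for a static address. All names and
# addresses are made up; a real file would live under
# /etc/NetworkManager/system-connections/ on the target.
mkdir -p etc/NetworkManager/system-connections
cat > etc/NetworkManager/system-connections/enp1s0.nmconnection <<'EOF'
[connection]
id=enp1s0
type=ethernet
interface-name=enp1s0

[ipv4]
method=manual
address1=10.10.10.10/24,10.10.10.254
dns=10.10.10.1;
may-fail=false

[ipv6]
method=disabled
EOF
```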

@bgilbert left a comment

I really like the idea of a high-level unified tool for customizing an ISO install, and the use cases seem compelling. However, this draft feels as though it's alluding to an implementation plan that's not actually spelled out here. I think the rhcos-inject command is intended as sugar over the existing ISO embed, install hook, and --copy-network functionality, but the proposal doesn't mention any of those features by name, so the intent isn't completely clear. Could you spell out which functionality would build on existing code, which is believed to be net new, and which would replace existing mechanisms?





Initially, `rhcos-inject` will primarily focus on network configuration but as the needs of customers evolve, this can be expanded to include other common customization. As always, an escape hatch is needed for unanticipated cases and this will be in the form of raw Ignition Configs. If there is a need for additional customization beyond network configuration, the admin can include that in an Ignition Config and inject that alongside everything else.

The raw config would be applied via ignition.config.merge and a data URL?
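A minimal sketch of that approach, assuming the escape-hatch config is wrapped into a pointer config via `ignition.config.merge` and a base64 data URL (file names are hypothetical):

```shell
# A user-provided raw Ignition config to be merged in.
cat > custom.ign <<'EOF'
{"ignition": {"version": "3.0.0"}}
EOF

# Encode it and reference it from a pointer config via a data URL.
b64=$(base64 -w0 custom.ign)
cat > merged.ign <<EOF
{
  "ignition": {
    "version": "3.0.0",
    "config": {
      "merge": [
        { "source": "data:text/plain;base64,${b64}" }
      ]
    }
  }
}
EOF
```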

Contributor

Note that when a user provides ignition config via openshift-install, it ends up in the pointer ignition config, which means we still need networking to download the rendered config from the MCS (which ignition does before applying any config so there's potentially a chicken/egg problem wrt network config).

That's why I'm proposing #467 - it means that in the IPI baremetal case we can automate providing the entire config via a data URL at runtime, and avoid the need for any networking when ignition runs.

@runcom (Member) commented Oct 7, 2020

EDIT: after discussions, I understand why we want to go the MCO way

@hardys I think with this enhancement, network configuration is going to be a target of this rhcos-inject step which makes #467 not needed afaict - the need for the ignition config in this paragraph is for other scenarios but networking. The tl;dr would be that your image has the network customization needed already so anything that isn't network and can be done via MachineConfigs will just work as you're able to fetch from the MCS at that point

Contributor

@runcom No, currently baremetal IPI doesn't use the installer iso, and there's nothing in the architecture that allows for automated customization of the target OS image, so #467 is still needed.

As an interim step we have https://access.redhat.com/solutions/5460671 which documents a similar iso customization to this enhancement (but the target OS image, not the installer iso), but that's not a scalable or well integrated solution IMO - instead we want to just provide the config via ignition at the point of deployment (which #467 will enable)

```
--persist-networking=false
```

This custom service runs in the context of the RHCOS installer and is responsible for writing the network configuration for the installed system (e.g. based on an external lookup by MAC address). This escape hatch can be used in environments which employ an in-house host configuration procedure.

We should probably provide some sugar for generating the install hook boilerplate.
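Such hook boilerplate might look something like the unit below. This is purely a sketch: the script path and the `install.service` unit it orders itself against are assumptions, not a defined interface, and the file is written under a relative `etc/` path so the snippet is self-contained.

```shell
# Hypothetical install-hook unit for the live environment: run a
# site-specific script (e.g. a MAC-address lookup) before the installer
# service starts. "install.service" is an assumed name for the installer unit.
mkdir -p etc/systemd/system
cat > etc/systemd/system/write-netconfig.service <<'EOF'
[Unit]
Description=Write per-host network config before installation
Before=install.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/write-netconfig.sh

[Install]
RequiredBy=install.service
EOF
```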




#### Service Orchestration

What systemd targets are needed to guide the ordering of services that are injected into the installation environment? I presume we'll want something to ensure services run before installation begins and after that completes.

There are documented sequencing directives for both cases.


#### Raw Ignition Configs

Support for injecting raw Ignition Configs is very much an escape hatch - no human should be spending any significant amount of time reading or writing Ignition configs. If users find utility in this mechanism (as I suspect they will), they are immediately going to be met with frustration when simple syntax errors prevent nodes from installing correctly. We should consider jumping directly to a more user-friendly approach of integrating a config transpiler (e.g. https://github.com/coreos/container-linux-config-transpiler) so that admins have a better experience. The obvious downside to this approach is that it will be easier to make customizations using this mechanism versus the preferred paradigm of Machine Configs.

Hmm, it might actually make more sense for all of the sugar described here to be implemented directly in FCCT. As described, rhcos-inject is just a transpiler + coreos-installer iso ignition embed, and we do already have a transpiler. I've filed coreos/butane#137 for this.
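To make the transpiler angle concrete, a minimal Fedora CoreOS Config that such sugar could build on might look like this. The hostname and path are illustrative, and the `fcct` invocation is shown as a comment since the transpiler may not be installed where this runs:

```shell
# A minimal FCC (the YAML input to fcct); the hostname is made up.
cat > worker.fcc <<'EOF'
variant: fcos
version: 1.0.0
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: worker0.example.com
EOF

# Transpile to an Ignition config (illustrative; not run here):
#   fcct --pretty --strict < worker.fcc > worker.ign
```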


## Drawbacks

- This approach taints the pristine RHCOS assets that come from our build system. One of the early goals of OpenShift 4 was to avoid the use of "golden images" and to push customers to make use of declarative configuration instead. This is a notable departure from that stance and opens the door to misuse and obfuscation of the installation environment.

That ship has already sailed. However, coreos-installer provides show/remove subcommands that allow distinguishing modified ISOs and resetting them back to pristine state, and the modifications are themselves declarative, so I think we're not in terrible shape here.

```
done
```

This network configuration applies to the post-pivot RHCOS installation environment and is copied to both the pre- and post-pivot installed environment by default. In the event that exotic configuration is required, this copying of configuration can be disabled and a service can be used instead (TODO(crawford) figure out systemd service ordering):

The installation environment doesn't pivot.

### Non-Goals

- Inventory management
- Bring-your-own-RHEL customization
Contributor

I think we should call out that currently this doesn't solve the problem for IPI baremetal deployments (since that doesn't currently use the RHCOS installer iso), that will be addressed via #467

```
--openshift-config=worker.ign \
--ip=10.10.10.$((i+10))::10.10.10.254:255.255.255.0:worker${i}.example.com:enp1s0:none
done
```
Contributor

This is basically a workaround for the fact that MCO can't do per-machine configs ref openshift/machine-config-operator#1720 right?

This won't work for IPI baremetal as it's currently implemented since we don't have a way to customize the ignition files per host, we'd have to implement some equivalent customization method in the deploy workflow.

Also I wonder what we do on day 2. To enable the configuration to be managed we'd need to also provide equivalent config via MachineConfig, but there's currently no way to do that per-machine. What happens, for example, if an upgrade requires a syntax change to the NIC configs that result from this process?

### Implementation Details/Notes/Constraints

As with all software, it's important to consider coupling and cohesion when looking at this solution and its alternatives. Defining clear API boundaries and building upon layers of abstraction are some of the most effective techniques for avoiding pitfalls. This solution chooses to make a distinction between the networking required for an individual node to operate and the networking required for the cluster itself; respectively, the machine network and the pod network. A functioning machine network is considered a prerequisite to installation, just as power and cooling are. The pod network, on the other hand, is something created and managed by OpenShift. Working backward from this assumption, it's clear that the problem of pre-installation network configuration should not be solved by the cluster.

@hardys (Contributor) commented Oct 7, 2020

Network configuration isn't just an installation time problem - there may be changes over time which impact the configuration created by this workflow - how do we envisage customers dealing with that?

It seems like the answer is probably ssh/scp or ansible - and that's quite inconsistent with the user experience for everything else (including secondary nics which aren't needed during initial deployment).

## Alternatives

- https://github.com/openshift/enhancements/pull/399
- https://github.com/openshift/enhancements/pull/467
Contributor

This isn't an alternative to 467, it won't work for IPI baremetal in its current form.

Contributor

I don't think it's an alternative to 399 either - that's about declarative network configuration via the MCO, but this is about a non-declarative way to inject install-time configuration, explicitly not managed by the MCO?

@cgwalters (Member)

I share the concerns that this seems to be written as if we hadn't already written (and implemented and are just about shipping in 4.6) the enhancements from

So far the OpenShift trend I think has been for fewer binaries for admins - why not ship this as part of openshift-install? Also tangentially related to this we have OKD, so naming the tool rhcos- seems off from that PoV since FCOS is used there.

But as far as the overall idea, agree that we clearly need some higher level fcct-like sugar on top of generating Ignition.

@cgwalters (Member)

BTW one tangentially related idea I had is automatically pulling this "unmanaged Ignition" into the cluster by default. Basically the MCO reads the /run/ignition.json that the node fetched, and subtracts the MCO-managed Ignition to effectively regenerate the pointer config + per-node state.

Then we could create a machineconfig like object that includes the node name - initially just helping admins back up these configs. But that step could lead towards openshift/machine-config-operator#1720

@cgwalters (Member)

Also we discussed the need to basically ship ign-convert to translate spec2 to spec3 in order to aid people doing UPI installs - which argues I think for something more generic like oc|openshift-install ignition <subcommand> where <subcommand> would be inject or translate.

@hardys (Contributor) commented Oct 23, 2020

BTW one tangentially related idea I had is automatically pulling this "unmanaged Ignition" into the cluster by default. Basically the MCO reads the /run/ignition.json that the node fetched, and subtracts the MCO-managed Ignition to effectively regenerate the pointer config + per-node state.

@cgwalters - I've proposed a similar idea in the comments on #467 - but that would only add the per-role pointer ignition config (including any customizations) as a MachineConfig object.

I think the problem with the MCO reading the ignition config on each node is that it assumes something outside the MCO can perform the per-node customization action, and IMO it's not clear what that "something" should be in all cases, e.g.:

  • For UPI it's probably some manual process, e.g similar to that described in this PR
  • For assisted-install per-node config could be generated in a similar way to UPI
  • For IPI we don't have any interface to provide per-host configuration IIUC?

Also there is the issue that for UPI/AI we have to maintain that state/config somewhere outside of the cluster, e.g some mapping of hosts to IPs or other per-host config.

I guess that's where the machineconfig like object that includes the node name comes in, but it would be great to explore that idea further, in particular if we could support creating such an object directly (e.g through manifests passed to the installer) vs requiring some external state and customization of ignition etc?

cgwalters added a commit to cgwalters/installer that referenced this pull request Oct 23, 2020
The transition to Ignition Spec 3 with 4.6 creates a
discontinuity.  Some users want to update their bootimages,
e.g. for a cluster originally provisioned as 4.4 but upgraded
in place to 4.6, it should be possible to directly use RHCOS 4.6
bootimages for new workers.

In some cases in fact, this could be *required* for things like
adding a node with newer hardware.

The main stumbling block here is the pointer ignition config.
Since `openshift-install` already includes Ignition bits, let's
add translation capability here using
https://github.com/coreos/ign-converter
the same as the MCO uses.

xref openshift/enhancements#492 (comment)
xref https://bugzilla.redhat.com/show_bug.cgi?id=1884750
@cgwalters (Member)

openshift/installer#4300 adds translation, and left space for openshift-install ignition <something> - we'd want to debate whether that takes something like fcc files - see also coreos/butane#79

cgwalters added a commit to cgwalters/oc that referenced this pull request Oct 29, 2020
The transition to Ignition Spec 3 with 4.6 creates a
discontinuity. Some users want to update their bootimages,
e.g. for a cluster originally provisioned as 4.4 but upgraded
in place to 4.6, it should be possible to directly use RHCOS 4.6
bootimages for new workers.

In some cases in fact, this could be required for things like
adding a node with newer hardware.

The main stumbling block here is the pointer ignition config
which is generated by `openshift-install`.  Since the idea is
`openshift-install` should in theory be disposable after a cluster
is provisioned, let's add this to `oc` which admins will need anyways.
Vendor and use
https://github.com/coreos/ign-converter
the same as the MCO uses.

xref openshift/enhancements#492 (comment)
xref https://bugzilla.redhat.com/show_bug.cgi?id=1884750
cgwalters added a commit to cgwalters/oc that referenced this pull request Oct 29, 2020
cgwalters added a commit to cgwalters/oc that referenced this pull request Nov 2, 2020
cgwalters added a commit to cgwalters/oc that referenced this pull request Nov 24, 2020
@jlebon
Member

jlebon commented Dec 17, 2020

This is related to coreos/coreos-installer#124, in which we discuss ways to make customizing the install nicer at the coreos-installer level. Then the new tool here could leverage that.

@markmc
Contributor

markmc commented Feb 16, 2021

I appreciate time has passed since this was written, and I can see how much of this idea is likely to evolve into new capabilities in coreos-installer - e.g. as described in coreos/butane#137

Some things occur to me, though ...

  • The introduction talks about the success of IPI in predictable environments, and the variability of less predictable environments requiring a level of pre-install customization. And I can see how the proposal improves the UPI experience in these environments, providing a scripted way to use the CoreOS installer ISO - you boot a customized ISO which can drive the install to completion, rather than booting a generic ISO that allows you to interactively complete the installation.
  • There is also the bare-metal IPI context (or, more generically, IPI for less predictable environments). In this context, we have options beyond asking users to generate customized ISOs (i.e. the automation software we provide can do it for them), or beyond having to generate customized ISOs at all (i.e. by supplying the customizations to a generic image via a user-data-like side channel such as the VMware guestinfo property)
  • We do already have a supportable way to embed networking configuration in an RHCOS disk image - i.e. NetworkManager keyfiles in /boot/coreos-firstboot-network as per https://access.redhat.com/solutions/5460671. A tool like this could help put some guard-rails around the image customizations we support. (But NB the difference between the installer ISO and an RHCOS disk image)
  • The assisted installer has a mechanism to include a bundle of network configs on the installer ISO, and select (based on MAC address) the appropriate config for the current machine. This is an attractive alternative to having to generate an ISO for every machine. (See ConfigStaticIpsScript)
  • If we are to put any sugar around host networking configuration, it seems to me that an nmstate-based file format makes the most sense
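The select-by-MAC mechanism described above can be sketched roughly as follows. This is a hypothetical minimal sketch, not the actual ConfigStaticIpsScript; the on-ISO layout (one subdirectory of NetworkManager keyfiles per MAC address) is an assumption:

```python
import os
from typing import Optional


def select_config(bundle_dir: str, machine_mac: str) -> Optional[str]:
    """Return the per-machine config directory matching machine_mac,
    or None if the bundle has no entry for it.

    Assumed (hypothetical) layout: one subdirectory per MAC address,
    each holding that machine's NetworkManager keyfiles,
    e.g. bundle_dir/52:54:00:ab:cd:ef/.
    """
    wanted = machine_mac.lower()
    for entry in os.listdir(bundle_dir):
        # MAC addresses compare case-insensitively.
        if entry.lower() == wanted:
            return os.path.join(bundle_dir, entry)
    return None
```

At boot, a script would read the machine's MAC from sysfs, call something like `select_config`, and copy the matching keyfiles into place, so a single generic ISO serves every machine in the bundle.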

So, I could imagine something like:

  1. Syntactic sugar for `coreos-installer iso ignition embed` which could embed nmstate networking config in the installer ISO, which then gets copied to the installed machine
  2. Preferably support a mode of embed-a-bundle-of-static-configs-and-choose-one-at-runtime, as the assisted installer has demonstrated
  3. Support side-channel config mechanisms (e.g. config-drive with Metal3 virtual media, or VMware guestinfo property) so that this customization can be supplied without generating a custom installer ISO
  4. Agree on nmstate based APIs that IPI users (starting with bare-metal* and VMware?) can supply these per-cluster or per-machine customizations in install-config.yaml (for install) or in API resources (for new machines day 2)

Really only (1) and (2) relate to what is proposed here, but I think whatever we design for "a better UPI experience" should also pave the way toward solving these problems for "IPI in less predictable environments".
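An nmstate-based format of the kind suggested in (4) would look roughly like the following declarative host state (addresses are illustrative, and whether the IPI APIs would accept exactly this schema is an open question):

```yaml
# Hypothetical per-machine network state in nmstate's declarative schema.
interfaces:
  - name: eth0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 192.0.2.10
          prefix-length: 24
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 192.0.2.1
      next-hop-interface: eth0
dns-resolver:
  config:
    server:
      - 192.0.2.53
```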

It would be great if https://github.com/openshift/enhancements/blob/master/enhancements/host-network-configuration.md were updated to reflect the current state of this topic.

(* I'm glossing over the detail here that bare-metal IPI with virtual media does not currently use the RHCOS installer ISO)

@cgwalters
Member

I think a lot of this one has moved to coreos/butane#167

@bgilbert

@cgwalters Nope, coreos/butane#167 is entirely different.

@jlebon
Member

jlebon commented Mar 24, 2021

We have an epic internally on the CoreOS first boot team to implement the general idea from this enhancement into coreos-installer (re-using the existing functionality we have there): https://issues.redhat.com/browse/GRPA-3512

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2021
@bgilbert

Closing in favor of https://issues.redhat.com/browse/GRPA-3512, for the reason mentioned in #492 (comment).

/close

@openshift-ci
Contributor

openshift-ci bot commented Jun 30, 2021

@bgilbert: Closed this PR.

In response to this:

Closing in favor of https://issues.redhat.com/browse/GRPA-3512, for the reason mentioned in #492 (comment).

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Jun 30, 2021