baremetal: Include CoreOS ISO in the release payload #909

Merged (7 commits) on Oct 26, 2021
enhancements/baremetal/coreos-image-in-release.md (260 additions, 0 deletions)
---
title: coreos-image-in-release
authors:
- "@zaneb"
reviewers:
- "@hardys"
- "@dtantsur"
- "@elfosardo"
- "@sadasu"
- "@kirankt"
- "@asalkeld"
- "@cgwalters"
- "@aravindhp"
- "@jlebon"
- "@dhellmann"
- "@sdodson"
- "@LorbusChris"
approvers:
- "@hardys"
- "@aravindhp"
creation-date: 2021-09-22
last-updated: 2021-09-22
status: implementable
see-also:
- "/enhancements/coreos-bootimages.md"
---

# Include the CoreOS image in the release for baremetal

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

The baremetal platform is switching from the OpenStack QCOW2 CoreOS image to
the live ISO (as also used for UPI). To ensure that existing disconnected
clusters can update to this, the ISO image will be included in the release
payload. This will be balanced out by removing the RHEL image currently shipped
in the release payload.

## Motivation

Currently, the deploy disk image (i.e. the image running IPA -
`ironic-python-agent`) is a RHEL kernel plus initrd that is installed (from an
RPM) into the `ironic-ipa-downloader` container image, which in turn is part of
the OpenShift release payload. When the metal3 Pod starts up, the disk image is
copied from the container to a HostPath volume whence it is available to
Ironic.

The target OS disk image is a separate CoreOS QCOW2 image. The URL for this is
known by the installer. It points to the public Internet by default and may be
customised by the user to allow disconnected installs. The URL is stored in the
Provisioning CR at install time and never updated automatically. The image
itself is downloaded once and permanently cached on all of the master nodes.
Never updating the image is tolerable because, upon booting, the CoreOS image
will update itself to the version matching the cluster it is to join. It
remains suboptimal because new Machines will take longer and longer (and
consume more and more bandwidth) to join as the cluster ages. This issue
exists on all
platforms, and is the subject of a [long-standing enhancement
proposal](https://github.com/openshift/enhancements/pull/201). Other issues
specific to the baremetal platform are that boot times for bare metal servers
can be very long (and therefore the reboot is costly), and that support for
particular hardware may theoretically require a particular version of CoreOS.

We are changing the deploy disk image to use the same CoreOS images used for
UPI deployments. These take the form of both a live ISO (for hosts that can use
virtualmedia) and of a kernel + initrd + rootfs (for hosts that use PXE). When
upgrading an existing disconnected cluster, we currently have no way to acquire
these images without the user manually intervening to mirror them.

Like the QCOW2 provisioning disk image, the URLs for these images are known by
the installer, but they point to the cloud by default and would have to be
customised by the user at install time to allow disconnected installs.
Following the same approach as is currently used for the QCOW2 would also
effectively extend the existing limitation (never updating the provisioning OS
image) to cover the deploy image as well.

The agent itself (IPA) is delivered separately, in a container image as part of
the OpenShift release payload, so in any event we will continue to be able to
update IPA.

We wish to solve the problems with obtaining an up-to-date CoreOS by including
it in the release payload.

### Goals

* Ensure that no matter which version of OpenShift a cluster was installed
with, we are able to deliver updates to IPA and the OS it runs on.
* Stop maintaining and shipping the non-CoreOS, RHEL-based IPA PXE files.
* Never break existing clusters, even if they are deployed in disconnected
environments.

### Non-Goals

* Automatically switch pre-existing MachineSets to deploy with
`coreos-installer` instead of via QCOW2 images.
* Update the CoreOS QCOW2 image in the cluster with each OpenShift release.
* Provide CoreOS images for platforms other than baremetal.
* Eliminate the extra reboot performed to update CoreOS after initial
provisioning.

## Proposal

Build a container image containing the latest CoreOS ISO and the
`coreos-installer` binary. This container can be used e.g. as an init container
to make the ISO available where required. The `coreos-installer iso extract
pxe` command can also be used to produce the kernel, initrd, and rootfs from
the ISO for the purposes of PXE booting.
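
For example, an init container based on this image might make both artifact
types available roughly as follows (a sketch only; the container paths and
web-server directory are illustrative assumptions, not part of this proposal):

```shell
# Hypothetical init-container step: the image is assumed to ship the live ISO
# at /coreos/coreos-live.x86_64.iso alongside the coreos-installer binary, and
# /shared/html/images stands in for wherever the images are served from.

# Make the live ISO available for virtualmedia-based deployments.
cp /coreos/coreos-live.x86_64.iso /shared/html/images/

# Derive the kernel, initrd, and rootfs from the same ISO for hosts that boot
# via PXE.
coreos-installer iso extract pxe \
    --output-dir /shared/html/images \
    /coreos/coreos-live.x86_64.iso
```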

This image could either replace the existing content of the
`ironic-ipa-downloader` repo, be built from the new
`image-customization-controller` repo, or be built from a new repo.

Review comment from a Contributor:

> Personally I think it'd be clearer if we created a new repo/image than
> re-purposing either of these?


### User Stories

As an operator of a disconnected cluster, I want to upgrade my cluster and
have it continue to work for provisioning baremetal machines.

As an operator of an OpenShift cluster, I want to add to my cluster new
hardware that was not fully supported in RHEL at the time I installed the
cluster.

As an operator of an OpenShift cluster, I want to ensure that the OS running on
hosts prior to them being provisioned as part of the cluster is up to date with
bug and security fixes.

### Implementation Details/Notes/Constraints

We will need to restore a [change to the Machine Config
Operator](https://github.com/openshift/machine-config-operator/pull/1792) to
allow working with different versions of Ignition that was [previously
reverted](https://github.com/openshift/machine-config-operator/pull/2126) but
should now be viable after [fixes to the
installer](https://github.com/openshift/installer/pull/4413).

The correct version of CoreOS to use is available from the openshift-installer;
this could be obtained by using the installer container image as a build image.
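
As a minimal sketch of that lookup (assuming the stream-metadata file carried
in the installer source tree, `data/data/rhcos-stream.json`, and the standard
CoreOS stream-metadata schema):

```shell
# Hypothetical: print the RHCOS build ID pinned by the installer for x86_64.
# The file path and field names are assumptions; adjust to the installer in use.
jq -r '.architectures.x86_64.artifacts.metal.release' \
    installer/data/data/rhcos-stream.json
```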

Because OpenShift container images are built in an offline environment, it is
not possible to simply download the image from the public Internet at container
build time. Instead this will be accomplished by *MAGIC*. Uploading the ISOs to
the lookaside cache in Brew is one possibility.

Review comment from @hardys (Sep 27, 2021):

> Do we have a plan for resolving exactly how this process will work, and do
> we anticipate needing different solutions for upstream (openshift CI, not
> OKD) images vs downstream (and nightly) builds?

Reply from the author (@zaneb):

> We'll need to pull in some folks from the appropriate team (ART?) to tell us
> the right way to accomplish it.

Reply from @sosiouxme (Sep 28, 2021):

> The good news is that there is a mechanism for ART to download and insert
> binaries like this, if that is in fact what we determine to do. We would
> just need a script written to do it at the time we rebase the source.
>
> The bad news is that it deepens the dependency tree of things that need to
> align in a release, and falls outside of how we currently determine what
> needs building.
>
> Dependency tree: currently we have hyperkube building as an RPM, getting
> included in an RHCOS build, and that build then getting included in a
> nightly. This change would mean that, for consistency, baremetal would also
> need to rebuild (after RHCOS) and be included in a nightly. There are lots
> of ways for that to go wrong and leave us with mismatching content. I would
> say we could validate that before release, but I'm actually not sure how we
> would; I'd have to think about it. In any case, there would likely be
> sporadic extended periods where the two were out of sync and we couldn't
> form a valid release. And once we get embargoed RHCOS sorted out, I have no
> idea how we'll get baremetal to match (admittedly a rare problem; perhaps a
> manual workaround will suffice).
>
> Detection that the image needs rebuilding: currently we check whether the
> source has changed, whether the config has changed, or whether the RPM
> contents have changed, all of which we can easily look up from the image or
> Brew. There is nothing like that for other content. To do this, the process
> that does the upload could add a label with the RHCOS build ID that is
> included, and we could then compare that to see if there is a newer build
> available. I don't look forward to adding that bit of complexity to the
> scan, but it seems solvable.
>
> Some other things to consider:
>
> 1. How does this work for OKD/CI builds?
> 2. I assume we need this for all architectures? Each arch is a different
>    RHCOS build, and that makes it a bit tricky to reuse the same Dockerfile
>    for all arches (when the content will need different arch-dependent
>    names).

Reply from the author (@zaneb):

> So far baremetal only supports x86_64. But it seems inevitable that one day
> we'll have to support multiple architectures. Could we have one Dockerfile
> per arch?

Reply from a Member:

> I think we should always keep this ISO the same as the one pinned in the
> installer.
>
> So one idea: have this image derive from the installer. I think we'd get
> edge triggering for free then.

Reply from a Member:

> Config has baremetal building for all arches, so I assumed those were used,
> but it's simpler if not. We should probably limit those builds to x86...
>
> Our build system insists that a single multi-arch build reuse the same
> Dockerfile for all (this is useful for ensuring they all have consistent
> content). There are ways around it, though, involving either complicating
> the Dockerfile a bit, or (more likely) splitting the builds into a build per
> arch (we already have some like this for ARM64). So I guess this is not a
> blocker, just another complication.

Reply from a Member:

> Keeping the ISO the same as the installer's seems like a good idea; that
> would mean it's not changing frequently.
>
> I'm not quite sure how we'd implement that, though: I'm not sure how we'd
> determine what to download prior to the build, and we can't download during
> the build. Except maybe from Brew...

Reply from a Member:

> > Some other things to consider:
> >
> > 1. How does this work for OKD/CI builds?
> > 2. I assume we need this for all architectures? Each arch is a different
> >    RHCOS build, and that makes it a bit tricky to reuse the same
> >    Dockerfile for all arches (when the content will need different
> >    arch-dependent names).
>
> 1: For OKD, the ostree in machine-os-content is different from the one used
> in the bootimage (which is FCOS). For the okd-machine-os ostree, the
> hyperkube and client RPMs are extracted from the artifacts image and layered
> onto FCOS:
> https://github.com/openshift/okd-machine-os/blob/master/Dockerfile.cosa#L9.
> Since the trees differ, a new node in OKD always has to pivot from FCOS to
> okd-machine-os before it even has a kubelet to run anything with.
>
> 2: This will be needed soon, at least for ARM64.

Reply from the author (@zaneb):

> I added a paragraph on multi-arch to the doc.
>
> @sosiouxme what additional information do you think we need to document here
> to get this to an approvable state?

Reply from a Member:

> I think my concerns are answered.
>
> ART can define a hook such that each time we build, we check whether
> installer/data/data/rhcos-stream.json has changed. If it has, download the
> ISOs from the locations given there and make them available in the build
> context with an arch-specific name, probably something like rhcos.$arch.iso,
> so that the same Dockerfile can just use the current arch to pick the right
> one in the arch-specific builds.


The RHCOS release pipeline may need to be adjusted to ensure that the latest
ISO is always available in the necessary location.

We will need to ensure that the container image is rebuilt whenever the RHCOS
image is updated.
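
A minimal sketch of such a rebuild/refresh hook, following the approach
discussed in the review thread above (the field names assume the standard
CoreOS stream-metadata schema; the file names, arch list, and download step
are illustrative assumptions):

```shell
# Hypothetical hook: run whenever installer/data/data/rhcos-stream.json
# changes, fetching the pinned ISO for each supported architecture into the
# build context under an arch-specific name.
set -euo pipefail

stream=installer/data/data/rhcos-stream.json

for arch in x86_64; do   # extend the list as further architectures are supported
  url=$(jq -r ".architectures.${arch}.artifacts.metal.formats.iso.disk.location" "$stream")
  sha=$(jq -r ".architectures.${arch}.artifacts.metal.formats.iso.disk.sha256" "$stream")

  # Arch-specific file name so a single Dockerfile can pick the right ISO in
  # each arch-specific build.
  curl -fL -o "rhcos.${arch}.iso" "$url"
  echo "${sha}  rhcos.${arch}.iso" | sha256sum --check
done
```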

OKD uses (rebuilt) Fedora CoreOS images instead of RHEL CoreOS, and this will
need to be taken into account. It may be that the build environment for OKD
allows for a more straightforward solution there (like downloading the image
directly).

### Risks and Mitigations

If the build pipeline does not guarantee the latest RHCOS version gets built
into the container image, then baremetal platform users may miss out on the
latest bug and security fixes.

## Design Details

### Open Questions [optional]

How will we provide up-to-date images to the container build?

Are there similar internet access restrictions on the build environment for
OKD?

### Test Plan

The expected SHA256 hashes of the ISO and PXE files are available as metadata in
the cluster, so we should be able to verify at runtime that we have the correct
image.
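
For example, assuming the expected digest has already been read from the
cluster's CoreOS stream metadata (e.g. the bootimages ConfigMap described in
/enhancements/coreos-bootimages.md), the check itself could be as simple as
the following sketch (the variable name and ISO path are placeholders):

```shell
# Hypothetical check: EXPECTED_ISO_SHA256 is assumed to come from the
# cluster's CoreOS stream metadata, and the ISO path is a placeholder.
echo "${EXPECTED_ISO_SHA256}  /shared/html/images/coreos-x86_64.iso" \
  | sha256sum --check
```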

### Graduation Criteria

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

The container registry will always contain an image with the latest ISO. This
will get rolled out to the `image-customization-controller` pod, and future
boots of the deploy image will be based on the new ISO. The restart of Ironic
during the upgrade should ensure that any BareMetalHosts currently booted into
the deploy image (i.e. not provisioned) will be rebooted into the new one.

For the initial release, pre-existing clusters will continue to provision with
QCOW2 images (but now via the new CoreOS-based IPA). Since the MachineSets will
not be automatically updated, everything will continue to work after
downgrading again (now via the old RHEL-based IPA).

Newly-installed clusters cannot be downgraded to a version prior to their
initial install version.

### Version Skew Strategy

The Cluster Baremetal Operator should ensure that the
`image-customization-controller` is updated before Ironic, so that reboots of
non-provisioned nodes triggered by the Ironic restart use the new image.

## Implementation History

N/A

## Drawbacks

Users with disconnected installs will end up mirroring the ISO as part of the
release payload, even if they are not installing on the baremetal platform and
thus will make no use of it. The extra data is substantially a duplicate of the
ostree data already stored in the release payload. While this is less than
ideal, the fact that this change allows us to remove the RHEL-based IPA image
already being shipped in the release payload means it is actually a net
improvement.

## Alternatives

We could use the prototype
[coreos-diskimage-rehydrator](https://github.com/cgwalters/coreos-diskimage-rehydrator)
to generate the ISO. This would allow us to generate images for other platforms
without doubling up on storage. However, it would be much simpler to wait
until building images for other platforms is actually a requirement before
introducing this complexity. The ISO-generating process is an implementation
detail from the user's perspective, so it can be modified when required.

We could attempt to generate an ISO from the existing ostree data in the
Machine Config. However, there is no known way to generate an ISO that is
bit-for-bit identical to the release, so this presents an unacceptably high
risk as the ISO used will not be the one that has been tested.

We could attempt to de-duplicate the data in the reverse direction, by having
the Machine Config extract its ostree from the ISO. This is theoretically
possible, but by no means straightforward. It could always be implemented at a
later date if required.

We could require users installing or upgrading disconnected clusters to
[manually mirror the ISO](https://github.com/openshift/enhancements/pull/879).

## Infrastructure Needed

We will need to have the latest RHCOS images available for download into a
container image in the build environment.

We may need a new repo to host the Dockerfile to build the container image
containing CoreOS.