
baremetal: Include CoreOS ISO in the release payload #909

Merged
merged 7 commits into from
Oct 26, 2021

Conversation

zaneb
Member

@zaneb zaneb commented Sep 23, 2021

The baremetal platform is switching its CoreOS image from the QCOW2 format to an ISO. To ensure that existing disconnected clusters can update to this, the ISO image will be included in the release payload. This will be balanced out by removing the RHEL image currently shipped in the release payload. There are many other nice benefits, such as the fact that after future upgrades clusters will provision new machines with the latest CoreOS, without requiring an extra reboot.

@dtantsur
Member

/lgtm

Makes sense from the baremetal perspective

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2021

This image could either replace the existing content of the
`ironic-ipa-downloader` repo, be built from the new
`image-customization-controller` repo, or be built from a new repo.
Contributor

Personally I think it'd be clearer if we created a new repo/image rather than re-purposing either of these?

Because OpenShift container images are built in an offline environment, it is
not possible to simply download the image from the public Internet at container
build time. Instead this will be accomplished by *MAGIC*. Uploading the ISOs to
the lookaside cache in Brew is one possibility.
Contributor
@hardys hardys Sep 27, 2021

Do we have a plan for resolving exactly how this process will work, and do we anticipate needing different solutions for upstream (openshift CI, not OKD) images vs downstream (and nightly) builds?

Member Author

We'll need to pull in some folks from the appropriate team (ART?) to tell us the right way to accomplish it.

Member
@sosiouxme sosiouxme Sep 28, 2021

The good news is that there is a mechanism for ART to download and insert binaries like this, if that is in fact what we determine to do. We would just need a script written to do it at the time we rebase the source.

The bad news is that it deepens the dependency tree of things that need to align in a release, and falls outside of how we currently determine what needs building.

Dependency tree: currently we have hyperkube building as an RPM, getting included in an RHCOS build, and that build then getting included in a nightly. This change would mean that, for consistency, baremetal would also need to rebuild (after RHCOS) and be included in a nightly. There are lots of ways for that to go wrong and we would end up with mismatching content. I would say we could validate that before release, but I'm actually not sure how we would; I'd have to think about it. In any case, there would likely be sporadic extended periods where the two were out of sync and we couldn't form a valid release. Oh, and once we get embargoed RHCOS sorted out, I have no idea how we'll get baremetal to match (admittedly a rare problem; perhaps a manual workaround will suffice).

Detection that the image needs rebuilding: currently we check whether the source has changed, whether the config has changed, or whether RPM contents have changed, which are all things we can easily look up from the image or Brew. There's nothing like that for other content. I guess to do this, the process that does the upload could add a label with the RHCOS build ID that's included, and we could then compare that to see if there's a newer build available. I don't look forward to adding that bit of complexity into the scan, but it seems solvable.
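A rough sketch of what that comparison might look like, purely as an illustration: the label name is hypothetical, and `skopeo inspect` stands in here for however the scan would actually read image metadata.

```python
import json
import subprocess

RHCOS_LABEL = "io.openshift.rhcos-build-id"  # hypothetical label set by the upload process

def needs_rebuild(image_ref: str, latest_rhcos_build: str) -> bool:
    """Compare the RHCOS build ID recorded on the previously built image
    against the newest available RHCOS build; a mismatch means the image
    is stale and should be rebuilt."""
    inspect = json.loads(
        subprocess.check_output(["skopeo", "inspect", f"docker://{image_ref}"])
    )
    recorded = (inspect.get("Labels") or {}).get(RHCOS_LABEL)
    return recorded != latest_rhcos_build
```

A newer RHCOS build would then trigger a rebuild the same way a source, config, or RPM change does today.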

Some other things to consider:

  1. How does this work for OKD/CI builds?
  2. I assume we need this for all architectures? Each arch is a different RHCOS build, and that makes it a bit tricky to reuse the same Dockerfile for all arches (when the content will need different arch-dependent names).

Member Author

So far baremetal only supports x86_64. But it seems inevitable that one day we'll have to support multiple architectures. Could we have one Dockerfile per arch?

Member

I think we should always keep this ISO the same as the one pinned in the installer.

So one idea: have this image derive from the installer. I think we'd get edge triggering for free then.

Member

Config has baremetal building for all arches, so I assumed those were used, but it's simpler if not. We should probably limit those builds to x86...

Our build system insists that a single multi-arch build reuse the same Dockerfile for all (this is useful for ensuring they all have consistent content). There are ways around it, though, involving either complicating the Dockerfile a bit, or (more likely) splitting the builds into a build per arch - we already have some like this for ARM64. So I guess this is not a blocker, just another complication.

Member

Keeping the ISO the same as the installer seems like a good idea. That would mean it's not changing frequently.

I'm not quite sure how we'd implement that; I'm not sure how we'd determine what to download prior to the build, and we can't download during the build. Except maybe from Brew... heh.

Member

Some other things to consider:

  1. How does this work for OKD/CI builds?

  2. I assume we need this for all architectures? Each arch is a different RHCOS build, and that makes it a bit tricky to reuse the same Dockerfile for all arches (when the content will need different arch-dependent names).

1: For OKD, the ostree in machine-os-content is different from the one used in the bootimage (which is FCOS). For the okd-machine-os ostree, the hyperkube and client RPMs are extracted from the artifacts image, and layered onto FCOS: https://github.com/openshift/okd-machine-os/blob/master/Dockerfile.cosa#L9. Since the trees differ, a new node in OKD always has to pivot from FCOS to okd-machine-os before it even has a kubelet to run anything with.

2: This will be needed soon at least for ARM64

Member Author

I added a paragraph on multi-arch to the doc.

@sosiouxme what additional information do you think we need to document here to get this to an approvable state?

Member

I think my concerns are answered.

ART can define a hook such that each time we build, we check whether installer/data/data/rhcos-stream.json has changed. If it has, we download the ISOs from the locations given there and make them available in the build context with an arch-specific name, probably something like rhcos.$arch.iso, so that the same Dockerfile can just use the current arch to pick the right one in the arch-specific builds.
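As a minimal sketch of what that hook could do, assuming the pinned file follows the CoreOS stream-metadata layout (`architectures.<arch>.artifacts.metal.formats.iso.disk`) and that the rebase tooling decides when to invoke it; the function and file names are illustrative, not ART's actual implementation:

```python
import hashlib
import json
import urllib.request
from pathlib import Path

def stage_rhcos_isos(stream_json: Path, build_context: Path) -> None:
    """Download the live ISO for each architecture pinned in the installer's
    stream metadata into the build context as rhcos.<arch>.iso."""
    stream = json.loads(stream_json.read_text())
    for arch, data in stream["architectures"].items():
        iso = data.get("artifacts", {}).get("metal", {}).get("formats", {}).get("iso")
        if not iso:
            continue  # no live ISO published for this architecture
        disk = iso["disk"]
        target = build_context / f"rhcos.{arch}.iso"
        urllib.request.urlretrieve(disk["location"], target)
        # Verify the checksum recorded alongside the URL before trusting the download.
        sha = hashlib.sha256()
        with target.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha.update(chunk)
        if sha.hexdigest() != disk["sha256"]:
            raise RuntimeError(f"checksum mismatch for {arch} ISO")

if __name__ == "__main__":
    stage_rhcos_isos(Path("installer/data/data/rhcos-stream.json"), Path("."))
```

The arch-specific builds could then copy whichever rhcos.<arch>.iso matches the architecture being built, keeping a single Dockerfile.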

@hardys
Contributor

hardys commented Sep 27, 2021

Overall this looks good to me, and I'm really pleased to see progress on this topic :)

A few minor comments, but the general direction all sounds good to me - it'd be good to get an ack from @cgwalters and @jlebon before approving, though.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 27, 2021
@sadasu
Contributor

sadasu commented Sep 27, 2021

Overall a pretty detailed enhancement. It would be nice if someone from the ART team could help demystify the *MAGIC*.

Member
@jlebon jlebon left a comment

Seems reasonable to me!

Member
@sosiouxme sosiouxme left a comment

/approve

Co-authored-by: Luke Meyer <sosiouxme@gmail.com>
@zaneb
Member Author

zaneb commented Oct 21, 2021

/label tide/merge-method-squash
/assign @hardys

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 21, 2021
@cgwalters
Member

/label tide/merge-method-squash

(Neat, I didn't know that was a thing)

@zaneb
Member Author

zaneb commented Oct 21, 2021

(Neat, I didn't know that was a thing)

We haven't seen if it will work yet 😆

@hardys
Contributor

hardys commented Oct 26, 2021

/approve

@openshift-ci
Contributor

openshift-ci bot commented Oct 26, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, hardys, jlebon, sosiouxme

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2021
@kirankt kirankt left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2021
@sadasu
Contributor

sadasu commented Oct 26, 2021

/lgtm

@openshift-merge-robot openshift-merge-robot merged commit 4dc727e into openshift:master Oct 26, 2021