[WIP] Teach MCO to use the new-format image by default #3258
Previously our machine-os-content images were just "wrappers" around ostree commits. With our new "layering" work, we have a new image format that is actually a working OCI container, but this type of image needs to be applied differently by rpm-ostree.

This extends the RpmOstreeClient to support new layered images by:
- capturing more metadata from rpm-ostree status
- adding an additional rebase function specifically for the layered case
- adding a function to determine whether or not a given image is a "layered"/bootable OCI image
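As a rough illustration of the last point, the check could look something like the sketch below. The `ostree.bootable` label is the one the diff in this PR inspects; the helper name `isBootableImage`, the labels-as-map input, and parsing the value as a boolean string are assumptions made for the example, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// bootableLabel is the image label that the diff in this PR looks at.
const bootableLabel = "ostree.bootable"

// isBootableImage is a hypothetical helper. It takes the labels of an image
// that has already been inspected and reports whether they mark the image as
// new-format (bootable). Parsing the value with ParseBool is an assumption.
func isBootableImage(labels map[string]string) bool {
	value, ok := labels[bootableLabel]
	if !ok {
		// Old-format machine-os-content images carry no such label.
		return false
	}
	bootable, err := strconv.ParseBool(value)
	return err == nil && bootable
}

func main() {
	oldFormat := map[string]string{"version": "1"}
	newFormat := map[string]string{bootableLabel: "true"}
	// prints: false true
	fmt.Println(isBootableImage(oldFormat), isBootableImage(newFormat))
}
```

A missing or unparseable label is treated as "old format" here, so the caller would fall back to the non-layered path.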
We changed the interface for RpmOstreeClient, so we need to make sure the mocks are also up to date with the new signatures.
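One common Go pattern for keeping a mock in step with an interface is a compile-time assertion, sketched below. The `Client` interface and its three methods are a hypothetical, trimmed-down stand-in for illustration; they are not the real RpmOstreeClient signatures.

```go
package main

import "fmt"

// Client is a hypothetical stand-in for the rpm-ostree client interface.
type Client interface {
	Rebase(imageURL string) error
	RebaseLayered(imageURL string) error
	IsBootableImage(imageURL string) (bool, error)
}

// MockClient is a test double that records which rebase path was taken.
type MockClient struct {
	Bootable map[string]bool // image URL -> pretend "bootable" answer
	Calls    []string        // recorded rebase calls, in order
}

// Compile-time check: the build fails if MockClient drifts from Client.
var _ Client = (*MockClient)(nil)

func (m *MockClient) Rebase(imageURL string) error {
	m.Calls = append(m.Calls, "Rebase "+imageURL)
	return nil
}

func (m *MockClient) RebaseLayered(imageURL string) error {
	m.Calls = append(m.Calls, "RebaseLayered "+imageURL)
	return nil
}

func (m *MockClient) IsBootableImage(imageURL string) (bool, error) {
	return m.Bootable[imageURL], nil
}

// applyImage picks the rebase path based on the image format.
func applyImage(c Client, imageURL string) error {
	bootable, err := c.IsBootableImage(imageURL)
	if err != nil {
		return err
	}
	if bootable {
		return c.RebaseLayered(imageURL)
	}
	return c.Rebase(imageURL)
}

func main() {
	mock := &MockClient{Bootable: map[string]bool{"example.com/new": true}}
	_ = applyImage(mock, "example.com/new")
	_ = applyImage(mock, "example.com/old")
	fmt.Println(mock.Calls)
}
```

With the `var _ Client = (*MockClient)(nil)` line in place, adding a method to the interface without updating the mock becomes a build error rather than a test-time surprise.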
Part of openshift/enhancements#1032

We'll add the new-format image into the payload alongside the old one until we can complete the transition. (There may also be a separate `rhel-coreos-extensions` image, for example, so this is just the start.)

Note this PR is just laying groundwork; the new format container will not be used by default.
controllerconfig

The extensions for our layered/bootable `rhel-coreos` images are no longer going to be contained within the images themselves. They will be a separate container. See: openshift/os#763

This makes sure that, if present, the new extensions image gets pushed through to controllerconfig and is available when merging machineconfigs.
The MCO orchestrates the management/setup of a big pile of different containers today, from `machine-os-content` to `haproxy` and configuring cri-o to use the correct `pause`/`pod` container. These images get set up at both "bootstrap" time and cluster time.

In bootstrap today we manually scrape each image out of the CVO in a shell script, and then pass them as CLI arguments inside `bootkube.sh`. This means adding new images requires a tedious "ratcheting" process where the CLI argument is added to the MCO, then we patch the installer to pass it in.

Instead, add a new CLI argument which accepts a serialized imagestream object, which is exactly what the CVO carries.

Prep for adding a new `rhel-coreos` image into the payload, so I can avoid that ratcheting process.
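To make the imagestream idea concrete, here is a minimal sketch of looking an image up by tag name in a serialized imagestream. The `imageStream` struct is a hand-written subset of the `spec.tags[].name` / `spec.tags[].from.name` shape, not the real openshift/api type, and `lookupImage` plus the `example.com` pull specs are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageStream is a hypothetical minimal subset of a serialized ImageStream:
// only the tag name and the pull spec it points at.
type imageStream struct {
	Spec struct {
		Tags []struct {
			Name string `json:"name"`
			From struct {
				Name string `json:"name"`
			} `json:"from"`
		} `json:"tags"`
	} `json:"spec"`
}

// lookupImage returns the pull spec for the given tag, or an error if the
// imagestream cannot be parsed or does not contain the tag.
func lookupImage(serialized []byte, tag string) (string, error) {
	var is imageStream
	if err := json.Unmarshal(serialized, &is); err != nil {
		return "", fmt.Errorf("parsing imagestream: %w", err)
	}
	for _, t := range is.Spec.Tags {
		if t.Name == tag {
			return t.From.Name, nil
		}
	}
	return "", fmt.Errorf("tag %q not found in imagestream", tag)
}

func main() {
	// Made-up payload for illustration only.
	payload := []byte(`{"kind":"ImageStream","spec":{"tags":[
		{"name":"machine-os-content","from":{"kind":"DockerImage","name":"example.com/os@sha256:aaa"}},
		{"name":"rhel-coreos-8","from":{"kind":"DockerImage","name":"example.com/rhcos@sha256:bbb"}}
	]}}`)
	image, err := lookupImage(payload, "rhel-coreos-8")
	fmt.Println(image, err)
}
```

With a lookup like this, a new image only needs a new tag name on the MCO side; no additional CLI flag has to be threaded through the installer.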
This updates the daemon's bootstrapping path so that the daemon can support updating to a new format/layered image while it is bootstrapping.
This modifies the daemon such that if a "new format" image is present in the MachineConfig as BaseOperatingSystemContainer, the daemon will use it instead of OSImageURL.
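The selection rule described here can be sketched as a small function. `OSImageURL` and `BaseOperatingSystemContainer` are the field names used in this PR; the `machineConfigSpec` struct and `targetOSImage` helper are hypothetical simplifications, and the sketch assumes that an empty string means "not set".

```go
package main

import "fmt"

// machineConfigSpec is a hypothetical, cut-down view of the fields involved.
type machineConfigSpec struct {
	OSImageURL                   string
	BaseOperatingSystemContainer string
}

// targetOSImage returns the image the daemon should apply and whether it is
// the new-format (layered) one. The new-format image wins when it is set.
func targetOSImage(spec machineConfigSpec) (image string, layered bool) {
	if spec.BaseOperatingSystemContainer != "" {
		return spec.BaseOperatingSystemContainer, true
	}
	return spec.OSImageURL, false
}

func main() {
	oldOnly := machineConfigSpec{OSImageURL: "example.com/machine-os-content"}
	both := machineConfigSpec{
		OSImageURL:                   "example.com/machine-os-content",
		BaseOperatingSystemContainer: "example.com/rhel-coreos-8",
	}
	fmt.Println(targetOSImage(oldOnly))
	fmt.Println(targetOSImage(both))
}
```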
So once upon a time, our image references in our manifests, the ones that look like `registry.svc.ci.openshift.org/openshift:something`, probably pointed at real images. But:
- that registry has long since been retired, and
- the openshift client (`oc`) rewrites these values as though they were templated when an openshift release gets packed by `oc adm release new`

This was not immediately apparent to someone who was not familiar with the process, and an outside observer could be forgiven for thinking that these were real values.

This tries to alleviate any confusion or belief that these values are 'real' by making them obviously fake, but also descriptive of their function.
The way things work with the openshift client, once we add an image to image-references, `oc` expects it to be in the release, and `oc adm release new` will fail if it's not present. Also, in order for the placeholder image location in our `machine-config-osimageurl` configmap to get rewritten with a real value (rather than the bogus 'template' value it has by default), it has to be referenced in image-references. The desired outcome was "if it's there use it, otherwise ignore it", but that's just not possible with regard to the payload the way things are set up. This groups the addition of the new format os container and extensions container together into sort of an "on switch" commit that will break the builds if merged before https://issues.redhat.com/browse/ART-3883, but will make them work with new images if merged afterwards.
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jkyros The full list of commands accepted by this bot can be found here.
OK! Word on the street is that this may be unblocked. Basically I think we can now refer to
Just going to lift draft and see what happens in CI...
/retest
OK, so something else still needs to happen. I think we need to find the place in the CI configuration to stitch in the production ART
…tent` The former is going to replace the latter. See https://github.com/openshift/enhancements/blob/master/enhancements/ocp-coreos-layering.md I'm *hoping* this is somehow what's necessary to have the image that is now built by ART included in the CI payload, so that we can get this PR to work: openshift/machine-config-operator#3258 (comment)
randomly trying openshift/release#31083
The release payload name will be `rhel-coreos-8`, so let's have the CI config match. I think this may unblock openshift/machine-config-operator#3258 (comment) Closes: openshift/os#940
/retest
@jkyros did you do a PR for openshift/installer to have it pass through the new image references? We'll need that in order for bootstrap to work, right?
I believe we need to land c63533d first, then PR to the installer
@jkyros: The following tests failed, say
@yuqi-zhang and I were talking about this and we think it makes sense to split this PR in two:
The first hunk of this could land right now with no external dependencies (e.g. on the installer etc.)
We discussed this last week and I think we have reached consensus on these steps or so:
Does that still sound like the plan, @jkyros?
Yep, that's still the plan. I am in the process of splitting this.
@jkyros: PR needs rebase.
		return false, err
	}

	if isBootable, ok := imageData.Labels["ostree.bootable"]; ok {
I'm OK with this temporarily as a way to distinguish old vs new format, but I also think this check is better done in the ostree stack; done in ostreedev/ostree-rs-ext#356

Once we drop the old `machine-os-content` and hard require new format, I don't think we need to duplicate the error we'd get from rpm-ostree anyways.
OK, I took a crack at rebasing this myself, but man, there are a ton of conflicts. Is doing that on your near-term radar? If not, I'll probably look at this later today.
Yes, I'll do that this afternoon. Most of this is already in, in some fashion; what's left is really small. If we're going for shortest path to test "new format by default", I'm almost tempted to:
Closing this in favor of #3317
This is a way to give the MCO "new format" image support by:
`BaseOperatingSystemContainer` and `BaseOperatingSystemExtensionsContainer` all the way through into MachineConfig like `OSImageURL`

I don't think this is actually how we'll want to do it, this is just the "simplest way" that doesn't "overload" `machine-os-content` with a new format image.

Caveats:
`rhel-coreos-8` (and optionally `rhel-coreos-8-extensions`) are included in the release

Can I install a cluster using the new format image? Yes, but right now you probably don't want to, because you have to:
This PR is in service to: https://issues.redhat.com/browse/MCO-291