Embed RHCOS buildID in update payload, fetch it for installer #987
And we'd solve the "RHCOS CI gating problem" if updating the build ID in the release payload were a PR gated by e2e-aws.
(Clearly, if we do this, the installer should also resolve the release image to a by-digest reference.)
Just for the update payload, though. There would be no validation of whether your pull secret was authorized to pull the (indirectly) referenced images. Still, checking the pull secret against the update payload certainly doesn't hurt validation, even if it is incomplete ;).
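As a concrete illustration of that up-front check, here is a minimal sketch, assuming the containers/image library (`github.com/containers/image/v5`); the image reference and credential wiring are placeholders, not the installer's actual code:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containers/image/v5/docker"
	"github.com/containers/image/v5/types"
)

func main() {
	ctx := context.Background()

	// Hypothetical release image; docker.ParseReference expects the
	// "//host/repo:tag" form.
	ref, err := docker.ParseReference("//quay.io/openshift-release-dev/ocp-release:4.0.0")
	if err != nil {
		log.Fatal(err)
	}

	// The credentials would really come from the user's pull secret file.
	sys := &types.SystemContext{
		DockerAuthConfig: &types.DockerAuthConfig{
			Username: "user",
			Password: "token-from-pull-secret",
		},
	}

	// NewImage performs the registry round-trips; an auth failure here
	// surfaces a bad pull secret before the bootstrap node ever boots.
	img, err := ref.NewImage(ctx, sys)
	if err != nil {
		log.Fatalf("pull secret failed validation: %v", err)
	}
	defer img.Close()

	info, err := img.Inspect(ctx)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("fetched update payload metadata; labels:", info.Labels)
}
```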
And not even that, since you can use the installer referenced by the update payload since openshift/origin#21637. I'm on board with this approach. How did you plan on getting the RHCOS build ID (and channel) into the update payload? I'd prefer it be via a repository maintained by the RHCOS team, because we want the installer's master to continue to float with the latest RHCOS, and that would be a noisy stream of machine-generated PRs ;).
OK, openshift/origin#21637 is...confusing me a lot. I understand the Hive connection somehow, but...for the ("non-Hive"?) install path, there are no plans to switch away from downloading a binary or building git master, right?
Two options I see:
I think my vote is the former; this leaves us in a place where e.g. a broken cri-o will stall release-image promotion, but I think that's OK for now. After that, the RHCOS team can invest more in better CI, even before things land in git master for us.
If we're agreed on this, then the next step is to discuss implementation architecture on the installer side a bit - I'm willing to try helping here. A dependency on the containers/image library would be OK, as far as we know?
It's a bit fuzzy for me, but I'm still planning on cutting installer releases. While we're doing that, users can do any of:
And with all of these approaches, users can go through some hoop-jumping to override the update payload or RHCOS build if they want to test a change (like we do in CI).
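For example, that override hoop could look roughly like the sketch below, assuming an environment variable along the lines of what CI uses; the variable name and default value here are illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// defaultReleaseImage stands in for the payload a released installer would pin.
var defaultReleaseImage = "quay.io/openshift-release-dev/ocp-release:4.0.0"

// releaseImage applies the override "hoop": an env var (name borrowed from
// what CI does today, but an assumption here) beats the pinned default.
func releaseImage() string {
	if override := os.Getenv("OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE"); override != "" {
		return override
	}
	return defaultReleaseImage
}

func main() {
	fmt.Println("using release payload:", releaseImage())
}
```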
Yeah, this would mitigate the promotion-blocking issue.
That's fine with me. We already depend on them for Podman/CRI-O, so vendoring them here doesn't increase our overall dependency exposure.
Yeah, the only concern I can think of is that we'd need it to work on OS X at least, which shouldn't be hard - particularly if we e.g. had a build tag that entirely disabled the storage backends, since we don't need to unpack the containers.
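A rough sketch of that build-tag idea; the tag name, package, and function below are illustrative, though containers/image does ship stub tags along these lines (e.g. `containers_image_storage_stub`):

```go
// storage_disabled.go
//
// Compile out the container-storage backend on platforms (e.g. macOS)
// where we only need registry metadata and never unpack images.

//go:build darwin || no_container_storage

package osmeta

import "errors"

// unpackImage is never needed just to fetch the RHCOS build ID, so this
// build configuration stubs it out entirely.
func unpackImage(ref string) error {
	return errors.New("image unpacking disabled in this build")
}
```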
This is "Use Hive" (and really only Hive) right? We don't have other use cases where people are unpacking the update payload to get an installer, or do we? |
I dunno. But does consumer enumeration matter? It's a valid approach with our current and foreseeable future update-payloads, and so are the other two approaches.
I'm against this idea right now. The long-term plan is to move master creation into an actuator (which will mean that all of the machines are created dynamically). Once that happens, I think this will make a lot more sense to tackle. As things exist today, if the installer creates a master with an old RHCOS image, will the MCO take a pass and get it updated? If not, that needs to be fixed long before we should be discussing this.
Seems weird - you're saying "not right now, but it would be good for the future". Why the explicit "not right now", then? Complexity? Timing/need-to-ship-what-we-have?
This is openshift/machine-config-operator#183. We need both, I think - they're not quite orthogonal, because if we get into a situation where the installer creates a cluster with "bootimage" != "osimage", that would be quite noticeable for both users and CI scenarios: we'd be rebooting all of the nodes (including the master) soon after cluster creation.
One thing that's fairly important, and would be solved by this, is that today there's a dependency on the RHCOS release API server, which isn't really maintained to the level it should be for a production service. For AWS in particular, all we need is the "AMI json"; if we just embedded that in the release payload, the critical path wouldn't need a dependency on the service. (We still need the service for libvirt, but that's not production. We will also need the service, or something like it, for bare metal/OpenStack etc.) Strawman proposal:
Maybe, for maximum convenience for the installer, this data ends up in the container metadata (not the content payload), so that it doesn't need to be extracted.
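To make the strawman concrete, here is a sketch of what consuming an embedded "AMI json" could look like; the schema is an assumption loosely modeled on the RHCOS build metadata, and the field names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

type rhcosAMI struct {
	Region string `json:"name"` // e.g. "us-east-1"
	HVM    string `json:"hvm"`  // the AMI ID for that region
}

type rhcosMeta struct {
	BuildID string     `json:"buildid"`
	AMIs    []rhcosAMI `json:"amis"`
}

func main() {
	// In the proposal, this blob would come out of the release payload
	// (or the image's metadata) instead of the RHCOS release API server.
	raw := []byte(`{"buildid":"47.201","amis":[{"name":"us-east-1","hvm":"ami-0example"}]}`)

	var meta rhcosMeta
	if err := json.Unmarshal(raw, &meta); err != nil {
		log.Fatal(err)
	}
	for _, a := range meta.AMIs {
		fmt.Printf("RHCOS %s in %s: %s\n", meta.BuildID, a.Region, a.HVM)
	}
}
```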
For RHCOS we have two things:
- The "bootimage" (AMI, qcow2, PXE env)
- The "oscontainer", now represented as `machine-os-content` in the payload

For initial OpenShift releases (e.g. of the installer), ideally these are the same (i.e. we don't upgrade the OS on boot). This PR aims to support injecting both pieces of data into the release payload. More information on the "bootimage" and its consumption by the installer as well as the Machine API Operator: openshift/installer#987. More information on `machine-os-content`: openshift/machine-config-operator#183
openshift/origin#21998 will land metadata in the release payload sufficient for the installer to implement this.
The installer binaries shipped to customers have a fixed release-image digest embedded, and the installer binary itself also has the boot images embedded. So in terms of users, they only care about one thing and the rest is already handled. I think this issue can be closed as fixed. /close If you think that's not the case, please feel free to reopen.
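A minimal sketch of that embedding pattern, assuming link-time injection via `-ldflags -X`; the variable name and default value are illustrative, not the installer's actual symbols:

```go
// Official builds would pin the payload at link time, e.g.:
//
//   go build -ldflags "-X main.releaseImage=quay.io/.../ocp-release@sha256:..."
package main

import "fmt"

// releaseImage is the development default; released binaries override it
// with a fixed by-digest pull spec baked in at build time.
var releaseImage = "registry.svc.ci.openshift.org/openshift/origin-release:v4.0"

func main() {
	fmt.Println("installing from release payload:", releaseImage)
}
```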
@abhinavdahiya: Closing this issue.
There was a lot of discussion in #757, which is long since closed, that I don't want to be lost.
One specific proposal I'd like to discuss here is embedding the RHCOS build ID (e.g. `47.201`) in the update payload. In order to do this, as was discussed in the above PR, the installer would need to at least do the equivalent of `skopeo inspect`. A nice benefit of this is that we'd validate the pull secret up front instead of on the bootstrap node.

A major benefit is that conceptually one only needs to worry about two top-level things when speaking of git `master`: (installer, update payload). And released installers wouldn't need the special commit to do the RHCOS pinning - it would happen naturally as part of pinning a release payload. A sketch of what the installer-side fetch could look like follows.
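This is a minimal sketch of the proposal, assuming the RHCOS build ID were published as an image label on the update payload; the label key and image reference are hypothetical, and the containers/image calls approximate what `skopeo inspect` does under the hood:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containers/image/v5/docker"
)

func main() {
	ctx := context.Background()

	// Hypothetical update payload reference.
	ref, err := docker.ParseReference("//quay.io/openshift-release-dev/ocp-release:4.0.0")
	if err != nil {
		log.Fatal(err)
	}

	// nil SystemContext: default auth and TLS handling.
	img, err := ref.NewImage(ctx, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer img.Close()

	info, err := img.Inspect(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical label carrying the RHCOS build ID, e.g. "47.201".
	fmt.Println("RHCOS build:", info.Labels["io.openshift.rhcos-build"])
}
```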