pkg/rhcos/release: Extract RHCOS build from release image #1286
Conversation
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
An additional benefit of this approach is that invalid pull-secrets will result in installer-host errors like:

For previous discussion of that issue, see #901. And this reminds me that I need to wire in the user-supplied pull-secret for these fetches ;).
I only skimmed this but it looks good so far! One thing I'd add to this...for release payloads let's be prepared to bake in the AMI metadata - don't fetch it over HTTP. It'd probably be just changing code around here to parse a constant JSON blob embedded in the code or so.
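A minimal sketch of what baking the metadata in could look like (the amis array shape follows the meta.json output quoted later in this thread, but the hvm field name and the AMI IDs are placeholder assumptions, not the real data):

```go
package rhcos

import (
	"encoding/json"
	"fmt"
)

// builtinAMIs would be pinned at release-cut time instead of fetched
// over HTTP. The hvm field name and the IDs below are illustrative only.
const builtinAMIs = `[
	{"name": "us-east-1", "hvm": "ami-placeholder1"},
	{"name": "us-west-2", "hvm": "ami-placeholder2"}
]`

type ami struct {
	Name string `json:"name"` // AWS region
	HVM  string `json:"hvm"`  // AMI ID for HVM virtualization
}

// AMI looks up the baked-in AMI ID for an AWS region.
func AMI(region string) (string, error) {
	var amis []ami
	if err := json.Unmarshal([]byte(builtinAMIs), &amis); err != nil {
		return "", err
	}
	for _, a := range amis {
		if a.Name == region {
			return a.HVM, nil
		}
	}
	return "", fmt.Errorf("no built-in AMI for region %q", region)
}
```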
/hold This change is too big to be included after freeze.
Fair enough, but let's at least switch to hardcoding the AMI data for releases to drop the dependency on the "RHCOS build metadata" server?

We'd still rely on that server for libvirt QCOW2 and whatever OpenStack uses, right? Can we put that JSON blob in the machine-os-content?
	return nil, nil, err
}
defer func() {
	if err != nil {
this is confusing.. which err are we trying to check...?
If FromUnparsedImage (or the coming signature validation) fails, I want to be sure we close imageSource. So this closure looks at the err that will be returned by pull, and, if it's not nil, closes imageSource for us. That way pull callers can just return the error without having to worry about checking imageSource for nil-ness and possibly closing it themselves; they can just return pull's err.
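For reference, the general shape of that pattern (a minimal sketch against the public c/image API; the function body in this PR differs):

```go
package release

import (
	"context"

	"github.com/containers/image/image"
	"github.com/containers/image/types"
)

// pull opens an image source and parses the image, making sure the
// source gets closed on any failure path. The deferred closure watches
// the same err variable that every return statement assigns, so callers
// can simply propagate pull's error without closing imageSource
// themselves.
func pull(ctx context.Context, ref types.ImageReference, sys *types.SystemContext) (types.Image, types.ImageSource, error) {
	imageSource, err := ref.NewImageSource(ctx, sys)
	if err != nil {
		return nil, nil, err
	}
	defer func() {
		if err != nil { // err as set by whichever return ran
			imageSource.Close()
		}
	}()

	img, err := image.FromUnparsedImage(ctx, sys, image.UnparsedInstance(imageSource, nil))
	if err != nil {
		return nil, nil, err // the deferred closure closes imageSource
	}
	return img, imageSource, nil
}
```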
… image A temporary workaround while [1] is in flight. [1]: openshift#1286
Force-pushed from e189de1 to c1d35cc.
Force-pushed from 3a99f85 to 327976e.
(Just a few comments on the use of c/image API.)
defer blob.Close()

// fmt.Println(layer.MediaType) // TODO: blank from quay? Just assume this is a gzipped tar?
gzipReader, err := gzip.NewReader(blob)
Consider c/image/pkg/compression.DetectCompression. Yes, there is a MIME type in the manifest, but historically it's not been used to differentiate between compressed/uncompressed in practice.
> Yes, there is a MIME type in the manifest...

Fix the broken servers! ;) In practice, these pulls are going to be from release images that we cut with oc adm release new ... and the layer is going to be gzipped, but sure, I can reroll to use peek-inside detection.
Sadly, it’s not just the servers; the schema2 spec does not officially specify an “uncompressed tar” MIME type, that’s new in OCI (although there is a de-facto schema2 definition in containerd).
Actually, …/compression.AutoDecompress would probably be even more convenient.
I agree that in practice pretty much every image is compressed, and this may not be worth worrying about. Only noting that this exists because I saw that TODO.
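A sketch of what the AutoDecompress variant might look like (assuming the c/image/pkg/compression API described above; the helper name is invented for illustration):

```go
package release

import (
	"archive/tar"
	"io"

	"github.com/containers/image/pkg/compression"
)

// untarLayer wraps a layer blob in a tar reader, sniffing the stream so
// gzip (or a future algorithm like zstd) is handled transparently
// instead of hard-coding gzip.NewReader.
func untarLayer(blob io.Reader) (*tar.Reader, io.Closer, error) {
	decompressed, _, err := compression.AutoDecompress(blob)
	if err != nil {
		return nil, nil, err
	}
	// The caller must Close the returned io.Closer after reading.
	return tar.NewReader(decompressed), decompressed, nil
}
```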
FWIW there is work ongoing to add new compression types like Zstd, so hard-coding gzip here will eventually break.
Since e2b31b2 (bootkube: Supply machine-os-content to MCO, 2019-01-29, openshift#1149), we have been using the machine-os-content image to seed the machine-config operator. With this commit, use the RHCOS build ID from that image's annotations to calculate our AMI, etc. as well. This gives one less degree of freedom for breaking things ;). Users who want to test clusters based on a different RHCOS build should bump the value in their release image, just like users testing operator updates and other changes.

The new pkg/asset/release subpackage allows users to continue using pkg/rhcos without pulling in all of containers/image as a dependency.

The pull-secret handling is a bit of a hack, leaning on the fact that users are likely providing clean secrets from [1]. Hopefully soon containers/image will grow an API for injecting in-memory bytes into their more-robust Docker-auth-config parser, but my attempt at that [2] is unlikely to land in the next few days, so I've kludged together a minimal implementation here.

[1]: https://cloud.openshift.com/clusters/install#pull-secret
[2]: containers/image#588
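The heart of the extraction is just reading a label from the machine-os-content image's config. A rough sketch (the version label name matches the oc image info output quoted later in the thread; the function name and error text are illustrative):

```go
package release

import (
	"context"
	"fmt"

	"github.com/containers/image/types"
)

// rhcosBuild pulls the RHCOS build ID out of the image's config labels.
// img is the parsed image, e.g. the result of image.FromUnparsedImage.
func rhcosBuild(ctx context.Context, img types.Image) (string, error) {
	info, err := img.Inspect(ctx)
	if err != nil {
		return "", err
	}
	build, ok := info.Labels["version"]
	if !ok {
		return "", fmt.Errorf("machine-os-content image has no version label")
	}
	return build, nil // e.g. "47.330"
}
```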
And zounds of other dependencies. Use Gopkg.toml's 'ignored' [1] to exclude BoltDB, which comes in via:

github.com/containers/image/pkg/blobinfocache
github.com/boltdb/bolt

and github.com/docker/distribution/registry/storage/cache, which comes in via:

github.com/containers/image/docker
github.com/docker/distribution/registry/client
github.com/docker/distribution/registry/storage/cache

because we don't use either the storage backend or the BoltDB blob-info cache, but dep isn't checking at that level of granularity. Ideally the upstream repositories would restructure to split these out into separate packages (and/or dep will grow support for pruning by build tag [2]), but until then, we can manually prune via 'ignored'.

Also ignore mtrmac/gpgme, because we only need signature verification and we don't want to use CGO (because we want to be portable to other operating systems, and we only need verification support, not signing support [3]).

There may be more fat we can prune as well, but I got tired of looking and gave up, even though ~50k lines of new code is pretty embarrassing for what is effectively just a handful of HTTPS requests and an OpenPGP check.

Generated (after manually editing Gopkg.toml) with:

$ dep ensure

using:

$ dep version
dep:
 version     : v0.5.0-31-g73b3afe
 build date  : 2019-02-08
 git hash    : 73b3afe
 go version  : go1.10.3
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false

[1]: https://golang.github.io/dep/docs/Gopkg.toml.html#ignored
[2]: golang/dep#291
[3]: containers/image#268
Force-pushed from c6be68c to c41e524.
I've pushed c6be68c -> c41e524, rebasing onto master and dropping the runc ignore, because we apparently do need runc's homedir library.
Clayton pushed release:4.0.0-0.nightly-2019-03-04-234414 to quay.io/openshift-release-dev/ocp-release:4.0.0-0.7. Extracting the associated RHCOS build:

$ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.7 | grep machine-os-content
  machine-os-content quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 | grep version
  version=47.330

I'd like to be pulling that version out of the release image at install-time [1], but that PR isn't green yet, so hard-coding here is a quick fix.

[1]: openshift#1286
This is the most recent RHCOS:

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/builds.json | jq '{"latest": .builds[0], timestamp}'
{
  "latest": "400.7.20190306.0",
  "timestamp": "2019-03-06T22:24:53Z"
}

I'd prefer to pull this from the release image:

$ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.7 | grep machine-os-content
  machine-os-content quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 | grep version
  version=47.330

But [1] isn't green yet, so hard-coding here is a quick fix.

Supported regions:

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/400.7.20190306.0/meta.json | jq -r '.amis[] | .name'
ap-northeast-1
ap-northeast-2
ap-south-1
ap-southeast-1
ap-southeast-2
ca-central-1
eu-central-1
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-1
us-west-2

[1]: openshift#1286
@wking: PR needs rebase.
I think this PR is a good idea - it seems like we made an architectural decision not to do it right now, though. One thing I would say is that it seems quite likely at some point we'll make a stronger push for "cluster managed bootimages" - https://url.corp.redhat.com/6f27cd4 "Lifecycle of RHCOS image source". If we start to implement that more ambitious path, it may potentially involve something like having the bootstrap node set up the bootimage (i.e. the bootimage for the bootstrap would be different from master/worker). That may lead us down a different architectural path.
One potential blocker here is that going forward it's not clear to me that we'll ship new bootimages for every release. We also know that we cannot support updating the bootimage in every environment right now; we need to deal with pivoting. The argument is that it's better to always pivot so it's tested. These two combined say that even if we extract the RHCOS metadata from the release image, we should support a separate "bootimage" annotation or so.
That's fine, just don't bump the embedded bootimage labels.
I think prioritizing RHCOS-build-speed (affects devs) over cluster-install-speed (affects users) is slanting things the wrong way ;).
We definitely want this to work. But surely we can get sufficient soak time on pivots in CI without pushing them onto users too?
Yeah, and since AMI IDs, etc., are only used for boot images and …
That's currently true, but we are thinking about changing it - it would speed up the builds rather dramatically, particularly since we keep adding new bootimage types.

> I think prioritizing RHCOS-build-speed (affects devs) over cluster-install-speed (affects users) is slanting things the wrong way ;).

But don't forget that faster builds have compounding returns :)

faster builds -> shorter iteration times -> quicker bug fixes -> less time blocked on bugs -> more time on features and improvements (incl -> faster builds)

faster builds -> shorter iteration times -> faster feature development -> more user-impacting improvements
Right, and if you just want to iterate on the post-pivot OSTree, you can do that now by skipping AMI, etc. builds and just tweaking …
It's not just about speeding up builds - let's say image uploads for 1 region in a public cloud like AWS/Azure are broken - do we stall everything on that? No, we shouldn't need to.
layers := image.LayerInfos()
for i := len(layers) - 1; i >= 0; i-- {
	layer := layers[i]
	blob, _, err := imageSource.GetBlob(ctx, layer, blobinfocache.NoCache)
If this is expected to enforce/verify signatures (as the code does by calling IsRunningImageAllowed in pull), this code must enforce that the blob stream matches the digest in layer.Digest; otherwise the registry could substitute a malicious blob not matching the manifest.

(IsRunningImageAllowed can't do that for you: it would have to download the blobs, verify them, and cache them somewhere, so that later calls to GetBlob can read the cached version to ensure it has not been changed; the layering/responsibilities are allocated the opposite way, which allows the caller to stream the data directly with no storage requirements, as long as it handles the verification.

GetBlob also does not do digest verification; without, again, requiring extra storage space to cache the data, it could in principle return an error at EOF, but that would only help consumers that don't consume the data as it streams, unlike the way this code does. Also, GetBlob is implemented in all the different transports, and having every one of them copy&paste code to verify digests would be hard to maintain and enforce; centralizing this in the consumer, typically c/image/copy.Image, is more practical.)
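A sketch of the consumer-side check being described, using go-digest's Verifier so the blob is validated as it streams through (placeholder wiring with an invented helper name, not the code from this PR):

```go
package release

import (
	"fmt"
	"io"

	digest "github.com/opencontainers/go-digest"
)

// verifyingReader tees the blob through a digest verifier. Read the
// returned reader to EOF, then call check() to confirm the stream
// matched the manifest's digest; no extra storage is required.
func verifyingReader(blob io.Reader, expected digest.Digest) (io.Reader, func() error) {
	verifier := expected.Verifier()
	check := func() error {
		if !verifier.Verified() {
			return fmt.Errorf("blob does not match manifest digest %s", expected)
		}
		return nil
	}
	return io.TeeReader(blob, verifier), check
}
```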
@wking: The following tests failed, say /retest to rerun them all:
Full PR test history. Your PR dashboard.
Closing due to inactivity. Please reopen if needed. /close
@abhinavdahiya: Closed this PR. In response to this:
Since e2b31b2 (#1149), we have been using the machine-os-content image to seed the machine-config operator. With this commit, use the RHCOS build ID from that image's annotations to calculate our AMI, etc. as well. This gives one less degree of freedom for breaking things ;). Users who want to test clusters based on a different RHCOS build should bump the value in their update payload, just like users testing operator updates and other changes.
The vendor commit is depressingly large, but I don't think there's much we can do about that from this repository. See the e189de1 commit message for notes.
This is a WIP until I look into signature verification.
I'm not sure where this stands with openshift/origin#21998 deferred, but I was looking into ways to simplify the release process and this was one of them. CC @cgwalters
Fixes #987.