This repository has been archived by the owner on May 7, 2021. It is now read-only.

kola: add run-upgrade command #1168

Merged 4 commits on Jan 27, 2020

Conversation

@jlebon (Member) commented Jan 22, 2020

This adds a new `run-upgrade` command focused on running upgrade tests.
It also adds a single test in that testsuite: `fcos.upgrade.basic`.

To run this test, one can do:

```
kola run-upgrade -v \
        --cosa-build /path/to/meta.json \
        --qemu-image /path/to/starting-image.qcow2
```

You can tell kola to automatically detect the parent image to start
from:

```
kola run-upgrade -v \
        --cosa-build /path/to/meta.json \
        --find-parent-image
```

For FCOS, this will fetch the metadata for the latest release for the
target stream. On AWS, it will use the AMI from there as the starting
image. On the QEMU platform, it will download the QEMU image locally
(with signature verification). The code is extensible to add support for
RHCOS and other target platforms.
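For illustration, that lookup corresponds roughly to the following queries against the published FCOS stream metadata (a sketch only; the exact endpoint and fields kola uses may differ, and `stable`/`us-east-1` are just example values):

```
curl -s https://builds.coreos.fedoraproject.org/streams/stable.json > stable.json

# AMI to boot as the starting image on AWS
jq -r '.architectures.x86_64.images.aws.regions["us-east-1"].image' stable.json

# compressed qcow2 (and its detached signature) to download for the qemu platform
jq -r '.architectures.x86_64.artifacts.qemu.formats["qcow2.xz"].disk.location' stable.json
jq -r '.architectures.x86_64.artifacts.qemu.formats["qcow2.xz"].disk.signature' stable.json
```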

Why make it a separate command from `run`? Multiple reasons:

1. As shown above, it's about multiple artifacts, not just the system
   under test. By contrast, `run` is largely about using a single
   artifact input. For example, on AWS, `--aws-ami` points to the
   *starting* image, and `--cosa-build` points to the target upgrade.
2. It's more expensive than other tests. To make it truly cross-platform
   and self-contained, it works by pushing the OSTree content to the
   node and serving it from there to itself (see the sketch after this
   list). Therefore, it's not a test that developers would necessarily
   be interested in running locally very often (though it's definitely
   adapted for local tests too when needed).
3. Unlike `run`, it has some special semantics like
   `--find-parent-image` to make it easier to use.
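To make point 2 concrete, the "serve it to itself" idea boils down to something like the following on the node. This is only a conceptual sketch with made-up paths and ref; the actual test drives the update through the regular update stack rather than a manual rebase, and any small HTTP server would do (`python3` is just for illustration):

```
# the target build's OSTree repo has been copied onto the node, e.g. under /var/srv/upgrade
cd /var/srv/upgrade/repo && python3 -m http.server 8080 &

# point the node at itself and upgrade to the target build
ostree remote add --no-gpg-verify upgrade-test http://localhost:8080
rpm-ostree rebase upgrade-test:fedora/x86_64/coreos/testing-devel
systemctl reboot
```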

Now, this is only part of the FCOS upgrade testing story. Here's roughly
how I see this all fit together:

1. The FCOS pipeline runs `kola run-upgrade -p qemu` and possibly
   `kola run-upgrade -p aws` after the basic `kola run` tests have
   passed.
2. Once the build is clean and pushed out to S3, its content will be
   imported into the annex/compose repo.
3. Once there, we can do more realistic tests by targeting the annex
   repo and a dedicated Cincinnati. For example, we can have canary
   nodes following those updates that started from various previous
   releases to catch any state-dependent issues. Another, more explicit
   approach is a test that starts nodes at selected releases and gates
   new releases on that result.

Essentially, the main advantage of this test is that we can do some
upgrade testing *before* pushing out any bits at all to S3. The major
bug category this is intended to catch is state-dependent issues (i.e.
anything that *isn't* captured by the OSTree commit).

However, it does also exercise many of the major parts of the update
system (zincati, rpm-ostree, ostree, libcurl), though it's clearly not
a replacement for more realistic e2e tests downstream.

@jlebon (Member, Author) commented Jan 22, 2020

Marking as WIP for now. I have to split out lots of commits into separate prep PRs first.

@cgwalters (Member) commented:

Only glanced at this, but I think we want `--cosa-build` to still define the start, and `--cosa-upgrade-target` to take the end or so.

@jlebon (Member, Author) commented Jan 22, 2020

Yeah, I'm open to tweaking the current interface. My reasoning for making `--cosa-build` the target build is that I consider the build we update *to* to be the actual artifact under test. The previous build is normally one that has already been released in the wild.

So, to flip this around: `--cosa-build` and `--cosa-parent-build`? IOW, `--cosa-build` is constant, while `--cosa-parent-build` is the thing that could be changed (in fact, we could make it take multiple parent builds and execute those in parallel). Note also that `--cosa-build` is mandatory, whereas `--cosa-parent-build` isn't (there's `--find-parent-image`, which can automatically figure out the parent to use).
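For illustration, the interface being discussed would look roughly like this (`--cosa-parent-build` is only being proposed in this thread, not an existing flag):

```
kola run-upgrade -v \
        --cosa-build /path/to/target/meta.json \
        --cosa-parent-build /path/to/parent/meta.json
```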

@jlebon (Member, Author) commented Jan 22, 2020

Another way to look at this is that `--cosa-build` is always about the test artifact we actually care about, regardless of whether it's `kola run` or `kola run-upgrade`.

@cgwalters (Member) commented:

> Another way to look at this is that `--cosa-build` is always about the test artifact we actually care about, regardless of whether it's `kola run` or `kola run-upgrade`.

OK, that's convincing, yes.

@jlebon (Member, Author) commented Jan 22, 2020

OK, split out prep patches in #1170!

@jlebon force-pushed the pr/fcos-upgrade branch 2 times, most recently from e9816e3 to 0d23362 on January 23, 2020 at 20:09
@jlebon changed the title from "WIP: kola: add run-upgrade command" to "kola: add run-upgrade command" on Jan 23, 2020
@jlebon marked this pull request as ready for review on January 23, 2020 at 20:09
@jlebon force-pushed the pr/fcos-upgrade branch 2 times, most recently from c8f5d83 to a5bf6d8 on January 23, 2020 at 21:43

@jlebon (Member, Author) commented Jan 23, 2020

OK, this now works on top of #1170! I added streaming decompression and
signature verification to make `--find-parent-image` for the qemu
platform faster.

There's definitely a lot more we could do on top of this, though I think
it's good enough for now to get in as is, so we can at least start using
it in the pipeline.

Some follow-up improvements:

- add `--cosa-previous-build`
- add support for more platforms (though really, AWS is the only cloud we can test right now for FCOS until we start uploading elsewhere)
- add support for RHCOS

The individual commit messages (excerpts):

- "And while we're there, rename the functions to be more descriptive. This
  is prep for doing streaming decompression and GPG verification of
  downloaded qemu images."
- "That way a caller that wants to use the streaming interface can also
  just pass `""` as the key file to get the default keyring."
- "This function will download a compressed file, decompress it, and
  verify its signature in a streaming fashion. What we lose in return is
  the ability to resume file downloads if we're interrupted. I think that
  trade-off is worth it though for a faster and more efficient common
  case."
- The remaining commit's message repeats the PR description above.
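That streaming download path could be illustrated with a shell pipeline like the one below. This is only a sketch of the trade-off (the PR implements it as a Go function inside kola); `$QCOW_XZ_URL` and `$SIG_FILE` are placeholders, and the signature covers the compressed bytes:

```
# Verify the detached signature over the compressed stream while decompressing it,
# without ever writing the .xz file to disk (which is also why an interrupted
# download can no longer be resumed). Real code would also need to check gpg's
# exit status, which this bash one-liner does not propagate.
curl -sL "$QCOW_XZ_URL" \
  | tee >(gpg --verify "$SIG_FILE" -) \
  | xz -dc > starting-image.qcow2
```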

@jlebon (Member, Author) commented Jan 24, 2020

Rebased!

jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this pull request Jan 24, 2020
Start running the new upgrade test right after building the QEMU image.
In the AWS test job, run the upgrade test on AWS in parallel.

For more information, see:
coreos/mantle#1168
An inline review comment was left on this hunk:

```
	}
	kola.QEMUOptions.DiskImage = decompressedQcowLocal
case "aws":
	kola.AWSOptions.AMI, err = parentCosaBuild.FindAMI(kola.AWSOptions.Region)
```

See also openshift/installer#2906 - we'll likely at some point need to copy the code from the installer to make images from storage, which gets into terraform vs. something else here, or forking out to `openshift-install instantiate-coreos-image` or something.

@cgwalters (Member) left a comment:

Looks great overall! Very nice code.
Would be nice probably to support the no-zincati case for RHCOS...or OTOH we could just add zincati to RHCOS and leave it disabled by default.

@jlebon (Member, Author) commented Jan 27, 2020

Thanks for the review!

> Would be nice probably to support the no-zincati case for RHCOS...or OTOH we could just add zincati to RHCOS and leave it disabled by default.

Yeah, I left space for this enhancement in the code. I guess the closest equivalent would be to upload the oscontainer into the node's image storage, write its pullspec to `/etc/pivot/image-pullspec`, and run `systemctl start machine-config-daemon-host.service` the way the MCD does it? That way we at least test parts of the same MCD/podman/rpm-ostree path as in a cluster. (But yeah, again, the emphasis here is on disk-dependent hysteresis; it's just a bonus if we also get to exercise the same mechanisms as the real thing, as a sanity check ahead of the more realistic downstream e2e tests.)
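A hand-wavy sketch of that flow (the pullspec is a placeholder; the exact steps are whatever the MCD actually does):

```
# after pushing the oscontainer into the node's image storage
echo "quay.io/example/machine-os-content@sha256:<digest>" > /etc/pivot/image-pullspec
systemctl start machine-config-daemon-host.service
```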

Gonna merge this one now! I'd like to get this and coreos/fedora-coreos-pipeline#190 in before the next stable release (which should be this week).

jlebon added a commit to coreos/fedora-coreos-pipeline that referenced this pull request on Jan 28, 2020 (same commit message as above).