cirrus.yml: implement skips based on source changes #23030

Luap99 · 2024-06-18T13:38:15Z

We do not have to test everything for each PR, we can know based on the source if we changed (i.e. machine code) and only run the tests then.

This implements it as skip conditions, due to the nature of yaml files we unfortunately cannot deduplicate everything, i.e. the is PR check and danger files apply to everything but as skip is only a single yaml string we cannot deduplicate parts of that string. If anyone knows a way to achieve this I like to hear it.

For now I implemented this for int, system, bud and machine tests. Once we are more comfortable with this I plan on adding it to other tests as well.

This will replace the current _bail_if_test_can_be_skipped logic as it covers more, marks tasks actually skipped in the github UI and works even for the windows/macos machine tests.

Does this PR introduce a user-facing change?

None

openshift-ci · 2024-06-18T13:38:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Luap99 · 2024-06-18T13:42:56Z

@cevich @edsantiago PTAL

The duplication is annoying for sure but I cannot think of something better, open for ideas.

My current idea to test this would be cherry-pick this commit into a new PR, remove .cirrus.yml from the danger files and then add a new commit on top with "real" changes to see that the task runs when we expected it too.

Luap99 · 2024-06-18T15:31:51Z

Ok some basic checks, not the full matrix but I think this is good enough:

Looks like it works as I expect and it clearly marks tests as skipped to the PR author/reviewer so it should be obvious what is actually being run.

edsantiago

LGTM modulo some concerns

.cirrus.yml

edsantiago · 2024-06-18T17:57:19Z

.cirrus.yml

+    skip: >-
+        $CIRRUS_PR != '' &&
+        !changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**') &&
+        !changesInclude('**/*build*.go')


This is making me more and more uncomfortable as the day progresses. I think this is >95% safe, but I can also envision situations where changes to storage code (podman, not vendor) or CONF envariable handling could impact podman build. Also, bud.bats runs podman commands other than build. I don't want to block, but I'd really like a lot of thought going into this.

I cannot think of much, first of all consider that all podman build tests part of e2e/system tests are still run. This is only the buildah bud specific test that I skip here because it is both flaky and slow.
I honestly don't see much risk here and well that is why we run everything nightly. So at least the next day we should know that something broke an PR the last day (easy enough to check manually, or git bisect if needed).

IMO the benefit of less flakes faster CI outweighs the risk here.

Note to myself: It is missing test/buildah-bud in case someone updates the diff. I will update this on the next push

edsantiago · 2024-06-18T17:57:39Z

.cirrus.yml

+        $CIRRUS_PR != '' &&
+        !changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**') &&
+        !changesInclude('test/system/**') &&
+        !(changesInclude('**/*.go', '**/*.c') && !changesIncludeOnly('test/**', 'pkg/machine/e2e/**'))


Similar to my bud comments. Is it possible for changes to something-else.go to break podman-machine tests?

Well I cannot say 100% no. Sure there is risk that some day something weird happens or we update the file structure. Nightly runs should catch things and I also count on reviewers to see that machine tests are marked skipped but they changed some machine related code.

I think relying more on the nightly cron-runs is probably okay for complex cases like this.

cevich · 2024-06-18T18:35:35Z

Non-blocking / nice-to-have: Remove contrib/cirrus/CIModes.md and all the supporting only_if: logic in this PR or as a followup. It's too hard for humans to figure out what the combination of complex only_if and skip means. Let's just have one please 🤓

Otherwise, this is nice work so far @Luap99

ashley-cui

I think there might be instances where we don't want to skip anything, ie release PR's, even if certain files are not changed. In this case, it could be done by checking for edits in the rawversion.go, but there might be other instances where running everything would make sense. Maybe consider a override label? I'd be okay with this in a followup PR or something.

Luap99 · 2024-06-18T18:47:17Z

Non-blocking / nice-to-have: Remove contrib/cirrus/CIModes.md and all the supporting only_if: logic in this PR or as a followup. It's too hard for humans to figure out what the combination of complex only_if and skip means. Let's just have one please 🤓

Absolutely, I like to remove all of the special magic CI modes. However first I needed to be sure this here even works. Second I like to go incremental on this. Currently we cannot replace CI:DOCS as other tests such as compose or APIv2 are not yet covered by file based rules. So I like to keep them until I migrated all task over.

Luap99 · 2024-06-18T18:50:58Z

I think there might be instances where we don't want to skip anything, ie release PR's, even if certain files are not changed. In this case, it could be done by checking for edits in the rawversion.go, but there might be other instances where running everything would make sense. Maybe consider a override label? I'd be okay with this in a followup PR or something.

Yes thanks, I totally forget about the version file. I need to look into what options we have for a manual overwrite. Ideally nobody should need that but I agree that having such option might be useful just in case.

cevich · 2024-06-18T19:15:51Z

first I needed to be sure this here even works.

Yes, testing it (and changesInclude() in general) is hard since .cirrus.yml needs to be in the list 😞 I haven't been able to figure out a good way to get around that. The only thing I can think of is to leave that file out initially, do your testing, then stick it back in for final PR review.

There's also containers/automation_sandbox you're welcome to play in. Perhaps exchange all the gce_instance and ec2_instance with containers, and mock out all the scripts to /bin/true. That will save you from needing to mess with re-generating all the secrets and iterate quickly.

Luap99 · 2024-06-19T07:36:47Z

Well I already tested it here, see #23030 (comment)

Luap99 · 2024-06-19T12:03:51Z

Pushed a new version:

Fixed comment style and added test as requested by @edsantiago
added missing test/buildah-bud to the bud test file list in case someone only touches the diff files there.
added version/rawversion/* to the danger files as pointed out by @ashley-cui , this should make sure all release PRs run all tests. We never want to release something where tests could have told us something is broken.

Still missing is manual overwrite option, the best I can think right now is to use a special title like CI:ALL like what we do with CI:DOCS today, opinions on that?

cevich · 2024-06-20T13:18:58Z

opinions on that?

👍

edsantiago · 2024-06-20T13:42:42Z

Reading the conditions makes my brain hurt, but that should (fingers crossed) only be a problem now at review time, not in the future. Logicwise, I can't find any cases this won't handle, so I'm close to LGTM. I'd still like to think about it some more.

As for CI:ALL, it's either that or define a trigger file like force-ci. I think CI:ALL is slightly cleaner, and consistent with the CI:XXX habits we've trained ourselves on.

edsantiago · 2024-06-20T13:47:40Z

Possible suggestion for the sanity check:

diff --git a/contrib/cirrus/cirrus_yaml_test.py b/contrib/cirrus/cirrus_yaml_test.py
index 49d35d1ff..5ab43ce5d 100755
--- a/contrib/cirrus/cirrus_yaml_test.py
+++ b/contrib/cirrus/cirrus_yaml_test.py
@@ -63,14 +63,17 @@ class TestDependsOn(TestCaseBase):
     def test_skips(self):
         """2024-06 PR#23030: ugly but necessary duplication in skip conditions. Prevent typos or unwanted changes."""
         beginning = "$CIRRUS_PR != '' && !changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**', 'version/rawversion/*') && "
+        real_source_changes = " && !(changesInclude('**/*.go', '**/*.c') && !changesIncludeOnly('test/**', 'pkg/machine/e2e/**'))"
 
         for task_name in self.ALL_TASK_NAMES:
             task = self.CIRRUS_YAML[task_name + '_task']
             if 'skip' in task:
                 skip = task['skip']
                 if 'changesInclude' in skip:
-                    msg = ('{0}: invalid skip "{1}"'.format(task_name, skip))
-                    self.assertEqual(skip[:len(beginning)], beginning, msg=msg)
+                    msg = ('{0}: invalid skip'.format(task_name))
+                    self.assertEqual(skip[:len(beginning)], beginning, msg=msg+": beginning part is wrong")
+                    if 'changesIncludeOnly' in skip:
+                        self.assertEqual(skip[len(skip)-len(real_source_changes):], real_source_changes, msg=msg+": changesIncludeOnly() part is wrong")
 
     def not_task(self):
         """Ensure no task is named 'task'"""

Reasons: (1) check the tail part, it's too hard to check by hand; and (2) remove clutter from message, it makes the failure message unreadable.

ashley-cui · 2024-06-20T14:09:31Z

LGTM, I like CI:ALL

cevich · 2024-06-20T14:14:34Z

I'm not against CI:ALL, I think that's fine for now. I'm slightly more attracted to Ed's suggestion of a force-ci file though. Simply for the reason that it allows removing the CIModes.md docs, and it greatly simplifies the brain-work required to reconcile complex only_if and skip attributes. Maybe that could be considered in a follow-up PR?

Luap99 · 2024-06-20T14:22:49Z

As for CI:ALL, it's either that or define a trigger file like force-ci

Well I mean technically one can already just add a newline in Makefile/cirrus.yml or some other file from the danger zone so I rather not add another file for it.

I think I go CI:ALL for now and then we can reconsider once I am at the point where we can remove the other special titles such as CI:DOCS.

cevich · 2024-06-20T14:45:03Z

I think I go CI:ALL for now

Cool with me. I'm okay too with followup improvements to cirrus_yaml_test.py. No reason to make everyone wait for test improvements, nobody else is working on this skip-logic stuff.

In case it's not been said: Really nice work here Paul 👏 It's really important (to me at least) that the team is well engaged with our own automation systems. You've really "stepped up to the plate" recently and it's both noted and appreciated.

We do not have to test everything for each PR, we can know based on the source if we changed (i.e. machine code) and only run the tests then. This implements it as skip conditions, due to the nature of yaml files we unfortunately cannot deduplicate everything, i.e. the is PR check and danger files apply to everything but as skip is only a single yaml string we cannot deduplicate parts of that string. If anyone knows a way to achieve this I like to hear it. For now I implemented this for int, system, bud and machine tests. Once we are more comfortable with this I plan on adding it to other tests as well. This will replace the current _bail_if_test_can_be_skipped logic as it covers more, marks tasks actually skipped in the github UI and works even for the windows/macos machine tests. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Now that we have source based skips there might be a case where we have to run all tests. One option is to simply change a line in one of the danger files but having something that can be set as title might be easier for users. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Luap99 · 2024-06-20T17:11:49Z

Pushed the test change from @edsantiago and now another commit to add CI:ALL support. This should be ready.

edsantiago · 2024-06-20T17:24:39Z

contrib/cirrus/CIModes.md

@@ -126,6 +126,10 @@ is removed.
 commit-change before Cirrus-CI will notice the draft-status update (i.e.
 pressing the re-run button **is not** good enough).

+### Intended `[CI:ALL]` behavior:
+
+Run even the tasks that are skipped based on changed sources conditions otherwise.


Hard to parse, but IMO not worth a repush. Should you wish to repush (now or in a followup), perhaps something like:

As of June 2024, the default Cirrus CI setup skips tasks that it deems unnecessary, such as running e2e or system tests on a doc-only PR (see #23030). This string forces all CI jobs to run.

edsantiago · 2024-06-20T17:25:12Z

/lgtm

cevich · 2024-06-20T17:51:20Z

LGTM

openshift-ci bot added release-note-none do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 18, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2024

This was referenced Jun 18, 2024

DNM: testing only (docs) #23033

Closed

DNM: testing only (regular code) #23034

Closed

DNM: testing only (machine) #23035

Closed

DNM: testing only (system test) #23036

Closed

edsantiago reviewed Jun 18, 2024

View reviewed changes

ashley-cui reviewed Jun 18, 2024

View reviewed changes

Luap99 force-pushed the CI-cond branch from c667d25 to 832a50f Compare June 19, 2024 11:58

Luap99 added 2 commits June 20, 2024 19:10

Luap99 force-pushed the CI-cond branch from 832a50f to f134ab7 Compare June 20, 2024 17:10

Luap99 marked this pull request as ready for review June 20, 2024 17:11

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2024

edsantiago reviewed Jun 20, 2024

View reviewed changes

openshift-ci bot assigned edsantiago Jun 20, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2024

openshift-merge-bot bot merged commit 48e1efb into containers:main Jun 20, 2024
89 of 90 checks passed

Luap99 deleted the CI-cond branch June 20, 2024 18:20

edsantiago mentioned this pull request Jun 25, 2024

machine: Volume ops test: statfs /private/tmp/ci/ginkgoNNN: no such file or directory #22569

Closed

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 19, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cirrus.yml: implement skips based on source changes #23030

cirrus.yml: implement skips based on source changes #23030

Luap99 commented Jun 18, 2024

openshift-ci bot commented Jun 18, 2024

Luap99 commented Jun 18, 2024

Luap99 commented Jun 18, 2024

edsantiago left a comment

edsantiago Jun 18, 2024

Luap99 Jun 18, 2024

Luap99 Jun 18, 2024

edsantiago Jun 18, 2024

Luap99 Jun 18, 2024

cevich Jun 18, 2024

cevich commented Jun 18, 2024

ashley-cui left a comment

Luap99 commented Jun 18, 2024

Luap99 commented Jun 18, 2024

cevich commented Jun 18, 2024

Luap99 commented Jun 19, 2024

Luap99 commented Jun 19, 2024

cevich commented Jun 20, 2024

edsantiago commented Jun 20, 2024

edsantiago commented Jun 20, 2024

ashley-cui commented Jun 20, 2024

cevich commented Jun 20, 2024

Luap99 commented Jun 20, 2024

cevich commented Jun 20, 2024

Luap99 commented Jun 20, 2024

edsantiago Jun 20, 2024

edsantiago commented Jun 20, 2024

cevich commented Jun 20, 2024

cirrus.yml: implement skips based on source changes #23030

cirrus.yml: implement skips based on source changes #23030

Conversation

Luap99 commented Jun 18, 2024

Does this PR introduce a user-facing change?

openshift-ci bot commented Jun 18, 2024

Luap99 commented Jun 18, 2024

Luap99 commented Jun 18, 2024

edsantiago left a comment

Choose a reason for hiding this comment

edsantiago Jun 18, 2024

Choose a reason for hiding this comment

Luap99 Jun 18, 2024

Choose a reason for hiding this comment

Luap99 Jun 18, 2024

Choose a reason for hiding this comment

edsantiago Jun 18, 2024

Choose a reason for hiding this comment

Luap99 Jun 18, 2024

Choose a reason for hiding this comment

cevich Jun 18, 2024

Choose a reason for hiding this comment

cevich commented Jun 18, 2024

ashley-cui left a comment

Choose a reason for hiding this comment

Luap99 commented Jun 18, 2024

Luap99 commented Jun 18, 2024

cevich commented Jun 18, 2024

Luap99 commented Jun 19, 2024

Luap99 commented Jun 19, 2024

cevich commented Jun 20, 2024

edsantiago commented Jun 20, 2024

edsantiago commented Jun 20, 2024

ashley-cui commented Jun 20, 2024

cevich commented Jun 20, 2024

Luap99 commented Jun 20, 2024

cevich commented Jun 20, 2024

Luap99 commented Jun 20, 2024

edsantiago Jun 20, 2024

Choose a reason for hiding this comment

edsantiago commented Jun 20, 2024

cevich commented Jun 20, 2024