Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cirrus.yml: implement skips based on source changes #23030

Merged
merged 2 commits into from
Jun 20, 2024

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jun 18, 2024

We do not have to test everything for each PR, we can know based on the source if we changed (i.e. machine code) and only run the tests then.

This implements it as skip conditions, due to the nature of yaml files we unfortunately cannot deduplicate everything, i.e. the is PR check and danger files apply to everything but as skip is only a single yaml string we cannot deduplicate parts of that string. If anyone knows a way to achieve this I like to hear it.

For now I implemented this for int, system, bud and machine tests. Once we are more comfortable with this I plan on adding it to other tests as well.

This will replace the current _bail_if_test_can_be_skipped logic as it covers more, marks tasks actually skipped in the github UI and works even for the windows/macos machine tests.

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added release-note-none do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 18, 2024
Copy link
Contributor

openshift-ci bot commented Jun 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2024
@Luap99
Copy link
Member Author

Luap99 commented Jun 18, 2024

@cevich @edsantiago PTAL

The duplication is annoying for sure but I cannot think of something better, open for ideas.

My current idea to test this would be cherry-pick this commit into a new PR, remove .cirrus.yml from the danger files and then add a new commit on top with "real" changes to see that the task runs when we expected it too.

@Luap99
Copy link
Member Author

Luap99 commented Jun 18, 2024

Ok some basic checks, not the full matrix but I think this is good enough:

Looks like it works as I expect and it clearly marks tests as skipped to the PR author/reviewer so it should be obvious what is actually being run.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some concerns

.cirrus.yml Outdated Show resolved Hide resolved
.cirrus.yml Outdated Show resolved Hide resolved
.cirrus.yml Outdated
skip: >-
$CIRRUS_PR != '' &&
!changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**') &&
!changesInclude('**/*build*.go')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making me more and more uncomfortable as the day progresses. I think this is >95% safe, but I can also envision situations where changes to storage code (podman, not vendor) or CONF envariable handling could impact podman build. Also, bud.bats runs podman commands other than build. I don't want to block, but I'd really like a lot of thought going into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot think of much, first of all consider that all podman build tests part of e2e/system tests are still run. This is only the buildah bud specific test that I skip here because it is both flaky and slow.
I honestly don't see much risk here and well that is why we run everything nightly. So at least the next day we should know that something broke an PR the last day (easy enough to check manually, or git bisect if needed).

IMO the benefit of less flakes faster CI outweighs the risk here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: It is missing test/buildah-bud in case someone updates the diff. I will update this on the next push

$CIRRUS_PR != '' &&
!changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**') &&
!changesInclude('test/system/**') &&
!(changesInclude('**/*.go', '**/*.c') && !changesIncludeOnly('test/**', 'pkg/machine/e2e/**'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my bud comments. Is it possible for changes to something-else.go to break podman-machine tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I cannot say 100% no. Sure there is risk that some day something weird happens or we update the file structure. Nightly runs should catch things and I also count on reviewers to see that machine tests are marked skipped but they changed some machine related code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying more on the nightly cron-runs is probably okay for complex cases like this.

@cevich
Copy link
Member

cevich commented Jun 18, 2024

Non-blocking / nice-to-have: Remove contrib/cirrus/CIModes.md and all the supporting only_if: logic in this PR or as a followup. It's too hard for humans to figure out what the combination of complex only_if and skip means. Let's just have one please 🤓

Otherwise, this is nice work so far @Luap99

Copy link
Member

@ashley-cui ashley-cui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be instances where we don't want to skip anything, ie release PR's, even if certain files are not changed. In this case, it could be done by checking for edits in the rawversion.go, but there might be other instances where running everything would make sense. Maybe consider a override label? I'd be okay with this in a followup PR or something.

@Luap99
Copy link
Member Author

Luap99 commented Jun 18, 2024

Non-blocking / nice-to-have: Remove contrib/cirrus/CIModes.md and all the supporting only_if: logic in this PR or as a followup. It's too hard for humans to figure out what the combination of complex only_if and skip means. Let's just have one please 🤓

Absolutely, I like to remove all of the special magic CI modes. However first I needed to be sure this here even works. Second I like to go incremental on this. Currently we cannot replace CI:DOCS as other tests such as compose or APIv2 are not yet covered by file based rules. So I like to keep them until I migrated all task over.

@Luap99
Copy link
Member Author

Luap99 commented Jun 18, 2024

I think there might be instances where we don't want to skip anything, ie release PR's, even if certain files are not changed. In this case, it could be done by checking for edits in the rawversion.go, but there might be other instances where running everything would make sense. Maybe consider a override label? I'd be okay with this in a followup PR or something.

Yes thanks, I totally forget about the version file. I need to look into what options we have for a manual overwrite. Ideally nobody should need that but I agree that having such option might be useful just in case.

@cevich
Copy link
Member

cevich commented Jun 18, 2024

first I needed to be sure this here even works.

Yes, testing it (and changesInclude() in general) is hard since .cirrus.yml needs to be in the list 😞 I haven't been able to figure out a good way to get around that. The only thing I can think of is to leave that file out initially, do your testing, then stick it back in for final PR review.

There's also containers/automation_sandbox you're welcome to play in. Perhaps exchange all the gce_instance and ec2_instance with containers, and mock out all the scripts to /bin/true. That will save you from needing to mess with re-generating all the secrets and iterate quickly.

@Luap99
Copy link
Member Author

Luap99 commented Jun 19, 2024

Well I already tested it here, see #23030 (comment)

@Luap99
Copy link
Member Author

Luap99 commented Jun 19, 2024

Pushed a new version:

  • Fixed comment style and added test as requested by @edsantiago
  • added missing test/buildah-bud to the bud test file list in case someone only touches the diff files there.
  • added version/rawversion/* to the danger files as pointed out by @ashley-cui , this should make sure all release PRs run all tests. We never want to release something where tests could have told us something is broken.

Still missing is manual overwrite option, the best I can think right now is to use a special title like CI:ALL like what we do with CI:DOCS today, opinions on that?

@cevich
Copy link
Member

cevich commented Jun 20, 2024

opinions on that?

👍

@edsantiago
Copy link
Member

Reading the conditions makes my brain hurt, but that should (fingers crossed) only be a problem now at review time, not in the future. Logicwise, I can't find any cases this won't handle, so I'm close to LGTM. I'd still like to think about it some more.

As for CI:ALL, it's either that or define a trigger file like force-ci. I think CI:ALL is slightly cleaner, and consistent with the CI:XXX habits we've trained ourselves on.

@edsantiago
Copy link
Member

Possible suggestion for the sanity check:

diff --git a/contrib/cirrus/cirrus_yaml_test.py b/contrib/cirrus/cirrus_yaml_test.py
index 49d35d1ff..5ab43ce5d 100755
--- a/contrib/cirrus/cirrus_yaml_test.py
+++ b/contrib/cirrus/cirrus_yaml_test.py
@@ -63,14 +63,17 @@ class TestDependsOn(TestCaseBase):
     def test_skips(self):
         """2024-06 PR#23030: ugly but necessary duplication in skip conditions. Prevent typos or unwanted changes."""
         beginning = "$CIRRUS_PR != '' && !changesInclude('.cirrus.yml', 'Makefile', 'contrib/cirrus/**', 'vendor/**', 'hack/**', 'version/rawversion/*') && "
+        real_source_changes = " && !(changesInclude('**/*.go', '**/*.c') && !changesIncludeOnly('test/**', 'pkg/machine/e2e/**'))"
 
         for task_name in self.ALL_TASK_NAMES:
             task = self.CIRRUS_YAML[task_name + '_task']
             if 'skip' in task:
                 skip = task['skip']
                 if 'changesInclude' in skip:
-                    msg = ('{0}: invalid skip "{1}"'.format(task_name, skip))
-                    self.assertEqual(skip[:len(beginning)], beginning, msg=msg)
+                    msg = ('{0}: invalid skip'.format(task_name))
+                    self.assertEqual(skip[:len(beginning)], beginning, msg=msg+": beginning part is wrong")
+                    if 'changesIncludeOnly' in skip:
+                        self.assertEqual(skip[len(skip)-len(real_source_changes):], real_source_changes, msg=msg+": changesIncludeOnly() part is wrong")
 
     def not_task(self):
         """Ensure no task is named 'task'"""

Reasons: (1) check the tail part, it's too hard to check by hand; and (2) remove clutter from message, it makes the failure message unreadable.

@ashley-cui
Copy link
Member

LGTM, I like CI:ALL

@cevich
Copy link
Member

cevich commented Jun 20, 2024

I'm not against CI:ALL, I think that's fine for now. I'm slightly more attracted to Ed's suggestion of a force-ci file though. Simply for the reason that it allows removing the CIModes.md docs, and it greatly simplifies the brain-work required to reconcile complex only_if and skip attributes. Maybe that could be considered in a follow-up PR?

@Luap99
Copy link
Member Author

Luap99 commented Jun 20, 2024

As for CI:ALL, it's either that or define a trigger file like force-ci

Well I mean technically one can already just add a newline in Makefile/cirrus.yml or some other file from the danger zone so I rather not add another file for it.

I think I go CI:ALL for now and then we can reconsider once I am at the point where we can remove the other special titles such as CI:DOCS.

@cevich
Copy link
Member

cevich commented Jun 20, 2024

I think I go CI:ALL for now

Cool with me. I'm okay too with followup improvements to cirrus_yaml_test.py. No reason to make everyone wait for test improvements, nobody else is working on this skip-logic stuff.

In case it's not been said: Really nice work here Paul 👏 It's really important (to me at least) that the team is well engaged with our own automation systems. You've really "stepped up to the plate" recently and it's both noted and appreciated.

Luap99 added 2 commits June 20, 2024 19:10
We do not have to test everything for each PR, we can know based on the
source if we changed (i.e. machine code) and only run the tests then.

This implements it as skip conditions, due to the nature of yaml files
we unfortunately cannot deduplicate everything, i.e. the is PR check and
danger files apply to everything but as skip is only a single yaml
string we cannot deduplicate parts of that string. If anyone knows a way
to achieve this I like to hear it.

For now I implemented this for int, system, bud and machine tests. Once
we are more comfortable with this I plan on adding it to other tests as
well.

This will replace the current _bail_if_test_can_be_skipped logic as it
covers more, marks tasks actually skipped in the github UI and works
even for the windows/macos machine tests.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Now that we have source based skips there might be a case where we have
to run all tests. One option is to simply change a line in one of the
danger files but having something that can be set as title might be
easier for users.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Copy link
Member Author

Luap99 commented Jun 20, 2024

Pushed the test change from @edsantiago and now another commit to add CI:ALL support. This should be ready.

@Luap99 Luap99 marked this pull request as ready for review June 20, 2024 17:11
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2024
@@ -126,6 +126,10 @@ is removed.
commit-change before Cirrus-CI will notice the draft-status update (i.e.
pressing the re-run button **is not** good enough).

### Intended `[CI:ALL]` behavior:

Run even the tasks that are skipped based on changed sources conditions otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to parse, but IMO not worth a repush. Should you wish to repush (now or in a followup), perhaps something like:

As of June 2024, the default Cirrus CI setup skips tasks that it deems unnecessary, such as running e2e or system tests on a doc-only PR (see #23030). This string forces all CI jobs to run.

@edsantiago
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2024
@cevich
Copy link
Member

cevich commented Jun 20, 2024

LGTM

@openshift-merge-bot openshift-merge-bot bot merged commit 48e1efb into containers:main Jun 20, 2024
89 of 90 checks passed
@Luap99 Luap99 deleted the CI-cond branch June 20, 2024 18:20
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 19, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants