Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Don't update last good commit if CI skipped #716

Merged

Conversation

DeeDeeG
Copy link
Member

@DeeDeeG DeeDeeG commented Sep 13, 2023

Identify the Bug

The skipping logic from #701 is slightly broken.

Cirrus updates CIRRUS_LAST_GREEN_CHANGE for all "successful builds". Unexpectedly, that var is updated even if the build had no tasks, or had tasks that were all SKIPPED.

By the time the cron build rolled around, a branch push to master had already triggered, skipped, and been marked successful for that commit in Cirrus. So, the cron builds skipped as well, seeing that "the commit was already built once, nothing to do!".

So, we were effectively skipping every build ever triggered on the Cirrus, even the cron builds that were meant to, you know, build and publish Rolling binaries every few days (Mon+Wed+Fri).

Description of the Change

Run a task that does nothing other than report a failing status, so the overall build gets marked as failed.

(Conditionally running this task using the proper inverse of the only_if: logic for the other "real" tasks we want to run... So, if we were going to skip everything, run only this task that quickly/quietly fails the build. If we were going to run real tasks, do so us usual and don't fail the builds.)

Makes sure commits that have had CI "run-but-actually-skipped" on them don't report as "Successful", so the skipping logic doesn't prevent/skip all builds ever in Cirrus.

  • Bonus implementation detail: I enabled a feature for this task that avoids reporting the failure to GitHub, so there will be less noise when checking the CI status of the skipped builds on the GitHub side. ✅

Alternate Designs

  • We could just yeet/delete the skipping logic from Cirrus: Skip builds if same commit was previously built #701.
    • That logic is rarely used -- currently only really helps (as designed, if it was working properly) if a Cron build runs twice on the same commit. Basically only if we go two or three days between adding more commits tomaster branch.

Justification of why doing this instead of simplifying: I wanted to "skip repeat builds of the same commit (#701) and do so properly (this PR)", because "engineers gonna engineer" and add more cool stuff... But also, skipping re-builds both theoretically saves us some credits, and theoretically saves us from pushing out new Rolling binaries in situations when nothing actually changed. (Makes it easier to reason about when/why the different Rolling versions were produced.)

Possible Drawbacks

  • Unfortunately, this still shows up as extremely short failed builds on the Cirrus side. A bit noisy. ❌
  • More complex than just yeeting/deleting the skipping logic from Cirrus: Skip builds if same commit was previously built #701.
  • Uses a very, very small number of seconds (2 seconds is typical) per build where this is applicable, on Linux (the cheapest OS to test on Cirrus), with 1 CPU configured. Worst-case scenario is ~12 seconds of 1 CPU of a Linux container. Though I have only ever seen 2 billable seconds, IRL, if I recall correctly. The cumulative impact of these over an entire month, even with heavy repo activity and lots of pushes, should add up to well, well, well under one whole credit per month.

Verification Process

  • If you look at the CI status of this PR on Cirrus, it should show one, ~2-3 second task that did nothing other than immediately fail
  • If you look at the Cirrus status of this PR from the GitHub UI, it should be clean or show no Cirrus info at all. This is because the failure is not reported to GitHub, to reduce the visual noise.
    • Update: This was true at this point as I was going down the checklist. There are genuine aborted tasks that are showing instead now, see below. But all is still as expected and intended.
  • If we schedule a Cron build of this PR's branch, it should actually run!
    • (We might end up with binaries published to the Rolling releases repo if we do that, in which case we should probably track those down and delete them...)
    • It did work. I cancelled the builds. So, now this PR shows with a cancelled (failed) build, but before it had nothing reported from Cirrus, for the record.

Release Notes

Fixed skipping logic for Cirrus (but maybe don't add this in the actual Changelog, IDK, it's a CI-only change, so...???)

For builds that are effectively skipped, since their tasks are all
skipped or not scheduled in the first place, we shouldn't update
CIRRUS_LAST_GREEN_CHANGE.

Unfortunately, Cirrus *does* update that for builds with no or
all-skipped tasks, for now. They may fix it in the future, we have a
feature request open for it. But for now, this is the workaround.
Copy link
Member

@confused-Techie confused-Techie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look awesome!
Glad to look over everything else and love to see that everything is working exactly as expected.

Especially glad you found skip_notifications: true to ensure we don't clutter GitHub PRs with pointless failure notifications (although seeing the two tasks skipped now does make me a little worried, but that's not as bad as them having been marked as failed).

So overall awesome changes, and lets get them merged!

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Sep 14, 2023

Just for context about the aborted jobs shown for this PR, those won't generally be shown.

Details:

For this PR specifically, I was going through my "Verification Process" checklist, and I scheduled some Cron builds to make sure it doesn't still skip for those. They didn't skip. ✅ Just what I wanted to see.

Since those were running, and they were running the real, full-fat ARM Linux and Apple Silicon macOS tasks, they were using credits. So I aborted those. Those are genuine, manually-aborted tasks, so they are meant to show, I suppose.

The actual "does-nothing-but-fail" task this PR adds does appear to be working as intended, and it doesn't report its failures to GitHub. Before I went through my testing checklist, GitHub didn't show any Cirrus checks at all for this PR, even though there was one failed task on the Cirrus side.

(I should have screen-shotted it, dang.)

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Sep 14, 2023

Thanks for the approve!!

With this, we should have some ARM Rolling binaries/an ARM Rolling release out again some time early on Friday. (Early for most people, timezones and all that).

@DeeDeeG DeeDeeG merged commit 80e830b into master Sep 14, 2023
99 of 101 checks passed
@DeeDeeG
Copy link
Member Author

DeeDeeG commented Sep 14, 2023

Going to link to the feature request where I asked Cirrus for an official, non-workaround way to do this, so we don't have to do a silly workaround:

cirruslabs/cirrus-ci-docs#1233

@DeeDeeG DeeDeeG deleted the CI-dont-update-CIRRUS_LAST_GREEN_CHANGE-for-skipped-builds branch September 20, 2023 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants