Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel GH actions workflow for Nixpkgs eval #356023

Merged
merged 2 commits into from
Nov 20, 2024
Merged

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Nov 14, 2024

This PR primarily introduces a new GitHub Action workflow to effectively do a full Hydra evaluation for every PR, paving the way towards relieving ofborg of its evaluation duty. This is motivated by https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025, see also #355847.

To make this as efficient as possible, 4 workflow jobs are spawned in parallel (one for each supported system, as originally experimented by @Mic92 in #352808), while each job makes full use of its 4 cores and 16GB of memory (based off amjoseph's idea in #269403 and initial work in #269356).

I've also implemented it in a way that limits maximum memory usage as Nixpkgs grows. CI will only take longer to finish, instead of OOMing. Currently it takes about 5 minutes, check out this test run.

Note that ofborg does more than just evaluate Nixpkgs, but this PR can be used as a base to also implement output path comparisons to assign labels based on changed paths, to request appropriate maintainers, and do evaluation time comparisons.

Ping @philiptaron @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji

Things done


This work is funded by Tweag and Antithesis

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: policy discussion 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions labels Nov 14, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Nov 15, 2024
.github/workflows/eval.yml Outdated Show resolved Hide resolved
Comment on lines 43 to 49
- name: Enable swap
if: env.mergedSha
run: |
sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is unused, we can just drop it for now. Especially given that we have another swap already enabled that can act as a canary:

Suggested change
- name: Enable swap
if: env.mergedSha
run: |
sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If allocating swap is not slowing down the gh action, I would recommend keeping it as a backup. You never know what happens.

Copy link
Member Author

@infinisil infinisil Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it such that we have a constant chunk size, so that we effectively have an upper limit to memory usage. Then we definitely don't need swap :)

Copy link
Member

@Mic92 Mic92 Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohnRTitor there is already a swap file. So we would see if it is starting to swap before it is a problem and we can add this back as needed.

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

I added several improvements to this script: tweag#100 notably more useful json output.

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

I like this approach already. I want to incorporate this code in one form or the other in nixpkgs-review as it will make evaluating on normal laptops possible again.

@JohnRTitor
Copy link
Contributor

Please do! Currently evaluating uses too much memory and time!

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

@GaetanLepage wants to look into integrating it into nixpkgs-review.

@Mic92
Copy link
Member

Mic92 commented Nov 17, 2024

@infinisil I would suggest to just address all comments for this pull request for now and maybe merge my pull request but not to extend the scope for the first pull request too much.
It would be good to first test to see how this action scales to the whole org before we proceed with more features.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025/37

ci/parallel.nix Outdated Show resolved Hide resolved
@JohnRTitor
Copy link
Contributor

JohnRTitor commented Nov 17, 2024

It would be good to first test to see how this action scales to the whole org before we proceed with more features.

Agreed, due to slowness of Ofborg recently a few commiters are merging PRs without waiting for a successful eval which can take up to 13-15 hours. (I am guilty of the act myself)

This fast but dangerous approach of review makes nixpkgs prone to a tree with broken eval.

Having this basic CI check would help to detect issues beforehand, as we plan to phase out Ofborg in less than two months.

@github-actions github-actions bot added 6.topic: haskell 6.topic: lib The Nixpkgs function library labels Nov 18, 2024
@infinisil infinisil force-pushed the gha-eval branch 8 times, most recently from 82e4957 to 1348471 Compare November 18, 2024 06:01
@infinisil infinisil force-pushed the gha-eval branch 2 times, most recently from 7c36b3d to 9360b35 Compare November 20, 2024 06:20
@infinisil infinisil marked this pull request as ready for review November 20, 2024 06:20
@infinisil
Copy link
Member Author

infinisil commented Nov 20, 2024

This is done now imo! I added some docs and some comments, I also updated the PR description. Pinging @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji @azuwis @TheGiraffe3 for review and approval

ci/eval/README.md Outdated Show resolved Hide resolved
ci/eval/README.md Outdated Show resolved Hide resolved
ci/supportedSystems.nix Outdated Show resolved Hide resolved
pkgs/top-level/release-lib.nix Show resolved Hide resolved
Motivated by ofborg struggling [1] and its evaluations taking too long,
inspired by Jörg's initial PR [2]
and Adam's previous attempt to parallelise Nixpkgs evaluation [3],
this PR contains initial work to relief ofborg from its evaluation duty
by using GitHub Actions to evaluate Nixpkgs.

For now this doesn't take care of all of what ofborg does, such as
requesting appropriate reviewers or labeling mass rebuilds, but this can
be follow-up work.

[1]: https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025?u=infinisil
[2]: NixOS#352808
[3]: NixOS#269403

Co-Authored-By: Jörg Thalheim <joerg@thalheim.io>
Co-Authored-By: Adam Joseph <adam@westernsemico.com>
@Mic92 Mic92 merged commit 6d2d99e into NixOS:master Nov 20, 2024
11 of 12 checks passed
@JohnRTitor JohnRTitor added the backport release-24.11 Backport PR automatically label Nov 20, 2024
Copy link
Contributor

Successfully created backport PR for release-24.11:

@infinisil
Copy link
Member Author

Follow-up fix: #357965

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/equinix-metal-cta-infra-short-update/56538/1

@paparodeo
Copy link
Contributor

I'm seeing ofborg eval failing on staging https://gist.github.com/GrahamcOfBorg/f50183ae86ce9f339b6b41cc33e14619 but the parallel evals are passing: #359106

@FliegendeWurst
Copy link
Member

staging will be fixed very soon. In this particular case the GH actions ran first, then staging broke, and finally ofborg got around to the PR.

@infinisil
Copy link
Member Author

WIP follow-up towards adding rebuild labels: #359704

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants