[RFC 0056] CI all Nix PRs #56

Ericson2314 · 2019-10-31T01:52:46Z

Build all Nix PRs in CI. Do not merge any PR until it passes CI.

This has been discussed for Nixpkgs, but it should be more economically feasible for just Nix.

Ericson2314 · 2019-10-31T01:58:52Z

Exhibits:

Switch to nixpkgs 19.09 nix#3165 I have no idea if this works on all platforms. Those mentioned failing tests are from a different branch. We shouldn't need to extrapolate from different code being tested.
Port Nix to Meson nix#3160 I happened to test Darwin by hand and notice the failures. If I hadn't, would the original author had known? Would have anyone else?

sondr3 · 2019-10-31T11:55:16Z

So something like the Bors bot used in a lot of the Rust projects? I'd also love to see something like @ofborg merge-if-successful where after fixing feedback the PR would be merged if everything builds fine, right now a lot of PRs also languish after the reviewers gives thumbs up but then never merge the PR. There are a couple of issues already about this in ofborg, e.g. NixOS/ofborg#104 and NixOS/ofborg#112.

edolstra · 2019-10-31T12:00:28Z

We can't use Hydra for this for security reasons. Maybe ofborg could do it. Or we could run a separate Hydra instance that doesn't have write access to cache.nixos.org.

@sondr3 We definitely shouldn't auto-merge PRs just because they build correctly. Just because something builds doesn't mean it's a good idea.

sondr3 · 2019-10-31T12:04:08Z

@edolstra not all PRs no, but a lot of the contributed PRs are simply minor or patch version bumps and most of them should be fine to auto merge after building in my opinion. As mentioned, this has already been brought up in ofborg, and is something I think would be useful (though probably out of scope for this RFC).

Ericson2314 · 2019-10-31T12:29:41Z

@sondr3 Remember this about Nix not Nixpkgs. Baby steps :)

@edolstra What security issues are you worried about? It's not clear to me.

edolstra · 2019-10-31T12:33:20Z

@Ericson2314 Anybody could submit a PR that DoSes the Hydra evaluator or queue, or exploits a zero-day kernel bug to root the build machines.

Ericson2314 · 2019-10-31T12:39:24Z

Aren't those already problems with hydra building arbitrary stuff in Nixpkgs? I see no security difference between submitting arbitrary C++ and arbitrary Nix that can fetch external C++.

Anybody could submit a PR that DoSes the Hydra evaluator or queue

We have timeouts and other resource limits already, right?

Exploits a zero-day kernel bug to root the build machines.

Say a package in Nixpkgs has such a bug, doesn't use it during the ofborg build except to get a side-channel to tell whether it's the cache.nixos.org build, and poisons the results of the hydra.nixos.org build.

Am I missing something?

edolstra · 2019-10-31T12:52:10Z

Hydra doesn't build arbitrary stuff, it builds stuff that has been merged so presumably a human has looked at it. It's ofborg that's exposed, but it doesn't have write access to cache.nixos.org so it doesn't really matter.

We have timeouts and other resource limits already, right?

There are some limits, but there is nothing preventing jobset evaluation consuming all memory (we already do a good job of that ourselves unintentionally...), or a jobset creating a billion builds in the database, or a lots of jobs that fork-bomb every build machine...

Ericson2314 · 2019-10-31T15:38:54Z

Hydra doesn't build arbitrary stuff, it builds stuff that has been merged so presumably a human has looked at it.

I guarantee our reviewers will not reliably catch zero-days, full stop.

There are some limits, but...

Those DDOs problems are all fixable with not much effort. I wouldn't characterize them as a security issues because they do not compromise the store, and the availability of hydra is already....not great.

Given that those are low-impact problem, I would adopt this RFC as is, make some effort to limit resources better, and only if/when spam becomes a problem disable this policy until the issues are fixed.

edolstra · 2019-10-31T15:51:22Z

and the availability of hydra is already....not great

If the availability is not great, that means it's not appropriate as a PR building tool (where you want quick feedback).

Ericson2314 · 2019-10-31T16:25:58Z

I mean that hydra goes down a fair amount; when it's up it works better for building PRs with Nix than travis, or circle-ci, etc, for Nix. Hercules or something would do better than Hydra, but that's a separate conversation.

Basically, I'm saying that unless there is a serious security problem or something, we should CI the we want to regardless of hydra's fragility, rather than shifting the goalposts to avoid doing something about Hydra being fragile.

Mic92 · 2019-10-31T16:27:11Z

Well, if it is just Nix almost every CI would do. Hydra, Github Actions, Travis CI. A simple travis.yml would be shorter than this RFC.

Ericson2314 · 2019-10-31T16:30:24Z

@Mic92 Those are often timeout hell. Also, the YAML/whatever boilerplate for those is a huge maintenance burden because it is impossible to reproduce locally. Finally, I do want cached builds of PRs, not merely a CI green light. I think that's a great feature.

Mic92 · 2019-10-31T16:45:26Z

@Ericson2314 This was the case for nixpkgs, but do you think they would timeout for nix as well? In nix's release.nix the nixpkgs 19.09 channel is used, so all dependencies should be build already unlike nixpkgs master. I extended the travis.yml to also build on linux: NixOS/nix#3187

Regarding local reproducibility, its just literally nix-build release.nix -A build.x86_64-linux - which should be reproducible enough :)
Regarding cached builds - do you really need a cache for nix builds? I mean it does not take that long to build locally. Otherwise one could just add cachix: https://github.com/nix-community/nur-packages-template/blob/master/.travis.yml#L31

Ericson2314 · 2019-10-31T18:18:14Z

@Mic92

This was the case for nixpkgs, but do you think they would timeout for nix as well?

If I add cross compilation to windows in the windows PR, that might time out.

Regarding local reproducibility, its just literally nix-build release.nix -A build.x86_64-linux - which should be reproducible enough :)

Yes that part is reproducable, but the wrapper YAML does other stuff and may fail or through away cache. That is hard to debug. Perhaps you've had better luck but I've seen the mainstream CIs waste a lot of developer time and am quiet sick of them overall.

Regarding cached builds - do you really need a cache for nix builds? I mean it does not take that long to build locally. Otherwise one could just add cachix: https://github.com/nix-community/nur-packages-template/blob/master/.travis.yml#L31

Yes, for the cross windows case, and other times we are hitting things that the underlying nixpkgs didn't cache. We could use cachix but I think the NixOS foundation should be consistent across projects; same CI and caching software. It's just simpler to maintain.

But to be clear all the things you say are still better to me than the status quo.

7c6f434c · 2019-11-01T12:17:45Z

@Ericson2314 there are some specific profitable resource abuse options that do not require using (and therefore risking) any zero-days exploits (but can be abused with auto-builds)

matthewbauer · 2019-11-01T15:55:10Z

This was the case for nixpkgs, but do you think they would timeout for nix as well?

If I add cross compilation to windows in the windows PR, that might time out.

Maybe we need a way in Nix to disallow uncached dependencies. Some PR to Nix shouldn't require building more than what's within Nix. Users should provide their own cached version if they need to build more. This limits the potential cost of CI without arbitrary rate limiting.

zimbatm · 2019-11-01T16:12:23Z

Well, if it is just Nix almost every CI would do. Hydra, Github Actions, Travis CI. A simple travis.yml would be shorter than this RFC.

The full Nix test suite depends on KVM to run integration tests, which is typically not supported by most build environments. Another requirement is to support both Linux and Darwin. And hopefully Windows in the future. If the idea is to keep Hydra always green it might be best just to go with Hydra, even if it's a separate setup.

zimbatm · 2019-11-01T16:18:16Z

rfcs/0000-ci-all-nix-prs.md

+
+There is a (famous blog post)[blog-post] that everyone is sloppy and doing CI wrong.
+This isn't just bad for releasing software smoothly, but also increases the burden for new contributors because it is harder to judge the correctness of PRs at a glance (is it broken? Did I break it?).
+I personally find it harder to contribute, I have to worry about double checking all my work on platforms I don't have as-easy access to, like Darwin.


Another motivation is that broken tests are not always the author's fault as master often turns red. Meaning that the author now has to checkout master to see this is a new error or not. If master is red, the author now has to either:

a. find a working commit in the history. If there are any merge conflict this might not be an option.
b. try and fix master himself. This might be completely unrelated work, meaning the author has to load and understand a new context.
c. wait or pester other contributors to fix master so they can continue their work.

Basically the later an error gets caught, the more expensive it becomes to fix.

We already have something like "rocket science" CI if you treat "nixpkgs-channels" as the source code. The channels will only update if the branch passes all of its blocking tests. This accomplishes the same goal as mentioned in the post and avoids the crazy costs that come with "rocket science" CI.

The problem with the way we use Hydra for nixpkgs-channels is that if a broken commit is ever merged to nixpkgs, everything that gets merged after the broken commit also gets blocked from making it to nixpkgs-channels until the problem is fixed.

Don't get me wrong: It's definitely much better than some "CI" systems I've used, but it's not quite as good as the first one I used, which only blocks the broken commit until it's fixed, while allowing other (passing) commits to be merged.

edolstra · 2019-11-01T16:32:41Z

Maybe we need a way in Nix to disallow uncached dependencies.

That wouldn't be really desirable, because it's pretty reasonable to change dependencies in a PR. E.g. our release.nix has

(aws-sdk-cpp.override {
  apis = ["s3" "transfer"];
  customMemoryManagement = false;
})

which you might want to change in a PR.

matthewbauer · 2019-11-04T16:19:01Z

rfcs/0000-ci-all-nix-prs.md

+# Detailed design
+[design]: #detailed-design
+
+Set up Hydra declarative jobsets to build all Nix PRs.


Having used these in the past, they rarely work well. I think the Hydra model only really works for "post-merge" CI, not "pre-merge" CI. A tool like OfBorg (or something else) seems like a good solution to "pre-merge" CI.

edolstra · 2019-11-07T12:37:28Z

Nominating myself for the RFC shepherd team.

nh2 · 2019-11-11T16:17:02Z

My opinion:

Yes, nix PRs should have automatic CI.
There are 2 safe ways to CI non-trusted-contributor PRs:
- Run them on "free" infrastructure that tests in VMs with timeouts (like TravisCI, CircleCI etc).
- Have some trusted contributer assert that there's nothing malicious in the PR (the ofborg model).

I think we should get the above working for nix before considering any form of automatic merging.

This is so that we get something done fast, and because clicking merge and reviewing small changes for maliciousness is easy, but and building and testing is what takes time. If we can automate just the latter for the beginning, we already win a lot.

Mic92 · 2019-12-13T12:18:12Z

We just need to decide on a leader.

zimbatm · 2019-12-16T18:05:03Z

leader=rand(4) :)

I think it's pretty obvious that this is a good thing to have. Even if it doesn't build the nested VMs tests, any CI is better than the current situation. I already did 3 different implementations with Travis, Circle CI and Azure Pipeline. Might as well to a 4th one with Hercules CI if we have the green light.

domenkozar · 2019-12-19T12:55:49Z

@matthewbauer would you mind to be the leader?

matthewbauer · 2019-12-19T15:38:05Z

@matthewbauer would you mind to be the leader?

Yeah I can accept!

matthewbauer · 2019-12-19T15:40:54Z

We should try to schedule a short meeting to get everyone on the same page.

matthewbauer · 2020-01-09T14:56:43Z

RFC Notes for 2019/01/09

We can setup Ofborg to build old prs. Ideally this would build all of release.nix.
Graham agrees Ofborg can be used for CI, but Hydra may actually be a better fit. Ofborg does not have a way of caching.
One concern of Ofborg is that logging gets mixed together in a single derivation.
Eelco would like to minimize number of CI.
Hydra has a github pr plugin that could be utilized, but it may take extra work to setup.
Ofborg sounds like a good option, but we have a few concerns. Eelco would like an instance just for Nix, so that PRs from Nix don’t have to wait on Nixpkgs PRs.
Ofborg merges commit to master before testing, which is good. This doesn’t meet the rocket science requirement but ofborg does let you retrigger each time and that can be done when there are few things to do.
Eelco and Graham talked about a potential issue in building the source tarball in release.nix with OfBorg. Some solutions are available though, namely using different platforms to build each tarball.
Domen discussed using Hercules which solves the above problem, but Hercules being closed source is probably a blocker.
Ofborg was decided as the CI to use. We all agree this is a step in the right direction. Graham expects it to be immediately usable for this purpose. Graham will be working on this tomorrow.

edolstra · 2020-01-09T15:37:31Z

Thanks for the notes!

Eelco would like an instance just for Nix, so that PRs from Nix don’t have to wait on Nixpkgs PRs.

I don't want a separate instance. It might be desirable though if wait times are high due to Nixpkgs PRs. However according to Graham this probably won't be a problem.

Ericson2314 · 2020-01-09T17:49:15Z

I'm sorry i missed the email thread on this completely. Glad to here a choice was made though, and @grahamc is unblocked. I am happy to revise this to reflect the consensus, or whatever else you all would like to see this document turn into. (Do we need it at all? Maybe a general CI policy is more worth codifying than a one-off-change for just Nix.)

edolstra · 2020-01-09T17:59:24Z

Yeah, I don't think we really need to codify this just for Nix. We're all in agreement that we want to CI all Nix PRs.

Ericson2314 · 2020-01-09T18:00:03Z

@edolstra Agreed, there is no other decision a Nix-only RFC would affect, so it's not worth writing down. What do you think for other NixOS repos like cabal2nix and what not, should we ofborg all of them? That would be worth writing up.

edolstra · 2020-03-05T13:57:23Z

Okay, I'll close this PR. Yeah, CI for other repos would certainly be good.

gilligan · 2020-03-06T20:25:16Z

Hey, can I ask what the status on this is?

I see it got closed since everyone is agreeing but.. It would be really nice to go for some quick-win here that works and can be used now-ish and then iterate on that.

I just wanted to try to hack something on Nix and realized several tests are failing on master. So it would be really nice to have something up and running even when it isn't the perfect solution. After all "perfect is the enemy of good" ;)

edolstra · 2020-03-06T20:35:17Z

We agreed to enable ofborg on the Nix repo to test PRs. I think this requires some changes to ofborg though (cc @grahamc).

Ericson2314 · 2020-03-07T19:25:34Z

Yes it would be nice to have issue(s) somewhere tracking the progress of ofborg-ing various repos.

roberth · 2020-03-08T10:12:49Z

I'd be happy to help out with Hercules CI.
Hercules CI can build the nix repo by add a ci.nix file, which works like nix-build. Here's an example result.
If you do want to use existing builders, Hercules CI supports a workflow where a review is required before building. Building directly on separate hardware is also an option.
For caching, Hercules CI can use Cachix.
If there's interest beyond the nix repo, users with write access to such repos can enable/disable Hercules CI for those. Alternatively, organization admins can force Hercules CI to be restricted to certain repos only.

but Hercules being closed source is probably a blocker.

If this is about vendor lock-in, there isn't any. Hercules' ci.nix is nix-build-able, so the cost of switching is microscopic. If it's ideological, well, the agent is open source and you're free to do anything. I don't want to force Hercules CI on any project.

@grahamc did you make any progress?

If anyone else thinks it's worth at least trying, these are the steps for adding hercules:

a NixOS GitHub admin authorizes the GitHub App. (non-admins can request but not authorize)
if the choice is for a review-first workflow, also for bors
deploy build agents

Should be easy, but I am happy to help with the deployment and updates of the build agents for the NixOS org, besides providing user support.

edolstra · 2020-03-09T09:41:55Z

IIRC, we did discuss Hercules CI in the last RFC meeting, but decided to go with ofborg.

domenkozar · 2020-03-11T13:44:26Z

When we had the meeting 2 months ago, the assumption was that @grahamc can code this up in the folllowing weekend.

Is there a new estimate?

domenkozar · 2020-03-13T11:23:09Z

I created NixOS/nix#3409 that:

creates no additional vendor lock-in
adds value (the build currently fails)
is minimal to maintain since I already maintain github actions for Nix
addresses @gilligan concern that I also share

domenkozar · 2020-03-13T15:57:46Z

Happy to report NixOS/nix#3409 has been merged!

Ericson2314 added 2 commits October 30, 2019 21:39

ci-all-nix-prs: Copy Template

20ab913

ci-all-nix-prs: Draft

da00dab

Ericson2314 force-pushed the ci-all-nix-prs branch from 3b8675d to da00dab Compare October 31, 2019 01:54

Ericson2314 changed the title ~~CI all nix PRs~~ CI all Nix PRs Oct 31, 2019

Mic92 changed the title ~~CI all Nix PRs~~ [RFC 0056] CI all Nix PRs Oct 31, 2019

shlevy mentioned this pull request Nov 1, 2019

RFC Meeting 2019-10-31 NixOS/rfc-steering-committee#19

Closed

zimbatm reviewed Nov 1, 2019

View reviewed changes

matthewbauer reviewed Nov 4, 2019

View reviewed changes

Ericson2314 mentioned this pull request Nov 11, 2019

Port Nix to Meson NixOS/nix#3160

Closed

Mic92 added status: in discussion and removed status: open for nominations Open for shepherding team nominations labels Dec 13, 2019

Mic92 mentioned this pull request Dec 19, 2019

RFC Meeting 2019-12-19 NixOS/rfc-steering-committee#25

Closed

edolstra mentioned this pull request Jan 2, 2020

RFC Meeting 2020-01-02 NixOS/rfc-steering-committee#27

Closed

edolstra mentioned this pull request Jan 9, 2020

RFC Meeting 2020-01-09 NixOS/rfc-steering-committee#28

Closed

edolstra mentioned this pull request Jan 23, 2020

RFC Meeting 2020-01-23 NixOS/rfc-steering-committee#29

Closed

worldofpeace mentioned this pull request Feb 6, 2020

RFC Meeting 2020-02-06 NixOS/rfc-steering-committee#30

Closed

edolstra closed this Mar 5, 2020

lheckemann mentioned this pull request Mar 5, 2020

RFC meeting 2020-03-05 NixOS/rfc-steering-committee#31

Closed

12 tasks

edolstra mentioned this pull request Mar 19, 2020

RFC meeting 2020-03-19 NixOS/rfc-steering-committee#32

Closed

10 tasks

Mic92 mentioned this pull request Apr 2, 2020

RFC meeting 2020-04-02 NixOS/rfc-steering-committee#33

Closed

10 tasks

[RFC 0056] CI all Nix PRs #56

[RFC 0056] CI all Nix PRs #56

Conversation

Ericson2314 commented Oct 31, 2019 • edited Loading

Ericson2314 commented Oct 31, 2019

sondr3 commented Oct 31, 2019

edolstra commented Oct 31, 2019

sondr3 commented Oct 31, 2019

Ericson2314 commented Oct 31, 2019

edolstra commented Oct 31, 2019

Ericson2314 commented Oct 31, 2019

edolstra commented Oct 31, 2019

Ericson2314 commented Oct 31, 2019 • edited Loading

edolstra commented Oct 31, 2019

Ericson2314 commented Oct 31, 2019 • edited Loading

Mic92 commented Oct 31, 2019 • edited Loading

Ericson2314 commented Oct 31, 2019 • edited Loading

Mic92 commented Oct 31, 2019 • edited Loading

Ericson2314 commented Oct 31, 2019 • edited Loading

7c6f434c commented Nov 1, 2019

matthewbauer commented Nov 1, 2019 • edited Loading

zimbatm commented Nov 1, 2019

zimbatm Nov 1, 2019 • edited Loading

Choose a reason for hiding this comment

matthewbauer Nov 4, 2019

Choose a reason for hiding this comment

aij Nov 18, 2019

Choose a reason for hiding this comment

edolstra commented Nov 1, 2019

matthewbauer Nov 4, 2019

Choose a reason for hiding this comment

edolstra commented Nov 7, 2019

nh2 commented Nov 11, 2019

Mic92 commented Dec 13, 2019

zimbatm commented Dec 16, 2019

domenkozar commented Dec 19, 2019

matthewbauer commented Dec 19, 2019

matthewbauer commented Dec 19, 2019

matthewbauer commented Jan 9, 2020 • edited Loading

RFC Notes for 2019/01/09

edolstra commented Jan 9, 2020

Ericson2314 commented Jan 9, 2020

edolstra commented Jan 9, 2020

Ericson2314 commented Jan 9, 2020 • edited Loading

edolstra commented Mar 5, 2020

gilligan commented Mar 6, 2020

edolstra commented Mar 6, 2020

Ericson2314 commented Mar 7, 2020

roberth commented Mar 8, 2020

edolstra commented Mar 9, 2020

domenkozar commented Mar 11, 2020 • edited Loading

domenkozar commented Mar 13, 2020 • edited Loading

domenkozar commented Mar 13, 2020

Ericson2314 commented Oct 31, 2019 •

edited

Loading

Ericson2314 commented Oct 31, 2019 •

edited

Loading

Ericson2314 commented Oct 31, 2019 •

edited

Loading

Mic92 commented Oct 31, 2019 •

edited

Loading

Ericson2314 commented Oct 31, 2019 •

edited

Loading

Mic92 commented Oct 31, 2019 •

edited

Loading

Ericson2314 commented Oct 31, 2019 •

edited

Loading

matthewbauer commented Nov 1, 2019 •

edited

Loading

zimbatm Nov 1, 2019 •

edited

Loading

matthewbauer commented Jan 9, 2020 •

edited

Loading

Ericson2314 commented Jan 9, 2020 •

edited

Loading

domenkozar commented Mar 11, 2020 •

edited

Loading

domenkozar commented Mar 13, 2020 •

edited

Loading