[RFC 0056] CI all Nix PRs #56
Exhibits:
|
So something like the Bors bot used in a lot of the Rust projects? I'd also love to see something like |
We can't use Hydra for this for security reasons. Maybe ofborg could do it. Or we could run a separate Hydra instance that doesn't have write access to cache.nixos.org. @sondr3 We definitely shouldn't auto-merge PRs just because they build correctly. Just because something builds doesn't mean it's a good idea. |
@edolstra not all PRs no, but a lot of the contributed PRs are simply minor or patch version bumps and most of them should be fine to auto merge after building in my opinion. As mentioned, this has already been brought up in |
@Ericson2314 Anybody could submit a PR that DoSes the Hydra evaluator or queue, or exploits a zero-day kernel bug to root the build machines. |
Aren't those already problems with hydra building arbitrary stuff in Nixpkgs? I see no security difference between submitting arbitrary C++ and arbitrary Nix that can fetch external C++.
We have timeouts and other resource limits already, right?
Say a package in Nixpkgs has such a bug, doesn't use it during the ofborg build except to get a side-channel to tell whether it's the Am I missing something? |
Hydra doesn't build arbitrary stuff, it builds stuff that has been merged so presumably a human has looked at it. It's ofborg that's exposed, but it doesn't have write access to cache.nixos.org so it doesn't really matter.
There are some limits, but there is nothing preventing jobset evaluation from consuming all memory (we already do a good job of that ourselves unintentionally...), or a jobset creating a billion builds in the database, or lots of jobs that fork-bomb every build machine... |
I guarantee our reviewers will not reliably catch zero-days, full stop.
Those DoS problems are all fixable with not much effort. I wouldn't characterize them as security issues because they do not compromise the store, and the availability of Hydra is already... not great. Given that those are low-impact problems, I would adopt this RFC as is, make some effort to limit resources better, and only if/when spam becomes a problem disable this policy until the issues are fixed. |
If the availability is not great, that means it's not appropriate as a PR building tool (where you want quick feedback). |
I mean that Hydra goes down a fair amount; when it's up it works better for building PRs with Nix than Travis, CircleCI, etc. Hercules or something would do better than Hydra, but that's a separate conversation. Basically, I'm saying that unless there is a serious security problem or something, we should CI the way we want to regardless of Hydra's fragility, rather than shifting the goalposts to avoid doing something about Hydra being fragile. |
Well, if it is just Nix almost every CI would do. Hydra, Github Actions, Travis CI. A simple travis.yml would be shorter than this RFC. |
@Mic92 Those are often timeout hell. Also, the YAML/whatever boilerplate for those is a huge maintenance burden because it is impossible to reproduce locally. Finally, I do want cached builds of PRs, not merely a CI green light. I think that's a great feature. |
@Ericson2314 This was the case for nixpkgs, but do you think they would time out for nix as well? In nix's Regarding local reproducibility, it's just literally |
If I add cross compilation to windows in the windows PR, that might time out.
Yes, that part is reproducible, but the wrapper YAML does other stuff and may fail or throw away the cache. That is hard to debug. Perhaps you've had better luck, but I've seen the mainstream CIs waste a lot of developer time and am quite sick of them overall.
Yes, for the cross windows case, and other times we are hitting things that the underlying nixpkgs didn't cache. We could use cachix but I think the NixOS foundation should be consistent across projects; same CI and caching software. It's just simpler to maintain. But to be clear all the things you say are still better to me than the status quo. |
@Ericson2314 there are some specific profitable resource abuse options that do not require using (and therefore risking) any zero-day exploits (but can be abused with auto-builds) |
Maybe we need a way in Nix to disallow uncached dependencies. Some PR to Nix shouldn't require building more than what's within Nix. Users should provide their own cached version if they need to build more. This limits the potential cost of CI without arbitrary rate limiting. |
The full Nix test suite depends on KVM to run integration tests, which is typically not supported by most build environments. Another requirement is to support both Linux and Darwin. And hopefully Windows in the future. If the idea is to keep Hydra always green it might be best just to go with Hydra, even if it's a separate setup. |
|
||
There is a [famous blog post][blog-post] arguing that everyone is sloppy and doing CI wrong. | ||
This isn't just bad for releasing software smoothly, but also increases the burden for new contributors because it is harder to judge the correctness of PRs at a glance (is it broken? Did I break it?). | ||
I personally find it harder to contribute: I have to worry about double-checking all my work on platforms I don't have easy access to, like Darwin. |
Another motivation is that broken tests are not always the author's fault, as master often turns red. This means the author now has to check out master to see whether this is a new error or not. If master is red, the author now has to either:
a. find a working commit in the history. If there are any merge conflicts this might not be an option.
b. try and fix master themselves. This might be completely unrelated work, meaning the author has to load and understand a new context.
c. wait or pester other contributors to fix master so they can continue their work.
Basically the later an error gets caught, the more expensive it becomes to fix.
We already have something like "rocket science" CI if you treat "nixpkgs-channels" as the source code. The channels will only update if the branch passes all of its blocking tests. This accomplishes the same goal as mentioned in the post and avoids the crazy costs that come with "rocket science" CI.
The problem with the way we use Hydra for nixpkgs-channels is that if a broken commit is ever merged to nixpkgs, everything that gets merged after the broken commit also gets blocked from making it to nixpkgs-channels until the problem is fixed.
Don't get me wrong: It's definitely much better than some "CI" systems I've used, but it's not quite as good as the first one I used, which only blocks the broken commit until it's fixed, while allowing other (passing) commits to be merged.
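To make the difference concrete, here is a toy model of the two gating styles described above. It is only a sketch under simplifying assumptions: each commit either passes or fails on its own, nobody lands a fix, and the function names `channel_style` and `pre_merge_gate` are made up for illustration (they are not Hydra or ofborg APIs).

```python
from typing import List, Tuple

# (commit id, does the branch pass its blocking tests with this commit?)
Commit = Tuple[str, bool]


def channel_style(commits: List[Commit]) -> List[str]:
    """Post-merge gating, like a channel that only advances on green.

    Every commit is merged, but the channel stops at the first broken
    commit, so everything merged after it is held back as well.
    """
    released = []
    for name, passes in commits:
        if not passes:
            break  # branch is red from here on (no fix in this toy model)
        released.append(name)
    return released


def pre_merge_gate(commits: List[Commit]) -> List[str]:
    """Pre-merge gating: a failing commit is rejected on its own,
    and later passing commits still land."""
    return [name for name, passes in commits if passes]


history = [("a", True), ("b", False), ("c", True), ("d", True)]
print(channel_style(history))   # ['a']
print(pre_merge_gate(history))  # ['a', 'c', 'd']
```

In the first model one bad commit `b` holds back `c` and `d`; in the second only `b` itself is blocked.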
That wouldn't be really desirable, because it's pretty reasonable to change dependencies in a PR. E.g. our (aws-sdk-cpp.override {
apis = ["s3" "transfer"];
customMemoryManagement = false;
}) which you might want to change in a PR. |
rfcs/0000-ci-all-nix-prs.md
# Detailed design | ||
[design]: #detailed-design | ||
|
||
Set up Hydra declarative jobsets to build all Nix PRs. |
Having used these in the past, they rarely work well. I think the Hydra model only really works for "post-merge" CI, not "pre-merge" CI. A tool like OfBorg (or something else) seems like a good solution to "pre-merge" CI.
Nominating myself for the RFC shepherd team. |
My opinion:
I think we should get the above working for nix before considering any form of automatic merging. This is so that we get something done fast, and because clicking merge and reviewing small changes for maliciousness is easy, but building and testing is what takes time. If we can automate just the latter for the beginning, we already win a lot. |
We just need to decide on a leader. |
leader=rand(4) :) I think it's pretty obvious that this is a good thing to have. Even if it doesn't build the nested VM tests, any CI is better than the current situation. I already did 3 different implementations with Travis, Circle CI and Azure Pipelines. Might as well do a 4th one with Hercules CI if we have the green light. |
@matthewbauer would you mind to be the leader? |
Yeah I can accept! |
We should try to schedule a short meeting to get everyone on the same page. |
RFC Notes for 2019/01/09
|
Thanks for the notes!
I don't want a separate instance. It might be desirable though if wait times are high due to Nixpkgs PRs. However according to Graham this probably won't be a problem. |
I'm sorry I missed the email thread on this completely. Glad to hear a choice was made though, and @grahamc is unblocked. I am happy to revise this to reflect the consensus, or whatever else you all would like to see this document turn into. (Do we need it at all? Maybe a general CI policy is more worth codifying than a one-off change for just Nix.) |
Yeah, I don't think we really need to codify this just for Nix. We're all in agreement that we want to CI all Nix PRs. |
@edolstra Agreed, there is no other decision a Nix-only RFC would affect, so it's not worth writing down. What do you think for other NixOS repos like cabal2nix and what not, should we ofborg all of them? That would be worth writing up. |
Okay, I'll close this PR. Yeah, CI for other repos would certainly be good. |
Hey, can I ask what the status on this is? I see it got closed since everyone is agreeing but.. It would be really nice to go for some quick-win here that works and can be used now-ish and then iterate on that. I just wanted to try to hack something on Nix and realized several tests are failing on master. So it would be really nice to have something up and running even when it isn't the perfect solution. After all "perfect is the enemy of good" ;) |
We agreed to enable ofborg on the Nix repo to test PRs. I think this requires some changes to ofborg though (cc @grahamc). |
Yes it would be nice to have issue(s) somewhere tracking the progress of ofborg-ing various repos. |
I'd be happy to help out with Hercules CI.
If this is about vendor lock-in, there isn't any. Hercules' @grahamc did you make any progress? If anyone else thinks it's worth at least trying, these are the steps for adding hercules:
Should be easy, but I am happy to help with the deployment and updates of the build agents for the NixOS org, besides providing user support. |
IIRC, we did discuss Hercules CI in the last RFC meeting, but decided to go with ofborg. |
When we had the meeting 2 months ago, the assumption was that @grahamc could code this up the following weekend. Is there a new estimate? |
I created NixOS/nix#3409 that:
|
Happy to report NixOS/nix#3409 has been merged! |
Build all Nix PRs in CI. Do not merge any PR until it passes CI.
This has been discussed for Nixpkgs, but it should be more economically feasible for just Nix.