re-fetch source when url changes #969
In fixed-output derivations, the hash is content-addressed. We could make the URL part of the hash, but then a mirror/URL change would trigger a rebuild, which is very unfortunate. I propose changing the workflow for how you update sources, but I realize that's poor UX.
I want to better understand the "stuck with the wrong source" thing you mention because I haven't experienced it, but otherwise I'm basically with @domenkozar on this. I don't like the usability aspects either, but I don't know how to retain the nice content-centric properties without something like this. One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash. This could be stored as a very simple "output-hash -> regular-derivation-hash" KV mapping. Then it could simply warn you that it's seeing the same output hash coming from a different derivation, which isn't necessarily an error but might be interesting to the user.
@domenkozar: Could you expand on "I propose you change the workflow how you change sources, but I realize that's poor UX."?
@copumpkin: What I mean is: Nix will write the old source in the store, under the new name. It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.
@ both of you: it seems this issue is not as easy to fix as I hoped.
I'm confused. The way Nix thinks about fixed-output derivations is as follows:
So here are the scenarios I can think of:
It sounds like you're saying (3) is leaving junk in the store that's identified with the new hash but contains the old data, but that doesn't fit with my understanding of how things work. What am I missing?
@copumpkin: I'm talking about when you change the version (or name in general) of the pkg and the url, but not the hash. Sorry for the confusion, hope I'm clear now.
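To make this concrete, here is a minimal sketch (hypothetical URLs, placeholder hash, assuming nixpkgs' `fetchurl` is in scope): both calls evaluate to the same store path, so editing only the `url` reuses whatever is already in `/nix/store`.

```nix
with import <nixpkgs> { };

{
  # Both derivations get the SAME outPath: for a fixed-output derivation the
  # output path is computed from the name (here derived from the url's
  # basename, "foo-1.0.tar.gz") and the declared hash; the url is not part of it.
  a = fetchurl {
    url = "https://example.org/foo-1.0.tar.gz"; # hypothetical upstream
    sha256 = "0000000000000000000000000000000000000000000000000000"; # placeholder
  };
  b = fetchurl {
    url = "https://mirror.example.net/foo-1.0.tar.gz"; # different mirror, same basename
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
}
```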
I see the opposite in my tests:
I would think that nix shouldn't care about a url change as long as the hash stays the same. But in reality the change of the url gives me both a redownload and a (massive) rebuild.
@veprbl we're talking about the realized derivation hash, not the derivation hash itself.
I was getting:
I ran:
And I deleted the results folder mentioned by the previous command. After that, nix-build picked up the changed URL.
I was thinking, would it make sense to add an option like
Doesn't Nix already track this for garbage-collection purposes?
I can confirm that this bug can lead to some security issues, described in an email sent to the security team (I'll likely create a public issue soonish). Other potential attacks (if other functionality like cache sharing between OfBorg and reviewers/Hydra is implemented) are described in NixOS/ofborg#68 (comment). What I propose to solve this is to keep on each Nix installation a map `drvPath -> outPath`.
EDIT: See my proposition here NixOS/ofborg#68 (comment) for a more robust version
@tobiasBora
After @risicle pointed me your way in NixOS/rfcs#171 (comment), and after reading your comments in NixOS/ofborg#68, I came up with a very simple, currently undetectable way to poison Hydra's cache in NixOS/rfcs#171 (comment):
But now thinking about this, I have an IMHO much simpler and thus scarier attack that _will_ work against Hydra:
- Malice makes an evil PR with a package update that changes the source's revision to a new version but points the hash to a hash
of an older version of the same package, a version which has a well-known CVE.
- A very paranoid Nixpkgs maintainer Alice (who does not use any build caches) looks at it, it looks fine, so she builds it, and it
works fine, since Alice has the older version of the source in `/nix/store`, and so she merges it.
- All tools that track unfixed CVEs in Nixpkgs only look at package versions, so those will be fooled too.
- Malice can now exploit the CVE.
So, this is actually a critical bug.
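For concreteness, a hedged Nix sketch of that first step (owner, repo, and hash are hypothetical, assuming nixpkgs' `fetchFromGitHub`):

```nix
# Downgrade sketch: rev claims the new release, but the sha256 still names
# the content of the old, vulnerable release.  Since fetchFromGitHub names
# its output "source" by default, the old output already in /nix/store
# matches by name and hash, nothing is re-fetched, and the build "succeeds".
src = fetchFromGitHub {
  owner = "example";      # hypothetical project
  repo = "example";
  rev = "v2.0.0";         # the new, supposedly fixed version...
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # ...placeholder for the v1.0.0 (CVE) hash
};
```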
Merging NixOS/nixpkgs#49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default, and banning custom-named source derivations is a workaround that will work in the meantime.
Yes, I actually realized exactly this issue this morning. As a result, I submitted a CVE online. Patching fetchzip as discussed in NixOS/rfcs#171 is only a first step, but it does not stop the slightly more involved cache-poisoning attacks. The only robust fix I can think of is what I am proposing in NixOS/ofborg#68 (comment)
That might be a first quick mitigation (actually not needed if we implement NixOS/ofborg#68 (comment)), but it is not sufficient. Nobody would notice if a derivation is misnamed, for instance in a long automatically generated file.
> The only robust fix I can think of is what I am proposing in NixOS/ofborg#68 (comment)
While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.
I.e. you are trying to prevent poisoning of your local `/nix/store` with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) `/nix/store` directly too by reusing old hashes.
> Merging NixOS/nixpkgs#49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default
is one work-around for this.
As hinted in NixOS/rfcs#171 (comment), making `nix-store -r --check` semantics slightly more convenient:
- for a normal derivation, when the `outPath` is valid it should build the derivation again and check the result matches the previously known output exactly (as it does now),
- for a normal derivation, when the `outPath` is not valid, it should just build the derivation twice and compare the results (the current semantics of failing in this case is surprising and inconvenient),
- for a fixed-output, regardless of `outPath` validity, it should always do the build and then check the hash,
and then making OfBorg always run `nix-store -r --check` for all affected fixed-output derivations, thus limiting the set of attackers who could poison caches and `/nix/store`s to those who have direct commit access to Nixpkgs, is another work-around for this.
But the actual solution is to never reuse outputs between derivations without checking them first, i.e. Nix should remember a mapping of `outPath -> set<drvPath>` for each valid `outPath` in `/nix/store` (ATM Nix only stores one `drvPath` for each valid `outPath`, aka `nix-store -q --deriver`) and `nix-store -r` should force re-build (i.e. `--check`) each given derivation even if the corresponding `outPath` already exists and is valid in `/nix/store` in cases when that `outPath` does not have that `drvPath` in its `set<drvPath>` yet.
In other words, `/nix/store` path validity should be changed from being a flag ("is valid") to a predicate ("is valid as an output for this `drvPath`).
Similarly, substitutors should provide their `set<drvPath>` for each requested `outPath` to the client Nix'es (somehow) and a client Nix should check those before accepting a substitution.
Obviously, without any optimizations that would be incredibly inconvenient, as, among other things, not only changes to URLs and changes between `fetch*` functions would always cause a rebuild, but also any unrelated changes to `stdenv.mkDerivation` would cause all derivations using any of the `fetch*` functions to be rebuilt too.
One possible way to minimize rebuilds in this case is to implement a tiny `mkDerivationLite` variant of `stdenv.mkDerivation` and use that in all `fetch*` functions. Sure, updating `curl` or `git` will still cause a lot of rebuilds, but for the paranoid this could be a good thing: after all, what if the attacker poisoned one of those somehow.
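A hedged sketch of what such a variant might look like (`mkDerivationLite` and its interface are this thread's proposal, not an existing nixpkgs function):

```nix
# Hypothetical mkDerivationLite: a thin wrapper over the raw `derivation`
# primitive with no stdenv in its closure, so unrelated changes to
# stdenv.mkDerivation no longer change the drvPath of every fetch* derivation.
mkDerivationLite =
  { name, builder, args ? [ ], system ? builtins.currentSystem, ... }@attrs:
  derivation (attrs // { inherit name builder args system; });
```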
Another way to reduce rebuilds with `drvPath`-predicated-validity is to add a new `outputPathToken` parameter to `derivation` which would declare to Nix that the `outPath` of this derivation is valid for any other derivation that maps to the same `outPath` and uses the same `outputPathToken`, i.e. it would declare an equivalence class of valid `drvPath`s for each `outPath` (i.e., in the above, `set<drvPath>` becomes `set<drvPath or outputPathToken>`).
Then,
- the current minimal-rebuilds behavior can be achieved by always setting `outputPathToken` to some constant,
- keeping `outputPathToken` always unset will get you the most paranoid behavior at the cost of most rebuilds,
- precise control of what should or should not be rebuilt on changes to fixed-output derivations can be achieved by setting `outputPathToken` to different things in different `fetch*` derivations, e.g.
- set `outputPathToken` to `"fetchurl " + concatSepBy " " (sort urls)` in `fetchurl` so that changes to URLs would cause rebuilds,
- set it to `"fetchgit " + url + " rev " + rev` in `fetchgit` so that changes to URLs and revisions would also cause rebuilds,
- etc.
`nix.conf` could then provide an option to ignore all `outputPathToken` options, for the most paranoid.
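As a Nix sketch of the `fetchurl` bullet above (entirely hypothetical: `outputPathToken` does not exist in today's Nix; it is the parameter proposed here):

```nix
let
  urls = [
    "https://example.org/foo-1.0.tar.gz"        # hypothetical mirrors
    "https://mirror.example.net/foo-1.0.tar.gz"
  ];
in derivation {
  name = "foo-1.0.tar.gz";
  system = builtins.currentSystem;
  builder = "/bin/sh";           # placeholder fetch builder
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = "0000000000000000000000000000000000000000000000000000"; # placeholder
  # Proposed parameter: derivations mapping to the same outPath may share the
  # output without a forced re-check only if their tokens match; editing the
  # url list changes the token and thus forces a re-fetch.
  outputPathToken = "fetchurl "
    + builtins.concatStringsSep " " (builtins.sort (a: b: a < b) urls);
}
```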
I have to admit I don't yet fully grasp all of your proposition, but here are a few comments after a first reading:
Well, I precisely want Nix to implement it. All other solutions would either not work or require the user to do massive downloads (if the data doesn't exist, you have no other way than regenerating it yourself) to check all FODs, which can lead to many issues. Not only would the user need to download the (transitive) sources of all packages they want to install, even if they only care about the final binary, but if the source is no longer available at its current location then the user would not even be able to install the binary (and the security team told me that they don't want to go down this path, as it used to be this way and created many issues).
If one changes nix the way I propose, I don't see how anyone could poison Hydra's cache (of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons). The new
I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated files with the name & version of the program to attack.
I think this is more or less what I proposed, just the map is the other way around, as I was proposing `drvPath -> outPath` instead, & I also propose to let cache.nixos.org/… share this map.
What do you call a rebuild here? Is it in the sense that:
> (of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons)
What reasons? If I'm not using the Hydra binary cache and I'm not using the default bootstrap tarballs what is there to trust? Tarballs downloaded from Hydra tarball cache get hashed and checked against hashes in their derivations locally, as they should.
> I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated files with the name & version of the program to attack.
Putting the whole revision or hash into the source derivation name and banning custom-named `fetch*` derivations will make that attack impossible.
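For concreteness, a call-site sketch of that naming scheme (package details hypothetical; `nameSourcesPrettily` itself is the NixOS/nixpkgs#49862 patch):

```nix
# With the version (or revision) baked into the source derivation's name, a
# PR that bumps rev but keeps the old hash maps to a *different* store path
# than the cached old source, so the stale output cannot be reused silently.
src = fetchFromGitHub {
  name = "example-1.2.3-source"; # version is part of the fixed-output name
  owner = "example";             # hypothetical package
  repo = "example";
  rev = "v1.2.3";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
};
```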
But even without the first part (and with a `<hash>-<name>-<version or short revision>-source` scheme), sure, in theory, the attacker could still poison it by pre-`fetch*`ing an evil source in an autogenerated file with the same derivation name, if the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance.
So... almost never.
And then, users like me, who do not build anything from those huge autogenerated files and are not using the Hydra binary cache, will catch those mis-hashes immediately.
I'm running with NixOS/nixpkgs#49862 and `nameSourcesPrettily = "full"` enabled since 2018 (essentially, implementation details changed over time, but the behavior did not).
Do I use a small subset of Nixpkgs? Yes. Do source URLs still fail to fetch sometimes? Yes.
But usually things are pretty easy to fix, and older source versions usually can be found on archive servers without changes to any hashes.
But now, remembering the uncountable number of times I had to fix fixed-output hashes locally to make my systems build I'm feeling kind of paranoid.
Were all those outputs mis-hashed in honest error or were some of them attempts to poison something?
Some of those mis-hashes persisted for months in `master` while I was too annoyed with the PR process, thinking them honest errors that would be fixed eventually by people with direct commit access.
Was something poisoned already?
... implementation ...
> I think this is more or less what I proposed, just the map is the other way around, as I was proposing `drvPath -> outPath` instead, & I also propose to let cache.nixos.org/… share this map.
Yes, algorithmically we are proposing the same thing, but I'm essentially proposing we reuse the existing Nix derivers mechanism for this. While you are talking about how to make Hydra publish its derivers and make Nix use them, I'm pointing out that Hydra publishing its derivers is not enough: if Hydra does not have the output in its cache yet, then the user will have to `--fallback` build locally, meaning local Nix will have to do the check too, which means checking that the source derivation is in the checked derivers set of that output, which would frequently fail, which would mean frequent mass rebuilds. So I'm proposing a way to minimize those, and `outputPathToken` is a clever and simple solution, IMHO.
Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
>> One possible way to minimize rebuilds […]
> What do you call a rebuild here? Is it in the sense that:
> - the FOD output should be re-checked by Hydra, to complete the map `drvPath -> outPath` (which indeed should be done by Hydra if the `fetch*` functions change, and I guess having a lite version of `mkDerivation` makes sense if these often change)
I.e., in the terminology I use in my proposal, you are stating that Nix should ask its substitutors for their derivers set for the `outPath` it tries to substitute and then check that source derivation is in that set. To which I agree.
> - the FOD output should be re-checked by the end user: this should NOT be the case, at least if Hydra already did it before and if we implement the sharing of `drvPath -> outPath` I was proposing above
Sure, if substitutors are enabled, then Nix should only do a "member of the set" check, not a full derivation rebuild (i.e. `nix-store -r --check`, which is what I call a "rebuild", because `nix build --rebuild` does it).
> - the whole world should be rebuilt: this should NOT be the case; once the user is assured that the hash is correct they can use the old derivation as before.
But if the user does not use Hydra, and `curl` changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.
Anyone relying on `cache.nixos.org` (which is likely 99% of users) needs to trust Hydra. Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help, since it will pick the wrong source.
I think you misunderstand the attack I have in mind. What I describe allows any adversary to inject arbitrarily malicious code (irrespective of whether the actual program has a CVE or not) into a future release of any program, assuming they can simply guess the name of the release… which is trivial (these usually simply increase one by one and are often pre-released in an alpha form like
Is it really an experience we want for nix end users?
Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
I don't see why we would get more rebuilds than now. If a program is in the cache, Hydra already downloaded its source, so the user can rely on Hydra's map. If Hydra has not built the package yet… the user anyway needs to rebuild everything. And if the package is a "legacy package", in the sense that it was built by Hydra before we even started to add this option, what I propose (maybe what you propose as well, I'm not sure I understand all the details) allows the user to only download & check the source while using the binary compiled by Hydra (which saves most of the time-consuming part).
I'm not sure I understand all the details of your proposition here (maybe because I lack some knowledge of Nix's internals), but I think I get the big picture. I think creating equivalence classes of derivations (which, I think, could be a nice concept in general in Nix) could be a nice way to limit rebuilds in case of curl updates… but your implementation with
So based on the first point, since it seems hard to only allow specific fetchers to create a given
and Nix should automatically understand that, when receiving this derivation, it should download the url, using any curl version it wants, as long as the hash matches at the end. If we don't want to hardcode all of them in Nix itself (let's try to keep Nix small), we can let Nix take as input a special nix file. More specifically, this idea of equivalence of fetchers can certainly also be used to solve other issues with non-deterministic processes. For instance, for now we have no solution to avoid hash shifts with
where the hash only corresponds to the "stable" part of the package, i.e. what is outside
Well, checking the source, I agree it's nice to have… but no need to recompile programs.
> Well, checking the source, I agree it's nice to have… but no need to recompile programs.
Most of my mentions of "rebuild" above imply the context of "rebuild the source (`<name>.src`) derivation".
Package derivations (`<name>`) should not be rebuilt, of course.
>> What reasons?
> Anyone relying on `cache.nixos.org` (which is likely 99% of users) needs to trust Hydra...
> ...
> Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help since it will pick the wrong source.
My point exactly.
Some users have personal and/or shared build machines, and `/nix/store`s on those can be poisoned too.
So the solution can not be "check derivers on Hydra and/or other configured binary caches", it has to be a general thing.
>> If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.
> I think you misunderstand the attack I have in mind. ...
I'm referencing the exploit described in #969 (comment) there.
>> But now, remembering the uncountable number of times I had to fix fixed-output hashes locally
> Is it really an experience we want for nix end users?
Well, those hashes are factually wrong, either in honest error, or in malice.
That's just the reality of the situation.
If Hydra or OfBorg verified them all, then those packages would have been fixed in `master` instead of in my local branches.
>> which would mean frequent mass rebuilds
> I don't see why we would get more rebuilds than now.
In the context I discuss there, say `curl` changes; now all `<name>.src` derivations that use `fetchurl` have to be rebuilt.
>> Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
> ... trusted fetchers ...
Hmm, that's a good point. Specifically, yes, `outputPathToken`s are probably not a good solution when the attacker can also modify the `fetch*` derivations themselves, so it would at the very least still leave OfBorg vulnerable.
```
Derive([("out", "/nix/store/foo.tar.xz", "sha256", "somehash")],
       [("kind", "fetchurl"),
        ("url", "myurl")])
```
Making full-featured `curl` into a Nix builtin is a possibility (as opposed to `fetchurlBoot` that exists now), but what about derivations fetched with `curl` and then unpacked with `zip` (like `fetchFromGitHub`)? Should `zip` be a builtin too? What about those fetched with `git`?
Even if it's just `curl` and `zip`, you'd also have to make Nix depend on SSL CAs to verify HTTPS certs, otherwise the hashes could be poisoned on your developer machine by MITMing your Nix HTTP traffic, and the ca-certs derivation is also non-trivial.
Also, at the moment Nix built with all the optional things disabled is pretty small and can be used for minimal self-bootstrap, requiring all those things will make things harder.
So: Is it possible to do? Yes. Is it a desirable solution? Probably not.
>> Was something poisoned already?
> Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
How are you doing this, exactly? Are you parsing Nix files by hand?
`nix-env -f ./default.nix -qa --argstr system "x86_64-linux" --attr-path --drv-path --out-path` gives the output that has the right data (after all, we need to check that different `--attr-path`s don't reuse or downgrade `--out-path`s), but it only evaluates package derivations, not source derivations.
If it did, running that on successive revisions of Nixpkgs, parsing the outputs, building a DB from the results, and then printing `--out-path` reuses and downgrades would be a good first step.
(Expensive to do, though. Especially since I don't have enough memory to evaluate all of Nixpkgs in a single run of `nix-env -qa` already, and evaluating a subset is kinda useless, given that any of the autogenerated package sets can be poisoned.)
I guess the simplest way to do this (cheaper than doing `nix-env -qaP` followed by `nix-instantiate -A` on all `--attr-path`s, then `nix-store -q --requisites`, followed by parsing the `.drv` files manually, which would be crazy expensive to do and will also torture your SSD) would be to implement a `nix-env -q --all` option, so that `nix-env -f ./default.nix -q --all --argstr system "x86_64-linux" --attr-path --drv-path --out-path` would print that data for _all_ `derivation`s Nix encounters while doing evaluation, not only the ones `--available` from the current Nix expression.
My ability to patch Nix sources is fairly superficial ATM and I don't really care for this much, given that NixOS/nixpkgs#49862 with a small custom-name-warning change on top basically solves the problem for me (given that I don't use Hydra).
But it would be nice if someone could implement that `nix-env -q --all` and then possibly somebody else would share all those evaluations on a bunch of revisions of `master`, so that we could check if Hydra and all users using it were PWNed yet.
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345/19
I'm not sure why my last answer was not posted. In the meantime, the CVE got published at https://nvd.nist.gov/vuln/detail/CVE-2024-36050
Good, I just wanted to be sure we are on the same page.
Definitely agree. My point was just to try to reduce the amount of downloaded material from the user's perspective by using the cache when available; otherwise users (especially those with little bandwidth) would suffer a lot, and if the source has changed they would get 404 errors even if it is still cached.
I show here a much more generic attack: https://github.com/leo-colisson/CVE_NixOs_cache_spoil/ with some variants using
Would result in:
Yeah I see now, hence my proposition to have trusted fetchers.
Well, they can always do that, for instance by implementing their own FOD derivation, without even relying on a pre-existing
That's precisely why I don't want to include this in Nix directly, but provide to nix a
This way, when a user calls
(see that it is agnostic of
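For illustration, a hedged guess at the shape such a file could take (the whole mechanism is hypothetical; `builtins.fetchurl`/`builtins.fetchGit` stand in for whatever Nix would run internally):

```nix
# Hypothetical "trusted fetchers" file: maps a fetcher kind to a function
# that Nix itself runs (with its own curl/git) to realize a fixed-output
# derivation of that kind, independently of which fetch* wrapper or nixpkgs
# revision produced the derivation.
{
  fetchurl = { url, ... }: builtins.fetchurl url;
  fetchgit = { url, rev, ... }: builtins.fetchGit { inherit url rev; };
}
```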
Right now this is done in a very dirty way, via a simple grep (install
You can see that the number of duplicated hashes is quite large (155), so I try to discard some of them automatically, for instance if I see the same url twice around the good lines, but this is very dirty, and I don't consider this any good security measure. Even with this sorting there are quite a lot of entries to read manually, so it would be better to actually extract all such
Currently the source is only re-fetched when the hash changes.
When you use `nix-env -i` to find the new hash, but forget to make a change to the hash, you are stuck with the wrong source. To add insult to injury: the source is then not re-fetched when you change the hash: you first have to garbage-collect the old source.