Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash when NAR is missing from binary cache #9804

Merged
merged 3 commits into from
Jan 19, 2024

Conversation

edolstra
Copy link
Member

Motivation

This fixes random crashes like

terminate called after throwing an instance of 'nix::SubstituteGone'
  what():  error: file 'nar/06br3254rx4gz4cvjzxlv028jrx80zg5i4jr62vjmn416dqihgr7.nar.xz' does not exist in binary cache 'http://localhost'
Aborted (core dumped)

and

terminate called recursively
Aborted (core dumped)

observed with magic-nix-cache when NARs (but not the corresponding .narinfos) had been pruned from the GHA cache.

Nix already handles missing NARs via the SubstituteGone exception. However, when using allowSubstitutes = false, it is possible to end up in a situation where Nix simultaneously has a DerivationGoal and a SubstitutionGoal for the same path. If the path is built first, Nix will then substitute it, causing LocalStore::addToStore() to discard the incoming NAR from a Finally handler because the path is already valid. This causes the SubstituteGone exception to be thrown from a destructor, which crashes Nix.

Context

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

In the "discard" case (i.e. when the store path already exists
locally), when we call parseDump() from a Finally and it throws an
exception (e.g. if the download of the NAR fails), Nix crashes:

   terminate called after throwing an instance of 'nix::SubstituteGone'
     what():  error: file 'nar/06br3254rx4gz4cvjzxlv028jrx80zg5i4jr62vjmn416dqihgr7.nar.xz' does not exist in binary cache 'http://localhost'
   Aborted (core dumped)
In rare cases (e.g. when using allowSubstitutes = false), it's
possible that we simultaneously have a DerivationGoal *and* a
SubstitutionGoal building the same path. So if a DerivationGoal
already built the path while the SubstitutionGoal was waiting for a
download slot, it saves us a superfluous download to exit early.
@edolstra edolstra added the backport 2.19-maintenance Automatically creates a PR against the branch label Jan 18, 2024
@edolstra edolstra requested a review from thufschmitt as a code owner January 18, 2024 16:25
@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Jan 18, 2024
parseDump(sink, source);
try {
parseDump(sink, source);
} catch (...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} catch (...) {
} catch (SubstituteGone &) {

I don't think we want to swallow any kind of exception, because swallowing genuine NAR parse errors might leave the source in a partially consumed state that'll cause nonsense error down the line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot make assumptions here about what kind of exception the source might throw. SubstituteGone is just one example, I/O errors are another.

NAR parse errors would already have left the source in an unknown state. The only place where that matters is the daemon connection, and we have FramedSource now to take care of that.

@edolstra
Copy link
Member Author

BTW, perhaps we should add the try { ... } catch (...) { ignoreException() } to ~Finally() since there may well be more cases where we could throw an exception from a Finally.

@edolstra edolstra merged commit 3b20cca into NixOS:master Jan 19, 2024
9 checks passed
@edolstra edolstra deleted the missing-nar-crash branch January 19, 2024 08:38
Copy link

Successfully created backport PR for 2.19-maintenance:

tebowy pushed a commit to tebowy/nix that referenced this pull request Jul 11, 2024
Fix crash when NAR is missing from binary cache

(cherry picked from commit 3b20cca)
Change-Id: I50ff18f4a6de69c323473b4a8e3e098d1f365145
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.19-maintenance Automatically creates a PR against the branch store Issues and pull requests concerning the Nix store
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants