Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trivial-builders: replace writeReferencesToFile with writeClosure #178717

Merged
merged 7 commits into from
Mar 19, 2024

Conversation

ShamrockLee
Copy link
Contributor

@ShamrockLee ShamrockLee commented Jun 23, 2022

Description of changes

The new builder takes a list paths instead of a single path, and print their closure to a file. It would be helpful when building a container image from a list of
packages. See #177908 (comment) .

I have prepared two versions of implementation, both of which tries to merge the result of writeReferencesToFile on each derivation:

  1. Merge into a Bash associative array with the out-path hashes as the keys, and sort the keys lexicographically with the sort command. Time complexity: $O(N \log N)$, I/O time complexity $O(N)$, where $N$ is the total number of references. (https://github.com/ShamrockLee/nixpkgs/blob/write-multiple-references-dict/pkgs/build-support/trivial-builders.nix#L671=)
  2. Merge using the merge-sorted-arrays algorithm given that the results of writeReferencesToFile are sorted. Time complexity: $O(mN)$, I/O time complexity: $O(N)$, where $m$ is the length of paths and $N$ is the total number of references. (https://github.com/ShamrockLee/nixpkgs/blob/write-multiple-references-merge/pkgs/build-support/trivial-builders.nix#L671=)

This PR adopts the first implementation, since the code is shorter and more understandable.

Update:
@SomeoneSerge proposed the third approach: Write the path list with writeText, get its list of dependencies with writeReferencesToFile and then remove the line containing the store path of the path list derivation. Just re-implement this PR following this approach.

Update:
Use the new exportReferencesGraph interface with __structuredAttrs = true; and parse .attrs.json with jq.

The test cases of writeReference* builders doesn't run (#204456), blocking the test case adding for this new builder.

Update:

The test that makes sure the equivalence between the writeMultipleReferencesToFile [ path1 path2 ... ] and nix-store --query --references outPath1 outPath2 ... is added to the NixOS test tests.trivial-builders.references.

Add two package tests:

  • tests.trivial-builders.writeMultipleReferencesToFile-single ensure the equivalence between writeMultipleReferencesToFile [ path ] and writeReferencesToFile path.
  • tests.trivial-builders.writeMultipleReferencesToFile-mixed ensures that the content of the result derivation of writeMultipleReferencesToFile [ path1 path2 ... ] is the unique collection of that of (writeReferencesToFile path1, writeReferencesToFile path2, ...).
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
    • Content of files produced by by pkgs.writeMultipleReferencesToFile [ pkgs.jupyter ] and pkgs.writeReferencesToFile pkgs.jupyter is the same.
    • Content of files produced by by pkgs.writeMultipleReferencesToFile [ pkgs.gcc pkgs.jupyter ] and pkgs.writeReferencesToFile [ pkgs.jupyter pkgs.gcc ] is the same.
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jun 23, 2022
@ShamrockLee ShamrockLee changed the title trivial-builders: add writeMultipleReferencesToFile trivial-builders: add writeMultipleReferencesToFile; tidy up the test cases Dec 3, 2022
@ShamrockLee ShamrockLee changed the title trivial-builders: add writeMultipleReferencesToFile; tidy up the test cases trivial-builders: add writeMultipleReferencesToFile; tidy up the tests Dec 3, 2022
@ofborg ofborg bot added 10.rebuild-linux: 1-10 10.rebuild-linux: 1 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Dec 3, 2022
@ofborg ofborg bot added 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux and removed 10.rebuild-linux: 1 10.rebuild-linux: 1-10 labels Dec 4, 2022
@ShamrockLee ShamrockLee changed the title trivial-builders: add writeMultipleReferencesToFile; tidy up the tests trivial-builders: add writeMultipleReferencesToFile Feb 9, 2023
@ShamrockLee
Copy link
Contributor Author

@jbedo @SomeoneSerge @GaetanLepage

This could replace singularity-tools.mkLayer and prevent unnecessary waste of space. Could you guys help take a look?

@ShamrockLee
Copy link
Contributor Author

ShamrockLee commented Feb 9, 2023

Just add the test cases anyway. Let's see how OfBorg would run them.

Update: trivial-builders.nix tests are not library tests and OfBorg won't run them.

Update: Let's prevent the already-not-working test suite from growing more complex.

@ShamrockLee
Copy link
Contributor Author

@ofborg test references

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Apr 3, 2023

Awesome. I'll just note that I think bash is on its way of becoming a burden here. I would also really like to run some simple benchmarks because I don't have a very concrete understanding of how much of an issue this mkLayer is right now, and how much this change improves it, but slow as I am I wouldn't insist that you wait for my approval on merging

@jbedo
Copy link
Contributor

jbedo commented Apr 3, 2023

It's still short and understandable, so I think a shift away from bash would have more downsides than advantages.

@ShamrockLee
Copy link
Contributor Author

ShamrockLee commented Apr 4, 2023

I would also really like to run some simple benchmarks because I don't have a very concrete understanding of how much of an issue this mkLayer is right now, and how much this change improves it

I haven't done a serious benchmark yet. The reason why it helps is:

  • mkLayer produces a package by copying all the files in the output path of packages in contents into its own output path.
  • What buildImage needs is only a list of store paths of contents and their run-time dependencies (made with writeReferencesToFile), making the files mkLayer copies useless.
  • writeMultipleReferencesToFile simply dump the list of store paths to its output file, preventing the redundant copying.

When dealing with large packages like root (CERN ROOT, a C++ data analysis framework), the use of mkLayer doubles the image build time and the disk space usage.

@jbedo
Copy link
Contributor

jbedo commented Apr 7, 2023 via email

@ShamrockLee
Copy link
Contributor Author

ShamrockLee commented Apr 8, 2023

A temporary store would be neat!

rather than a shell to prime you can just set jobs to 0

I don't understand this part.

The purpose of nix-shell -A hello --run exit is to build all the dependencies before the benchmark. (I originally use sh -c ":" since I wasn't sure if it would be okay not to begin with an executable.)

@jbedo
Copy link
Contributor

jbedo commented Apr 8, 2023 via email

@ShamrockLee ShamrockLee force-pushed the write-multiple-references branch 2 times, most recently from 2fc722f to 81cabce Compare March 18, 2024 00:48
@ShamrockLee ShamrockLee changed the title trivial-builders: replace writeReferencesToFile with writeReferenceClosureToFile trivial-builders: replace writeReferencesToFile with writeClosure Mar 18, 2024
Copy link
Contributor

@SomeoneSerge SomeoneSerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you chose the more extreme name. Personally, I like this better, but I'll wait a bit for someone else to defend "References". I suppose "References" is a more explicit reference (jaja) to the the GC and dependency detection: you "writeReferencesToFile" to retain a runtime dependency on something you've seen at build time

@roberth
Copy link
Member

roberth commented Mar 18, 2024

In Nix, we have a default relation we refer to when talking about "the" closure, which is the relation of references, so I think the name writeClosure is fine.

I suppose "References" is a more explicit reference (jaja) to the the GC and dependency detection: you "writeReferencesToFile" to retain a runtime dependency on something you've seen at build time

For this use case, taking the direct references is sufficient. The GC knows that the whole closure is reachable from that starting set.
"Direct" is implied when talking about references, but I've had to use that term when I added writeDirectReferencesToFile, to make it distinct from the somewhat badly named writeReferencesToFile.

Replace writeReferencesToFile with writeClosure.

Make writeClosure accept a list of paths instead of a path.

Re-implement with JSON-based exportReferencesGraph interface provided by
__structuredAttrs = true.

Reword the documentation.

Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
@ShamrockLee
Copy link
Contributor Author

ShamrockLee commented Mar 18, 2024

Just fixed a commit message. Sorry for the noise.

Copy link
Contributor

@SomeoneSerge SomeoneSerge Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've been waiting for the darwin check and couldn't resist posting a) Nit: might as well use nixpkgs#nixfmt-rfc-style for the new files:) But let's maybe not reset Ofborg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That RFC style guide is a long one. Since it doesn't affect evaluation,we could reformat in subsequent PRs.

Copy link
Contributor Author

@ShamrockLee ShamrockLee Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.trivial-builders.references currently evaluates to { } on non-Linux to make OfBorg build always green. We don't have to wait for it.

BTW, #273183 provides non-Linux developers with utilities to run and inspect the tests, while keeping the CI green.

@SomeoneSerge SomeoneSerge merged commit 6370996 into NixOS:master Mar 19, 2024
24 of 25 checks passed
read nrRefs
for ((i = 0; i < nrRefs; i++)); do read ref; done
done < graph
jq -r ".graph | map(.path) | sort | .[]" "$NIX_ATTRS_JSON_FILE" > "$out"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: you can now open a PR with the lib.sort feature you proposed earlier. I still don't understand the implications this would have

Copy link
Contributor Author

@ShamrockLee ShamrockLee Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#300722 it is! Please take a look.

@ShamrockLee
Copy link
Contributor Author

Does writeMultupleReferencesToFile not duplicate e.g. closureInfo?

@SomeoneSerge, sorry that I made it wrong. "${pkgs.writeClosure paths}" is a duplication of "${pkgs.closureInfo paths}".

I didn't realize that the whole closure of pkgs.linux is only itself.

@SomeoneSerge
Copy link
Contributor

"${pkgs.writeClosure paths}" is a duplication of "${pkgs.closureInfo paths}".

I thought we concluded someplace else that writeClosure had a different signature?

@ShamrockLee
Copy link
Contributor Author

ShamrockLee commented Aug 3, 2024

I thought we concluded somewhere else that writeClosure had a different signature

They have different signatures. pkgs.closureInfo is not a drop-in replacement of pkgs.writeClosure. As for their functions, one can use pkgs.closureInfo to achieve a functionality similar to that of pkgs.writeClosure.

I said they are "not even close" (#178717 (comment)) based on the content of the file "${pkgs.closureInfo { storePaths = [ pkgs.linux ]; }/store-paths". It contains only one line

nixpkgs_timeshift (9b54fb4) [$]
❯ cat /nix/store/sd8vsqzxhq9x4hyz3fvvggss0i4wkqvi-closure-info/store-paths
/nix/store/lgkpyy1b1zzh04n8b6andg8iw8vnr07m-linux-6.6.21

I thought the closure of pkgs.linux would be much bigger than that, but now I know I was wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 11.by: package-maintainer This PR was created by the maintainer of the package it changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants