Skip to content
This repository has been archived by the owner on Jan 30, 2023. It is now read-only.

Concerns over readDir speed #17

Closed
infinisil opened this issue Nov 16, 2022 · 6 comments
Closed

Concerns over readDir speed #17

infinisil opened this issue Nov 16, 2022 · 6 comments
Assignees

Comments

@infinisil
Copy link
Member

infinisil commented Nov 16, 2022

readDir does a stat on all the directories entries to figure out their type. We don't need that information though. There are some ways of making this faster:

  • Making readDir lazily stat the entries
  • Introducing a builtins.listDirectory which doesn't return the file types
  • Suggested on Matrix, oooon Linux there's now getdents which allows reading multiple directory entries at once, not POSIX compatible though, would be opportunistic when available

This ties into the sharding discussion in #1 , because with sharding, more directories need to be listed.

@aakropotkin
Copy link

Just as a reminder the workaround I used was auto-updating a complete manifest of files before each commit. This is a pain in the ass and probably isn't practical in Nixpkgs but I thought I'd mention it.

@roberth
Copy link
Contributor

roberth commented Nov 16, 2022

nix issue: NixOS/nix#7314

oooon Linux there's now

Not a recent feature; can be relied on for performance, but technically should have a graceful degradation, which we need anyway.

@infinisil
Copy link
Member Author

The above idea is now being implemented in NixOS/nix#7447. We discussed that PR in today's NAT meeting

@zimbatm
Copy link
Contributor

zimbatm commented Dec 24, 2022

If nix is going to be changed, I don't know if you considered introducing a lazy attrset type, so that the full list of packages doesn't need to be generated. The idea would be to have a constructor that provides an accessor function, and that function is called on first access.

EDIT: Actually, this might not work because with pkgs requires the full list of packages to be added to the scope.

@aakropotkin
Copy link

aakropotkin commented Jan 2, 2023

Benchmarking Results:

A playground repository was created to test readDir performance with and without patches: https://github.com/aakropotkin/shard-test

Upfront I want to highlight that this playground largely helps us prove "there is no loss of performance" as opposed to whether or not there is an improvement. This is because Linux and Darwin both implement readdents and use this routine for readDir in both the patched and unpatched forms of Nix. In order to test for performance improvements a system with a filesystem implementation which cannot execute readdents, or an operating system which lacks a readdents system call would be required. If readdents is available on a system the branching logic that causes readFileType to be used is only run for oddball filesystem nodes such as those found under /dev/ - these types of entries were tested for correctness; but because there are relatively few of them on any given system it is not worthwhile to use them as a basis for benchmarking.

readDir was tested on large numbers of entries in the playground with up to 1,000,000 directories and we found no loss of performance as a result of extending Nix with the new readFileType builtin.

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-01-03-nixpkgs-architecture-team-meeting-23/24404/1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants