[WIP] File set combinators #222981

Closed
wants to merge 27 commits into from
Changes from 26 commits
e056656
lib.lists.commonPrefix[Length]: init
infinisil May 19, 2023
6a5b2ff
Add some functions to lib.path
infinisil May 19, 2023
14e18fe
lib.fileset: init
infinisil May 19, 2023
4576c67
Integrate sources with subpaths into mkDerivation
infinisil May 19, 2023
559d3ef
lib.fileset: reference introduction docs
infinisil May 19, 2023
f9e0c13
Slight branch optimization
infinisil May 22, 2023
49baed0
Improve some error messages and add some code comments
infinisil May 22, 2023
0e939bf
Use lib.filesystem.pathType instead lib.sources.pathType
infinisil May 23, 2023
ab93250
Add lib.fileset.optional
infinisil May 31, 2023
c9b8398
Minor formatting improvements
infinisil May 31, 2023
698773b
Improve integration with lib.sources
infinisil May 31, 2023
ca9d82f
filter -> fileFilter, directoryMatches -> directoryFilter
infinisil May 31, 2023
dad5189
Fix mkDerivation integration
infinisil May 31, 2023
3941ef7
Improve importToStore error message
infinisil May 31, 2023
884ace1
importToStore -> addToStore
infinisil May 31, 2023
d3d3b8b
Make the mkDerivation src behavior more lazy
infinisil May 31, 2023
f0bd959
Add myself as a code owner
infinisil Jun 8, 2023
7257081
stdenv.mkDerivation: Implement `srcWorkDir` attribute
infinisil Jun 8, 2023
633e785
lib.fileset.getInfluenceBase: init
infinisil Jun 8, 2023
63957e2
lib.fileset.traceVal: init
infinisil Jun 8, 2023
f32627a
lib.path.commonAncestor: init
infinisil Jun 8, 2023
5c2a1c7
lib.fileset.{impureFromSource -> fromSource}
infinisil Jun 8, 2023
f27d444
Rethink store importing, lib.fileset.toSource
infinisil Jun 8, 2023
145af37
stdenv.mkDerivation: Add fileset integration with `srcFileset`
infinisil Jun 8, 2023
80e6630
Update reference documentation after recent commits
infinisil Jun 8, 2023
0379f40
lib.filesystem.pathType: Fix for <2.14, pure eval, nix store paths
infinisil Jun 13, 2023
5b0229c
Remove srcFileset stdenv.mkDerivation attribute again
infinisil Jun 15, 2023
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -29,6 +29,7 @@
/lib/debug.nix @edolstra @Profpatsch
/lib/asserts.nix @edolstra @Profpatsch
/lib/path.* @infinisil @fricklerhandwerk
/lib/fileset.nix @infinisil

# Nixpkgs Internals
/default.nix @Ericson2314
@@ -61,6 +62,7 @@
/doc/build-aux/pandoc-filters @jtojnar
/doc/contributing/ @fricklerhandwerk
/doc/contributing/contributing-to-documentation.chapter.md @jtojnar @fricklerhandwerk
/doc/functions/fileset.section.md @infinisil

# NixOS Internals
/nixos/default.nix @infinisil
1 change: 1 addition & 0 deletions doc/doc-support/default.nix
@@ -14,6 +14,7 @@ let
{ name = "options"; description = "NixOS / nixpkgs option handling"; }
{ name = "path"; description = "path functions"; }
{ name = "filesystem"; description = "filesystem functions"; }
{ name = "fileset"; description = "file set functions"; }
{ name = "sources"; description = "source filtering functions"; }
{ name = "cli"; description = "command-line serialization functions"; }
];
1 change: 1 addition & 0 deletions doc/functions.xml
@@ -11,4 +11,5 @@
<xi:include href="functions/debug.section.xml" />
<xi:include href="functions/prefer-remote-fetch.section.xml" />
<xi:include href="functions/nix-gitignore.section.xml" />
<xi:include href="functions/fileset.section.xml" />
</chapter>
245 changes: 245 additions & 0 deletions doc/functions/fileset.section.md
@@ -0,0 +1,245 @@
# File sets {#sec-fileset}

The [`lib.fileset`](#sec-functions-library-fileset) functions allow you to work with _file sets_.
File sets efficiently represent a set of local files.
They can easily be created and combined for complex behavior.
Their files can also be added to the Nix store and used as a derivation source.

Contributor (review comment, suggested change): replace the line above with "The contained files can also be added to the Nix store and used as a derivation source."

The best way to experiment with file sets is to start a `nix repl` and load the file set functions:
```
$ nix repl -f '<nixpkgs/lib>'

nix-repl> :a fileset
Added 13 variables

nix-repl>
```

The most basic way to create file sets is by passing a [path](https://nixos.org/manual/nix/stable/language/values.html#type-path) to [`coerce`](#function-library-lib.fileset.coerce). The resulting file set depends on the path:
- If the path points to a file, the result is a file set only consisting of that single file.
- If the path points to a directory, all files in that directory will be in the resulting file set.

Let's try to create a file set containing just a local `Makefile` file:
```nix
nix-repl> coerce ./Makefile
{ __noEval = «error: error: File sets are not intended to be directly inspected or evaluated. Instead prefer:
- If you want to print a file set, use the `lib.fileset.trace` or `lib.fileset.pretty` function.
- If you want to check file sets for equality, use the `lib.fileset.equals` function.»; _base = /home/user/my/project; _tree = { ... }; _type = "fileset"; }
```

As the error message shows, a file set can't be printed directly. Instead, let's use the [`trace`](#function-library-lib.fileset.trace) function as suggested:

```nix
nix-repl> trace {} (coerce ./Makefile) null
trace: /home/user/my/project
trace: - Makefile (regular)
null
```

From now on we'll use this simplified presentation of file set expressions and their resulting values:
```nix
coerce ./Makefile
```
```
/home/user/my/project
- Makefile (regular)
```

For convenience, all file set operations implicitly call [`coerce`](#function-library-lib.fileset.coerce) on arguments that are expected to be file sets, allowing us to simplify it to just:

```nix
# Implicit coerce when passing to `trace`
./Makefile
```
```
/home/user/my/project
- Makefile (regular)
```

Paths must exist; otherwise an error is thrown:
```nix
./non-existent
```
```
error: lib.fileset.trace: Expected second argument "/home/user/my/project/non-existent" to be a path that exists, but it doesn't.
```

File sets can be composed using the functions [`union`](#function-library-lib.fileset.union), [`intersect`](#function-library-lib.fileset.intersect) and [`difference`](#function-library-lib.fileset.difference). The first two also have list-based equivalents, [`unions`](#function-library-lib.fileset.unions) and [`intersects`](#function-library-lib.fileset.intersects). The most useful of these are [`unions`](#function-library-lib.fileset.unions) and [`difference`](#function-library-lib.fileset.difference):

Member (review comment):

It's surprising to me that there are two different versions of each function, and the list versions end in "s". It made me wonder where the multiple unions were coming from, because I'd think about "the union of three directories", rather than that being two different unions in a fold, which is more of an implementation detail.

Member Author (@infinisil, May 31, 2023):

I adapted this from Haskell's `Data.Set` functions, which also features `union` and `unions`.

We could also read `unions [ a b c ]` as `union a (union b c)`, which then does have multiple unions, though that's a bit of a stretch.

Do you perhaps have any suggestions for better names? `unions` is probably one of the most useful functions, so I like how it's currently fairly short.

Member:

> `unions` is probably one of the most useful functions

Agreed on this, but is `union` (no s) useful? Maybe we just need the list versions, and then we could call them `union`, `intersect`, etc? In Nix we're a lot less likely to use them as arguments to higher-order functions like fold than might be the case in Haskell, where functions that operate on exactly two arguments might be more useful.

Member Author:

I imagined that two-argument functions would still be preferred for use cases like "I have a file set, intersect that with only files in ./lib", which would be e.g.

`intersect ./lib (fileFilter (file: file.ext == "nix") ./.)`

But I'm just realizing that the list-based one isn't much worse:

`intersects [ ./lib (fileFilter (file: file.ext == "nix") ./.) ]`

Though it does require adding a `]` at the end, which isn't the case for the two-argument function.

Contributor:

I think it's good to have both ways for convenience, and the s for lists is not the worst. If we can come up with a more intuitive naming for non-Haskellers that would be great.

```nix
# The file set containing the files from all list elements
unions [
  ./Makefile
  ./src
]
```
```
/home/user/my/project
- Makefile (regular)
- src (recursive directory)
```
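
As the review discussion notes, `unions [ a b c ]` can be read as `union a (union b c)`. The sketch below puts both spellings side by side. It assumes the draft API from this pull request, a `nix repl` session in which `:a fileset` has brought `union` and `unions` into scope, and a project directory containing `Makefile`, `src` and `README.md`; the attribute names `nested` and `listBased` are made up for illustration.

```nix
# Two spellings that are expected to describe the same file set.
# `nested` and `listBased` are hypothetical names used only for this comparison.
{
  # Two-argument form, nested by hand
  nested = union ./Makefile (union ./src ./README.md);

  # List-based form, usually easier to read and extend
  listBased = unions [
    ./Makefile
    ./src
    ./README.md
  ];
}
```

The list-based form tends to be preferable once more than two file sets are involved, since adding an entry does not change the nesting.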

```nix
# All files in ./. except ./Makefile
difference
  ./.
  ./Makefile
```
```
/home/user/my/project
- README.md (regular)
- src (recursive directory)
```
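
The two functions also compose. As a minimal sketch, again assuming the draft API from this pull request and the same example project, the following removes several files at once by passing a union as the second argument of `difference`:

```nix
# All files in ./. except ./Makefile and ./README.md
# (assumes both files exist, as in the example project above)
difference
  ./.
  (unions [
    ./Makefile
    ./README.md
  ])
```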

Another important function is [`fileFilter`](#function-library-lib.fileset.fileFilter), which keeps only the files for which a predicate function returns `true`:
```nix
# Filter for C files contained in ./.
fileFilter
  (file: file.ext == "c")
  ./.
```
```
/home/user/my/project
- src
- main.c (regular)
```
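
Because `fileFilter` returns a file set, its result can be passed to the combinators above. The following sketch assumes the draft API from this pull request (including the `ext` attribute used in the example above) and the same example project; it selects the `Makefile` plus only the C files under `./src`:

```nix
# ./Makefile together with the C files in ./src
unions [
  ./Makefile
  (fileFilter (file: file.ext == "c") ./src)
]
```

Restricting the filter to `./src` rather than `./.` avoids traversing directories that cannot contribute to the result.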

File sets can be added to the Nix store using the [`toSource`](#function-library-lib.fileset.toSource) function. The value it returns is string-coercible via `outPath`, so it can be used directly as the directory for `src` or anywhere else a store path is expected.
```nix
nix-repl> toSource {
            root = ./.;
            fileset = union ./Makefile ./src;
          }
{
  # ...
  origSrc = /home/user/my/project;
  outPath = "/nix/store/4p6kpi1znyvih3qjzrzcwbh9sx1qdjpj-source";
}

$ cd /nix/store/4p6kpi1znyvih3qjzrzcwbh9sx1qdjpj-source

$ find .
.
./src
./src/main.c
./src/main.h
./Makefile
```

We can use this to declare the source of a derivation:
```nix
# default.nix
with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "my-project";
  src = lib.fileset.toSource {
    root = ./.;
    # traceVal prints the file set during evaluation and returns it unchanged
    fileset = lib.fileset.traceVal {} (lib.fileset.unions [
      ./Makefile
      ./src
    ]);
  };
  dontBuild = true;
  # Record the files visible to the build in the output
  installPhase = ''
    find . > $out
  '';
}
```

```
$ nix-build
trace: /home/user/my/project
trace: - Makefile (regular)
trace: - src (recursive directory)
/nix/store/zz7b9zndh6575kagkdy9277zi9dmhz5f-my-project

$ cat result
.
./Makefile
./src
./src/main.c
./src/main.h
```

Sometimes we also want to make files outside the current `root` accessible. We can do this by setting `root` to a directory higher up:
```nix
lib.fileset.toSource {
  # One level above the project directory, so ../utils.nix is included
  root = ../.;
  fileset = lib.fileset.unions [
    ./Makefile
    ./src
    ../utils.nix
  ];
}
```

However, we notice that the resulting file structure in the build directory changed:
```
$ nix-build && cat result
.
./utils.nix
./project
./project/src
./project/src/main.c
./project/src/main.h
./project/Makefile
```

To prevent this, we can use `srcWorkDir` to specify the local directory in which the build should start:
```nix
# default.nix
with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "my-project";
  src = lib.fileset.toSource {
    root = ../.;
    fileset = lib.fileset.unions [
      ./Makefile
      ./src
      ../utils.nix
    ];
  };
  # Make sure the build starts in ./.
  srcWorkDir = ./.;

  dontBuild = true;
  installPhase = ''
    find . > $out
    echo "Utils: $(cat ../utils.nix)" >> $out
  '';
}
```

```
$ nix-build && cat result
.
./Makefile
./src
./src/main.h
./src/main.c
Utils: # These are utils!
```

For more convenience, `stdenv.mkDerivation` also integrates file set functionality through the `srcFileset` attribute, which removes the need to set `root`:

```nix
# default.nix
with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "my-project";
  srcFileset = lib.fileset.unions [
    ./Makefile
    ./src
    ../utils.nix
  ];
  srcWorkDir = ./.;

  dontBuild = true;
  installPhase = ''
    find . > $out
    echo "Utils: $(cat ../utils.nix)" >> $out
  '';
}
```

This covers the basics of almost all available functions. For everything else, see the [full reference](#sec-functions-library-fileset).
1 change: 1 addition & 0 deletions lib/default.nix
@@ -54,6 +54,7 @@ let
# Eval-time filesystem handling
path = callLibs ./path;
filesystem = callLibs ./filesystem.nix;
fileset = callLibs ./fileset.nix;
sources = callLibs ./sources.nix;

# back-compat aliases