Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] lib.sources.predicateFilter: init #221361

Closed
wants to merge 4 commits into from

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Mar 15, 2023

Description of changes

WIP! This function may not be needed in the end because a higher-level abstraction might be better suited.

This PR adds lib.sources.predicateFilter, which is like builtins.{path,filter} or lib.sources.cleanSourceWith, except that it calls the filtering function with relative paths, specifically subpaths, as introduced in #205190. It also improves on the interface a bit by not using ordered arguments. The useful lib.path.subpath.hasPrefix function is also introduced here, relating to #210423. This is more generally related to #210426.

# test.nix
let lib = import <nixpkgs/lib>;
in lib.sources.predicateFilter {
  src = <nixpkgs>;
  predicate = { subpath, ... }:
    let
      included =
        # Include lib/systems recursively
        ## Include all parents of the target directory (but not recursively)
        lib.path.subpath.hasPrefix subpath "lib/systems"
        ## Include all children of the target directory (recursively)
        || lib.path.subpath.hasPrefix "lib/systems" subpath

        # Include a single file from nixos/modules/system/boot/loader
        ## Inclued all parents of the target directory (but not recursively)
        || lib.path.subpath.hasPrefix subpath "nixos/modules/system/boot/loader"
        ## Include the single file that we want (we should have lib.path.subpath.equals instead though)
        || subpath == "./nixos/modules/system/boot/loader/efi.nix";
    in
      if included
      then builtins.trace "Including ${subpath}" true
      else false;
}
$ nix-instantiate --eval --read-write-mode test.nix -I nixpkgs=$PWD -A outPath
trace: Including ./lib
trace: Including ./lib/systems
trace: Including ./lib/systems/architectures.nix
trace: Including ./lib/systems/default.nix
trace: Including ./lib/systems/doubles.nix
trace: Including ./lib/systems/examples.nix
trace: Including ./lib/systems/flake-systems.nix
trace: Including ./lib/systems/inspect.nix
trace: Including ./lib/systems/parse.nix
trace: Including ./lib/systems/platforms.nix
trace: Including ./nixos
trace: Including ./nixos/modules
trace: Including ./nixos/modules/system
trace: Including ./nixos/modules/system/boot
trace: Including ./nixos/modules/system/boot/loader
trace: Including ./nixos/modules/system/boot/loader/efi.nix
"/nix/store/1n07g9i2ikgdypizxaj9m56m8f18dav2-source"

Most notably, as is also the case with all the lib.path functions, this function works exactly the same way with the lazy trees PR, despite it causing breakages for other source filters.

This work is sponsored by Antithesis

Things done

@infinisil infinisil marked this pull request as draft March 15, 2023 18:34
@infinisil infinisil mentioned this pull request Mar 15, 2023
13 tasks
A filter function that works on subpaths instead of absolute paths, also
works for lazy paths
lib/sources.nix Outdated
@@ -270,6 +270,19 @@ let
outPath = builtins.path { inherit filter name; path = origSrc; };
};

predicateFilter = { name ? "source", src, predicate }:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a move in the right direction. By repeating each feature in the argument list, we're creating the square of the simpler api proposed in

I strongly suggest going with the two-argument filter and setName functions. They do not need extensibility because they have a single, well-defined responsibility. Any new responsibilities can be covered by new functions instead. Let's keep it simple.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zooming back out it's actually not clear at all that a relative path would be a better predicate input than a path value. It breaks the following:

filter (path: type: path != ./src/README.md) ./src

and it creates ambiguity about what this means

filter (path: type: path != "./src/README.md") ./src

and it highly complicates the transition from

filter f ./.

to

filter f (extend ../globals.mk ./.)

The amazing benefit of path expressions is that you don't need to think about any evaluation whatsoever to determine which file is referenced.
By using relative path strings you lose this incredibly powerful rule.

lib/sources.nix Outdated
Comment on lines 275 to 276
isFiltered = src ? _isLibCleanSourceWith;
origSrc = if isFiltered then src.origSrc else src;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates toSourceAttributes. Please reuse that.

lib/sources.nix Outdated
Comment on lines 280 to 283
filter = absolutePath: type:
assert lib.isString absolutePath;
let subpath = removeBase (lib.substring 1 (-1) absolutePath);
in predicate { inherit type subpath; };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alternate implementation of the filter function blocks composability with the other functions. Both styles of filter function can be supported across the api, by extending the normalized representation that is returned by toSourceAttributes. Instead of the current filter attribute, it could have three attributes: absFilter, relFilter and preferRelFilter.

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Mar 15, 2023
@infinisil
Copy link
Member Author

infinisil commented Mar 15, 2023

@roberth

The main consideration here is that builtins.path's filter (and everything else with the same interface like lib.cleanSourceWith) is fundamentally problematic because it's called with string values containing absolute paths:

  • The absolute path changes depending where your project lives, which can then influence a poorly written filter, e.g. name: type: hasInfix "src" name includes all files if you cloned the repository to ~/src/theRepo
    • Lazy trees "fixes" this by just stripping ~/src/theRepo away, leaving absolute paths (still string values) that are actually relative paths, like "/README.md". But this introduces the new problem of inconsistencies between filtering normal paths and lazy paths
  • Comparing string values to paths always returns false, leading to unexpected results, e.g. name: type: name == ./default.nix doesn't actually include default.nix
    • And with lazy trees, the normal workaround of using toString doesn't work anymore

The function introduced here fixes most of those problems (the last one only partially) by using subpath strings for filtering instead, establishing a cleaner foundation to base source filtering on. The main reason for this is really the lazy trees mess, which is entirely abstracted away with this.

But also note that I don't think this interface needs to be directly exposed to the user in most cases (maybe all?), instead a source combinators API (supporting path expressions) should be useful enough to (almost?) never have to write a custom filter function or subpaths.

@infinisil
Copy link
Member Author

infinisil commented Mar 15, 2023

Zooming back out it's actually not clear at all that a relative path would be a better predicate input than a path value. It breaks the following:

filter (path: type: path != ./src/README.md) ./src

This doesn't actually work btw, you'd need toString (though that also breaks in lazy trees) as mentioned in above message too :)

@roberth
Copy link
Member

roberth commented Mar 16, 2023

The main consideration here is that builtins.path's filter (and everything else with the same interface like lib.cleanSourceWith) is fundamentally problematic because it's called with string values containing absolute paths:

I'm not arguing for absolute paths strings. I'm arguing for path values.

Nonetheless, functions that work with either representation can be made compatible, as long as they operate on a shared representation that is complete enough. That's not to say that we shouldn't design primarily for path values though. Path values are the most useful representation, so if any compromises would need to be made, let's make them in favor of path values.

builtins.path

This should be replaced by a new primop, e.g. builtins.filterTree which is at least a (Path -> String -> Bool) -> Path -> Path.
Doesn't Eelco's proof of concept change path to behave like that?
In Nixpkgs, we can emulate this behavior for current and older Nix.

cleanSourceWith

This can stick around for compatibility, but I agree that it is not good.

by using subpath strings for filtering instead, establishing a cleaner foundation to base source filtering on.

"Let's use a low level interface and implement it ourselves" feels like a safe choice because you have "full control" or you can "implement it exactly the way you like", but to buy into that, you need to ignore that path values are already a very good abstraction, and path values already exist and users must interact with them regardless of the choices we make in this library.
By using a path type that doesn't match the path literals' type, your creating a worse user experience.

The main reason for this is really the lazy trees mess, which is entirely abstracted away with this.

I will keep reminding Eelco that breaking path semantics is not ok, and we can all make sure that lazy trees keeps offering the primitives we need for the filtering library.
We can't let unnecessary sandbagging creep into the library design. It creates unnecessary "requirements" that are really just extrinsic complexity. The Nix team is not going to let that broken toString thing get merged, or the other breaking changes for that matter. Also I don't think we need toString anyway, because we have baseNameOf and dirOf.
If lazy trees blocks anything, I'm 99.5% confident that it's something that must be fixed in either lazy trees or in the source filtering library.

But also note that I don't think this interface needs to be directly exposed to the user in most cases (maybe all?),

It is though. If it exists, people are going to use it, and it when we have three styles of interface, people will hate all of them, because they don't know what to expect where. Many users won't touch source filters for months on end, so don't count on anyone memorizing a map of the source filtering landscape. It should be simple and consistent.

instead a source combinators API (supporting path expressions) should be useful enough to (almost?) never have to write a custom filter function or subpaths.

That is the goal, but I wouldn't predicate the design of the library on users using it in the ideal manner, if at all possible.

filter (path: type: path != ./src/README.md) ./src

This doesn't actually work btw, you'd need toString

Works just fine when path is a path value, as the name suggests. No toString required.

@infinisil
Copy link
Member Author

infinisil commented Mar 17, 2023

But also note that I don't think this interface needs to be directly exposed to the user in most cases (maybe all?),

It is though.

Yeah, I guess I shouldn't have implied that it will be exposed. So far I am convinced that we don't actually need to expose it. However, you got me thinking..

Path values are the most useful representation, so if any compromises would need to be made, let's make them in favor of path values.

Are you sure that in this case they actually are? They do have disadvantages compared to subpaths if you want to use them for source filtering:

  1. Moving the .nix file around that does the source filtering requires updating all the absolute path expressions within it. Subpath strings wouldn't have to be updated.
  2. You practically can't use path values for filtering non-local repository sources, e.g. something from NIX_PATH, or if we wanted the same interface to filter fetched sources.
  3. It's easy to accidentally import paths into the store while you're trying to filter them out, e.g. path: type: hasSuffix ".nix" path would import path into the store!

This should be replaced by a new primop, e.g. builtins.filterTree which is at least a (Path -> String -> Bool) -> Path -> Path.
Doesn't Eelco's proof of concept change path to behave like that?

If I get what you mean, no it doesn't. All the filtering functions (also TIL there's a new one in lazy trees, builtins.filterPath!) use paths in strings, see expanded details:

$ nix run github:edolstra/nix/cdb946e9f07766d54b40e4380e7706f77eddea61 -- repl
warning: future versions of Nix will require using `--file` to load a file
Welcome to Nix 2.15.0pre20230313_cdb946e. Type :? for help.

nix-repl> filter = path: type: builtins.trace "${builtins.typeOf path}: ${toString path}" true

nix-repl> builtins.path { path = ./.; inherit filter; }
trace: string: /home/tweagysil/src/nixpkgs/.editorconfig
trace: string: /home/tweagysil/src/nixpkgs/.git
trace: string: /home/tweagysil/src/nixpkgs/.git/COMMIT_EDITMSG
[...]

nix-repl> "${builtins.filterPath { path = ./.; inherit filter; }}"
trace: string: /
trace: string: /.editorconfig
trace: string: /.git
trace: string: /.git-blame-ignore-revs
[...]

nix-repl> builtins.path { path = builtins.fetchTree { type = "git"; url = ./.; }; inherit filter; }
warning: Git tree '/home/tweagysil/src/nixpkgs' is dirty
trace: string: /.editorconfig
trace: string: /.git-blame-ignore-revs
trace: string: /.gitattributes
[...]

nix-repl> "${builtins.filterPath { path = builtins.fetchTree { type = "git"; url = ./.; }; inherit filter; }}"
warning: Git tree '/home/tweagysil/src/nixpkgs' is dirty
trace: string: /
trace: string: /.editorconfig
trace: string: /.git-blame-ignore-revs
trace: string: /.gitattributes
[...]
filter (path: type: path != ./src/README.md) ./src

This doesn't actually work btw, you'd need toString

Works just fine when path is a path value, as the name suggests. No toString required.

At least that's not what any of the current filtering functions do, none of them work if you just try to do path != ./src/README.md, because they are based on absolute paths in strings, not paths. But yes, if that filter function is a new API, then it can be made to work that way.

@infinisil
Copy link
Member Author

infinisil commented Mar 17, 2023

I am just realizing that lazy tree's builtins.filterPath interface is almost the function I'm proposing here. Comparing:

  • builtins.path on normal paths, uses the entire absolute path (but in a string)

    nix-repl> builtins.path { path = ./lib; filter = path: type: builtins.trace path true; }
    trace: /home/tweagysil/antithesis/nixpkgs/lib/ascii-table.nix
    trace: /home/tweagysil/antithesis/nixpkgs/lib/asserts.nix
    
  • builtins.path on lazy trees, uses the path relative to the lazy root:

    nix-repl> builtins.path { path = builtins.fetchTree { type = "git"; url = ./.; } + "/lib"; filter = path: type: builtins.trace path true; } 
    trace: /lib/ascii-table.nix
    trace: /lib/asserts.nix
    
  • builtins.filterPath on any argument (works the same on normal paths and lazy paths), uses the relative path to the filter root:

    nix-repl> "${builtins.filterPath { path = ./lib; filter = path: type: builtins.trace path true; }}"                                         
    trace: /
    trace: /ascii-table.nix
    trace: /asserts.nix
    
  • And this function, which does the same thing (also works on normal and lazy paths!), but actually presents it as a relative path (and doesn't allow you to filter out /, I think that's another bug in lazy trees):

    nix-repl> lib.sources.predicateFilter { src = ./lib; predicate = { subpath, ... }: builtins.trace subpath true; }
    trace: ./ascii-table.nix
    trace: ./asserts.nix
    

@infinisil
Copy link
Member Author

I implemented one path-based and one subpath-based union implementation now, just to try them out:

nix-repl> lib.sources.unionPath ./. [ ./lib ./nixos/modules ] 
{ _isLibCleanSourceWith = true; filter = «lambda @ /home/tweagysil/antithesis/nixpkgs/lib/sources.nix:280:16»; name = "source"; origSrc = /home/tweagysil/antithesis/nixpkgs; outPath = "/nix/store/mbwzfy61xix03y7pmwz0rrwjzpg5v6sq-source"; }

nix-repl> lib.sources.unionSubpath ./. [ "lib" "nixos/modules" ]
{ _isLibCleanSourceWith = true; filter = «lambda @ /home/tweagysil/antithesis/nixpkgs/lib/sources.nix:291:16»; name = "source"; origSrc = /home/tweagysil/antithesis/nixpkgs; outPath = "/nix/store/mbwzfy61xix03y7pmwz0rrwjzpg5v6sq-source"; }

Notably with such a unionPath function, disadvantage 3. from #221361 (comment) is not relevant (since you don't need to really do anything with the paths yourself, but the other two disadvantages still apply.

@infinisil
Copy link
Member Author

But I'm just realizing that a unionSubpath ./. [ "lib" "nixos/modules" ] isn't actually composable, because it doesn't take source-like's as inputs, but instead just strings. While unionPath in this implementation also doesn't compose, it can be made to work, and it would fit in well (see the extend in source combinators)

@roberth
Copy link
Member

roberth commented Mar 18, 2023

  • builtins.filterPath on any argument (works the same on normal paths and lazy paths), uses the relative path to the filter root:

All source filtering primops have been wrapped so far, and the focus of that branch is on the fetchers, not necessarily on source filtering. I wouldn't count too much an that function being a deeply thought out design.

(and doesn't allow you to filter out /, I think that's another bug in lazy trees):

Checking / makes it more lawful. The filter is consulted for every path, regardless of where the new root lies.

Suppose you have a filter f that rejects ./doc, it'd be strange for the same filter to accept ./doc if the root of the source was restricted to ./doc. Should files suddenly appear, despite using the same filter function?

let
  build = f: dir: doIt {
    doc = filter f (dir + "/doc");
    src = filter f (dir + "/src");
  };

in
{
  pkg = build (path: type: true) ./pkg;
  pkgWithoutDoc = build (path: type: path != ./doc) ./pkg;
}
  • Moving the .nix file around that does the source filtering requires updating all the absolute path expressions within it. Subpath strings wouldn't have to be updated.

Conversely, if you change the root of the source, perhaps because you need to include something from a parent directory, you have to update the subpath strings.

  • You practically can't use path values for filtering non-local repository sources, e.g. something from NIX_PATH, or if we wanted the same interface to filter fetched sources.

This is unnecessary IFD anyway. Good riddance.

3. It's easy to accidentally import paths into the store while you're trying to filter them out, e.g. path: type: hasSuffix ".nix" path would import path into the store!

hasSuffix shouldn't do that.

@infinisil
Copy link
Member Author

Checking / makes it more lawful.

Interesting, I haven't considered that. At least for manual filter writing this is more cumbersome, since e.g. just path: type: path == ./src wouldn't work anymore, because / would be filtered out. But it does seem more lawful, I'll consider this if we actually need to e.g. mirror the Nix behavior in Nixpkgs.

  • Moving the .nix file around that does the source filtering requires updating all the absolute path expressions within it. Subpath strings wouldn't have to be updated.

Conversely, if you change the root of the source, perhaps because you need to include something from a parent directory, you have to update the subpath strings.

Good point, I'd say these two use-cases are needed about equally often too.

  • You practically can't use path values for filtering non-local repository sources, e.g. something from NIX_PATH, or if we wanted the same interface to filter fetched sources.

This is unnecessary IFD anyway. Good riddance.

I wasn't thinking of IFD, but instead of a pkgs.filterSource that works based on derivations, which could be implemented with an interface like pkgs.filterSource { src = pkgs.fetchFromGitHub { ... }; include = [ "src" "Makefile" ]; }. But that's just a potential future thing, not very relevant.

  1. It's easy to accidentally import paths into the store while you're trying to filter them out, e.g. path: type: hasSuffix ".nix" path would import path into the store!

hasSuffix shouldn't do that.

At least #221204 doesn't prevent that case yet, but yes it probably shouldn't do that.

I'll think about the interface more with the points you made.

@infinisil
Copy link
Member Author

Closing as this won't be necessary, see #222981

@infinisil infinisil closed this May 19, 2023
@infinisil infinisil deleted the lib.sources.relFilter branch May 19, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants