[WIP] File set combinators #222981
I already got some quick preliminary (generally positive) feedback from @roberth in PMs and meetings.
To move this forward, I decided to start writing documentation, even though it's not fully implemented yet, to get a sense of what the interface should look like. I'm creating comments to highlight some key parts of that interface, but it would be great to get some rough feedback on the other parts of the interface documentation too.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-47/27387/1
I've now done a major rewrite of this and am super hyped; this is turning into something really cool. It's weekend time now though, so I'll continue next week. See the PR description for how I plan to proceed.
doc/functions/fileset.section.md
}
```

To see everything you can do with file sets, check out the [reference documentation](#sec-functions-library-fileset).
- To see everything you can do with file sets, check out the [reference documentation](#sec-functions-library-fileset).
+ To see everything you can do with file sets, see the [library documentation](#sec-functions-library-fileset).
We're already in the reference documentation; this is just an introductory section.
I wrote this in a tutorial style, with the expectation that it would be moved to (or just rendered at) nix.dev at some point (that's where the doc team is moving tutorial docs).
While all text can be learned from, this is not a "learning experience", certainly not according to Diátaxis. And I think that's fine. File sets aren't just functions; the type needs reference documentation too, and it's a good place for a basic introduction to the concept.

> In general, tutorials are the weakest part of documentation, the most misunderstood and the most difficult to do well. Most software projects have poor - or non-existent - tutorials.
Well then, I clearly don't know how to write a tutorial 😅. Please take another look at the document now though; I wouldn't really count it as part of the reference.
Branch updated from b1971f8 to 14d5fd1.
error: lib.fileset.trace: Expected second argument "/home/user/my/project/non-existent" to be a path that exists, but it doesn't.
```

File sets can be composed using the functions [`union`](#function-library-lib.fileset.union) (and the list-based equivalent [`unions`](#function-library-lib.fileset.unions)), [`intersect`](#function-library-lib.fileset.intersect) (and the list-based equivalent [`intersects`](#function-library-lib.fileset.intersects)) and [`difference`](#function-library-lib.fileset.difference), the most useful of which are [`unions`](#function-library-lib.fileset.unions) and [`difference`](#function-library-lib.fileset.difference):
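The set semantics of these combinators can be illustrated with a toy model in which a file set is just a sorted, duplicate-free list of relative path strings. This is a sketch under that assumption only: it is not how `lib.fileset` represents file sets (the PR stores a base path and a lazy tree), the functions below are plain local `let` bindings rather than the library's, and `toySet` and `filterSet` are hypothetical helpers.

```nix
let
  # Toy model (assumption): a file set is a sorted list of unique path strings.
  dedup = list:
    builtins.foldl' (acc: x: if builtins.elem x acc then acc else acc ++ [ x ]) [ ] list;
  toySet = list: builtins.sort builtins.lessThan (dedup list);

  # Binary set operations on toy file sets.
  union = a: b: toySet (a ++ b);
  intersect = a: b: toySet (builtins.filter (x: builtins.elem x b) a);
  difference = a: b: toySet (builtins.filter (x: !builtins.elem x b) a);

  # List-based variant: the union of any number of toy file sets.
  unions = sets: toySet (builtins.concatLists sets);

  # Hypothetical filter: keep only the paths satisfying the predicate.
  filterSet = pred: set: builtins.filter pred set;

  src = toySet [ "lib/a.nix" "lib/b.nix" "README.md" ];
  tests = toySet [ "tests/a.nix" ];
in
{
  all = unions [ src tests ];
  viaBinary = union src tests;
  noReadme = difference src [ "README.md" ];
  libOnly = intersect src [ "lib/a.nix" "lib/b.nix" "other.nix" ];
  nixFiles = filterSet (p: builtins.match ".*\\.nix" p != null) src;
}
```

In this model, `unions [ src tests ]` and `union src tests` produce the same set; the list-based form simply accepts any number of inputs.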
It's surprising to me that there are two different versions of each function, and the list versions end in "s". It made me wonder where the multiple unions were coming from, because I'd think about "the union of three directories", rather than that being two different unions in a fold, which is more of an implementation detail.
I adapted this from Haskell's `Data.Set` functions, which also feature `union` and `unions`.
We could also read `unions [ a b c ]` as `union a (union b c)`, which then does have multiple `union`s, though that's a bit of a stretch.
Do you perhaps have any suggestions for better names? `unions` is probably one of the most useful functions, so I like that it's currently fairly short.
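The reading `unions [ a b c ]` = `union a (union b c)` can be made concrete by defining the list-based function as a right fold of the binary one, starting from the empty set. This is a sketch on a hypothetical toy model (a file set as a sorted list of unique path strings), not the PR's implementation; `foldr` is written out by hand because `builtins` only provides `foldl'` (nixpkgs has `lib.foldr`).

```nix
let
  # Toy model (assumption): a file set is a sorted list of unique path strings.
  dedup = list:
    builtins.foldl' (acc: x: if builtins.elem x acc then acc else acc ++ [ x ]) [ ] list;
  normalise = list: builtins.sort builtins.lessThan (dedup list);

  # Binary union of two toy file sets.
  union = a: b: normalise (a ++ b);

  # Right fold: foldr op nul [ x1 x2 x3 ] == op x1 (op x2 (op x3 nul)).
  foldr = op: nul: list:
    let
      len = builtins.length list;
      go = n: if n == len then nul else op (builtins.elemAt list n) (go (n + 1));
    in
    go 0;

  # List-based union: unions [ a b c ] == union a (union b (union c [ ])).
  unions = foldr union [ ];

  a = [ "a.nix" ];
  b = [ "b.nix" ];
  c = [ "a.nix" "c.nix" ];
in
{
  viaFold = unions [ a b c ];
  nested = union a (union b c);
}
```

Since union is associative and has the empty set as identity, the fold direction doesn't change the result; it only determines how the nested calls are shaped.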
> `unions` is probably one of the most useful functions

Agreed on this, but is `union` (no s) useful? Maybe we just need the list versions, and then we could call them `union`, `intersect`, etc.? In Nix we're a lot less likely to use them as arguments to higher-order functions like `fold` than might be the case in Haskell, where functions that operate on exactly two arguments might be more useful.
I imagined that two-argument functions would still be preferred for use cases like "I have a file set, intersect that with only files in `./lib`", which would be e.g.

```nix
intersect ./lib (fileFilter (file: file.ext == "nix") ./.)
```

But I'm just realizing that the list-based one isn't much worse:

```nix
intersects [ ./lib (fileFilter (file: file.ext == "nix") ./.) ]
```

Though it does require adding a `]` at the end, which isn't the case for the two-argument function.
I think it's good to have both ways for convenience, and the `s` for lists is not the worst. If we can come up with more intuitive names for non-Haskellers, that would be great.
Nix / It's just a thought; maybe this goes nowhere, but if we could do better at trees, perhaps we could have better file sets, or at least a better file set implementation internally.
```nix
_base = base;
_tree = tree;
```
Just had a call with @roberth where we concluded that this would be even better to make sure people know these are internal:
```diff
-_base = base;
-_tree = tree;
+_internalBase = base;
+_internalTree = tree;
```
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/zurich-23-05-zhf-hackathon-and-workshop-report/29093/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117/1
These 4 separate path-library-related PRs are needed before a basic version of file set combinators can be merged. They are all up to date with all feedback addressed, please review:
I opened a first draft PR for just
```nix
# Coerce and normalise the bases of multiple file set values passed to user-facing functions
# Type: String -> [ { context :: String, value :: Any } ] -> { commonBase :: Path, trees :: [ <tree> ] }
_normaliseBase = function: list:
```
For my own future reference, once I get to implementing the combinators (in particular `intersect`):

This function here should only be used for `union`, not for `difference` or `intersect`. In particular:

- For `union`, the resulting base path should be the common prefix of all the input base paths, e.g. `union /foo /foo/bar` should give `/foo` as the base. This is what this function does.
- For `intersect`, the resulting base path should be the path with the most components among the input bases, e.g. `intersect /foo /foo/bar` should give `/foo/bar` as the base. If the paths don't overlap, an empty set is returned.
  - TODO: What base should an empty result have? E.g. for `intersect /foo /bar`, should that give an error, pick `/foo` or `/bar` arbitrarily, or use `/`? Maybe the notion of a special empty file set value should be introduced.
- For `difference`, the resulting base is the same as the base of the first argument, e.g. `difference /foo /` has base `/foo`.
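The three base-path rules above can be sketched with paths modelled as lists of components. This is a hypothetical illustration, not the PR's `_normaliseBase`: the helper names (`components`, `commonPrefix`, `unionBase`, `intersectBase`, `differenceBase`) are made up, paths are plain absolute strings rather than Nix path values, "overlap" is assumed to mean that all bases lie on one chain of ancestors, and returning `null` for disjoint bases is just one placeholder answer to the open TODO.

```nix
let
  # Split an absolute path string into components: "/foo/bar" -> [ "foo" "bar" ].
  components = path:
    builtins.filter (s: builtins.isString s && s != "") (builtins.split "/" path);

  # Join a component list back into an absolute path string ([ ] -> "/").
  joinComponents = comps: "/" + builtins.concatStringsSep "/" comps;

  # Longest common prefix of two component lists.
  commonPrefix = a: b:
    if a == [ ] || b == [ ] || builtins.head a != builtins.head b
    then [ ]
    else [ (builtins.head a) ] ++ commonPrefix (builtins.tail a) (builtins.tail b);

  # True if the component list `prefix` is a prefix of `comps`.
  isPrefixOf = prefix: comps: commonPrefix prefix comps == prefix;

  # union: the common prefix of all input bases.
  unionBase = bases:
    joinComponents (builtins.foldl' commonPrefix
      (components (builtins.head bases))
      (map components (builtins.tail bases)));

  # intersect: the input base with the most components, if every other base
  # is an ancestor of it; otherwise null (assumed empty result).
  intersectBase = bases:
    let
      comps = map components bases;
      deepest = builtins.foldl'
        (acc: c: if builtins.length c > builtins.length acc then c else acc)
        (builtins.head comps) (builtins.tail comps);
    in
    if builtins.all (c: isPrefixOf c deepest) comps then joinComponents deepest else null;

  # difference: the base of the first argument.
  differenceBase = first: _second: first;
in
{
  union = unionBase [ "/foo" "/foo/bar" ];
  intersect = intersectBase [ "/foo" "/foo/bar" ];
  disjoint = intersectBase [ "/foo" "/bar" ];
  difference = differenceBase "/foo" "/";
}
```

A dedicated empty file set value, as suggested in the TODO, would let `intersectBase` avoid the `null` special case entirely.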
Not handling the base paths this way yet leads to this error, even though it should work just fine:

```
nix-repl> fileset.toSource { root = ./lib; fileset = fileset.intersect ./lib ./.; }
error: lib.fileset.toSource: Expected attribute `fileset` to not be influenceable by any paths outside `root`, but `lib.fileset.getInfluenceBase fileset` "/home/tweagysil/antithesis/nixpkgs" is outside `root`.
```
Is there a lightweight way to start using these right now? Or would I need to pull an entire copy of nixpkgs from this branch?
You could do a shallow clone (`--depth 1`) of this branch. That shouldn't be too big.
Thanks! So the changes from the other PR are included here?
@adrian-gierakowski @mohe2015 I think a shallow clone would be about the same as a GitHub tarball download; I don't think there's a better way for now. In stable Nix it would look like this:

```nix
fetchTarball "https://github.com/tweag/nixpkgs/tarball/file-sets"
# Or replace `file-sets` with a commit to pin it
```

With niv it would be

And with experimental Flakes:

```nix
{
  inputs.nixpgksFilesets.url = "github:tweag/nixpkgs/file-sets";
}
```

And if you give it a try, I'd be happy to hear any feedback you have, good or bad :D
You could fetch just the `lib` subtree:

```shell
[nixpkgs]$ gh pr checkout 222981
[nixpkgs]$ git rev-parse HEAD:lib
1bdcd7fc8a6a40b2e805bad759b36e64e911036b
[nixpkgs]$ curl -L https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz | tar -tz | grep fileset
nixpkgs-1bdcd7fc8a6a40b2e805bad759b36e64e911036b/fileset.nix
```

```nix
let
  inherit (import (builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/1bdcd7fc8a6a40b2e805bad759b36e64e911036b.tar.gz"))
    fileset;
in
...
```
Awesome, thanks @roberth!

The predicate is called with an attribute set containing these attributes:

- `name`: The filename
Not 100% clear. Is this like `baseNameOf`?
Ah yup it is!
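Since `name` matches `baseNameOf`, a predicate on `name` behaves like one on `builtins.baseNameOf` of the file's path. In this small sketch, the attribute set passed to the predicate is built by hand through a hypothetical `fileInfo` helper that only sets `name`; in the library, that attribute set would be constructed by the file set functions themselves.

```nix
let
  # Hypothetical: build the attribute set a predicate would receive for a file.
  fileInfo = path: { name = builtins.baseNameOf path; };

  # Example predicate: select only files called default.nix.
  isDefaultNix = file: file.name == "default.nix";

  paths = [ "/project/default.nix" "/project/lib/default.nix" "/project/README.md" ];
in
{
  names = map (p: (fileInfo p).name) paths;
  selected = builtins.filter (p: isDefaultNix (fileInfo p)) paths;
}
```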
Btw, note that this is the draft PR. The one ready for review is #245623.
Which doesn't compose with the existing source library except through explicit conversions, and the conversion isn't isomorphic. It's not really the composability that improves (comparatively). Rather the meaning of the operations is more set-like, to a large degree. Instead of two somewhat unintuitive attributes, we now have just one - and I'll note that in both designs we do get good behavior almost all the time.
I'll close this, because this is now mostly implemented! See #266356 for further updates and details :)
This PR rethinks @roberth's original source combinators into a more composable datatype: Sets of files. Nothing is decided yet, but I'm really amazed by how well this is working.
The abstraction here allows efficient and lazy operations on a set of files. Typical set operations are supported, including union, intersection, difference and filtering.
This work is sponsored by Antithesis ✨
Update: This has now been mostly incrementally merged! See #266356 for more info.
Plan
Iterate the above until everybody is happy with the design. Despite reaching out to the community, not much feedback was given. This does mean that there are no complaints about it either, though. Because of this, I'll move towards implementing it; see [WIP] File set combinators #222981 (comment).