Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Recursive Globs #1866

Closed
rgrinberg opened this issue Feb 25, 2019 · 17 comments
Closed

[RFC] Recursive Globs #1866

rgrinberg opened this issue Feb 25, 2019 · 17 comments

Comments

@rgrinberg
Copy link
Member

This feature has been mentioned in passing a few other times in the bug tracker, so I'd like to create an official issue to track it.

Problem

The glob language is quite useful when describing file sets, but unfortunately it encourages a "flat" dependency structure in your project. Sometimes, a glob has to span directories recursively.

Solution

The solution is to add **/ pattern. This pattern is interpreted as matching zero or more path components. So for example, we'd be able to depend on all coq files recursively using this pattern:

(deps (coq (glob_files theories/**/*.v))

Cc'ing @ejgallego as he is currently using find/bash to workaround the lack of this. I've found this to be useful in a few other situations as well.

@rgrinberg
Copy link
Member Author

To implement this feature efficiently, it seems like we need to get the subpaths efficiently. To do that, we'll need to change Build_system.t's dirs field to something similar to File_tree.t.

@diml does that sound right to you? I feel that might have some negative performance implications as currently it's just a hash table look up.

@rgrinberg
Copy link
Member Author

Actually, it seems like it will not be necessary after all. We'll just have to call targets_of recursively and find the necessary directories to descend into.

Some test cases that we should cover are:

depend on all .v files one level deep:

(glob_files foo/*/*.v)

depend on all .v files recursively:

(glob_files foo/**/*.v)

@ghost
Copy link

ghost commented Feb 26, 2019

It all depends on the semantic you want. glob_files matches generated files, should recursive glob descend into generated directories as well?

If not, then we can simply use the file tree to find out what sub-directories to descend into. If yes, then that's more complicated. Currently, the only generated directories are the dot-directories generated by dune, so it doesn't feel like the user wants to include these in general.

At the same time, if we want to describe the behaviour of find . -name *.cmi, then we need to include generated directories.

@rgrinberg
Copy link
Member Author

The behavior that is consistent with globs is to match generated files so I think that we should make it work the same.

Although I definitely agree that we should not include dune's internal dirs in any of these globs. Really, I think users shouldn't even be allowed to write them as dependencies. It's just that we don't have workarounds for some of the use cases.

@rgrinberg
Copy link
Member Author

@ejgallego could you weigh in on the desired behavior? I'd like to confirm if you need the recursive glob to capture targets.

@ejgallego
Copy link
Collaborator

@ejgallego could you weigh in on the desired behavior? I'd like to confirm if you need the recursive glob to capture targets.

So far we don't need to capture targets in any of our cases, but I dunno, it could well be.

In fact, our need for glob patterns is fairly limited, it only appears for lack of proper Dune support for coqdoc rules, and in some other case in the Coq codebase we could likely eliminate that using the plugins.

But that latter case is interesting, it builds a .v file that will actually Require all other .v files in a lib [as a sanity check as Requiring in Coq is effecful and can fail!]

But yeah we should really invariant in this case to whether globs include the target or not, but note indeed that if we do a setup where the generated .v file is on a bad place, globs including targets would mean a cyclic dependency [if I understand "target"] properly.

@rgrinberg
Copy link
Member Author

Thinking about this more, I think it would be quite unintuitive for users if recursive globs don't include targets while normal globs do. So if it ends up being too painful to implement recursive globs which include targets, we should just make them error out if they include a target with a message like: "this feature isn't supported yet".

file is on a bad place, globs including targets would mean a cyclic dependency [if I understand "target"] properly.

Indeed, but this problem isn't specific to globs and dune gives proper error messages when encountering dependency cycles (as it would in this case as well).

@rgrinberg
Copy link
Member Author

@diml I've tried implementing this feature and just realized that ** already means something else in our glob syntax. it already means:

  • .* when it's the entire glob
  • [^/.][^/]* when used outside the glob

With recursive globs, the meaning should be closer to ([^/]+/)*. That's roughly "zero or more more path components".

One option is to create a new syntax for recursive globs. However, I don't know how to introduce a new syntax without risking breaking of existing globs. Another option is to add a rec_glob constructor where ** is interpreted differently. Do you have another suggestion/preference perhaps?

@ghost
Copy link

ghost commented Apr 1, 2020

If we want to change the syntax of globs, it'll have to be in version 3 of the language. If we want the feature before, we should indeed add a rec_glob constructor. So one way forward is:

  1. add a rec_glob constructor now
  2. if lang dune 3, remove the rec_glob constructor and change the syntax of globs

In any case, if we are about to introduce a new glob syntax we should look around to make sure we don't invent something that is very different from what other tools do. So we should look at the syntax of things such as gitignore or hgignore.

@rgrinberg
Copy link
Member Author

rgrinberg commented Apr 1, 2020

Here's documentation regarding ** for gitignore:

Two consecutive asterisks ("**") in patterns matched against full pathname may
have special meaning:

A leading "**" followed by a slash means match in all directories. For example,
"**/foo" matches file or directory "foo" anywhere, the same as pattern "foo".
"**/foo/bar" matches file or directory "bar" anywhere that is directly under
directory "foo".

A trailing "/**" matches everything inside. For example, "abc/**" matches all
files inside directory "abc", relative to the location of the .gitignore file,
with infinite depth.

A slash followed by two consecutive asterisks then a slash matches zero or more
directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.

Other consecutive asterisks are considered regular asterisks and will match
according to the previous rules.

Here's the documentation for the globstar (the option that turns on recursive globs) option in bash:

If set, the pattern ‘**’ used in a filename expansion context will match all
files and zero or more directories and subdirectories. If the pattern is
followed by a ‘/’, only directories and subdirectories match.

So I believe both of these are consistent with how I'm choosing to interpret ** in dune.

I think the correct way to go is to introduce rec_glob as you said.

@ghost
Copy link

ghost commented Apr 7, 2020

Yeah, for now we have no choice but to introduce rec_glob. One thing I'm wondering: is it something we want to keep long term, or is it better to eventually unifiy the various glob formats and have only one? I tend to lean towards the latter and so we should choose a syntax for rec_glob that will allow that. I believe the one you suggest is fine in this regard.

@hackedy
Copy link

hackedy commented Dec 3, 2020

I ran into this issue today--I have a code generation pass with a lot of dependencies that make more sense as a recursive glob than an expanded list of globbed directories, and I had to spell them all out.

@ghost ghost added this to the 3.0 milestone Feb 1, 2021
@ghost ghost mentioned this issue Feb 1, 2021
2 tasks
@rgrinberg
Copy link
Member Author

This is partially addressed by #4176. Removing the 3.0 milestone.

@rgrinberg rgrinberg removed this from the 3.0 milestone Oct 14, 2021
@Zimmi48
Copy link
Contributor

Zimmi48 commented Oct 15, 2021

Will glob_files_rec get documentation then?

@ghost
Copy link

ghost commented Oct 18, 2021

Yes, we should document it. #5012

@rgrinberg
Copy link
Member Author

@Alizter can you give us an update here?

@Alizter
Copy link
Collaborator

Alizter commented May 21, 2022

Seems to work fine, I have been doing things like:

(rule
 (alias validate)
 (deps
  (glob_files_rec ./*.vo))
 (action
  (run coqchk -R ./theories HoTT -Q contrib HoTT.Contrib %{deps} -o)))

since dune 3.0.

@rgrinberg rgrinberg added this to the 3.0.0 milestone May 21, 2022
@Alizter Alizter moved this to Todo in Coq + Dune Nov 1, 2022
@Alizter Alizter moved this from Todo to Done in Coq + Dune Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

5 participants