Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visibility rule glob tweaks for * and ** #17588

Merged
merged 6 commits into from
Nov 21, 2022

Conversation

kaos
Copy link
Member

@kaos kaos commented Nov 18, 2022

From comment thread: #17573 (comment)

Background

The visibility rules uses the fnmatch.fnmatch() library function to match paths with rule globs. These globs are modelled to resemble shell globbing, where a single * is "match all recursively". It also supports ? and [!xyz] etc for single character patterns.

Our paths are, in shell terms, always relative i.e. they never begin with a forward slash, and always starts at the project root. To have a path be "project relative" to a BUILD file we use the "current directory" marker, the period. So ./some/path is relative to the BUILD file rather than the project root.

For the visibility rules, this "relative to the BUILD file" marker follows with the rules and are resolved when a rule is evaluated. Meaning that when a target deep down in the sub tree inherits a rule with a project relative glob it will be relative to the BUILD file where the target was declared, not the BUILD file where the rule was declared.

Issue 1

In order to anchor a project relative glob without having to resort to using the full path from the project root, we need something other than the period as marker.

Borrowing from the target address syntax, where an address that omits the path and begins directly with the target name after the opening : is treated as relative to the current BUILD file, we can do the same for rule path globs. Any rule glob beginning with a single : can be treated as relative to the BUILD file where it is declared, and thus anchored and will not change when being inherited in the sub tree.

./floating/relative/glob
:/anchored/relative/glob
src/*/from/project/root/glob

Issue 2

Commonly, one want to include all paths from a certain point and the whole sub tree below it. The first idea is to simply say some/point/*, and this will include "the whole sub tree", but it fails to include some/point itself. One attempt to remedy this could be to say, OK let's backup a step and use some/point*. Again this works.. not perfectly. It does solve the previous issue but also introduces a new one, as it would include some/pointless and some/pointing/BOOM too, not good enough.

Borrowing from the target address syntax once again, we have a syntax for "all targets from here and all its children", with the :: pattern. For the example above, this could now be written as some/point:: (or eq. some/point/:: matter of style preference only) and that is the same as using two path globs: "some/point", "some/point/*".

We prefer to not introduce ** here, as that is commonly used to designate a recursive glob pattern in other tools, and would suggest that the * here is not recursive.

:/:: - Match all from here and the whole sub tree.

Resolution

Abandoning fnmatch in favour of a custom implementation allows us to use * and ** sensibly. Lifted from comment below:

There are two distinct aspects of the rule pattern globs. Anchoring syntax and globbing syntax.

We may have 4 modes of anchoring a rule pattern glob.

  1. Anchored to project root.
  2. Anchored to declaration directory. (i.e. where it is defined)
  3. Anchored to invocation directory. (i.e. where it is applied, which may be != defined due to inheritance to sub directories)
  4. Floating, not anchored to anything.

For globbing syntax, we may start simple, to just support match path segment (*) and match all recursively (**) which I believe will be sufficient for the most common cases. Can easily be expanded upon in the future as the need arise.

  1. * equiv. to regexp: [^/]* (i.e. any path segment)
  2. ** equiv. to regexp; .* (i.e. recursive match all)
    a. I suggest that a recursive pattern when used as /** have the slash be optional, so that /foo/** also matches just /foo. equiv. regexp: /foo(/.*)?$

For the anchoring syntax, I have the following proposal, that a glob pattern is anchored based on the prefix:

  1. // - anchored to project root (same syntax as for the target address)
  2. / - anchored to the declaration directory.
  3. . - anchored to the invocation directory.
  4. `` - floating. (i.e. none of the above)

All globs have an implicit $ at the end of the pattern, so if anchored, must match the whole path when being applied.

Example globs:

# Allow the whole subtree under `src/python/project/`
//src/python/project/**

# Deny all directories two levels below the `3rdparty` tree. 
!//3rdparty/*/*

# Warn if there's a dependency on anything from a `docs/_priv` tree.
# e.g. warn on `src/project/a/docs/_priv/html/index.html` 
# regardless of where the rule was declared and where it was invoked from.
?docs/_priv/**

# Allow all sibling `pub/` files (i.e. when `pub/` is next to the targets BUILD file)
./pub/*

# Allow all files in the current subtree to depend on each other
/**

@kaos kaos added the category:internal CI, fixes for not-yet-released features, etc. label Nov 18, 2022
@benjyw
Copy link
Contributor

benjyw commented Nov 18, 2022

Can you add in the description what those symbols mean? And why we can't use */**?

@benjyw
Copy link
Contributor

benjyw commented Nov 19, 2022

Since */** already have meaning that is well understood (e.g., in gitignore, which also supports !), maybe we should just make those work? fnmatch is an implementation detail, we don't have to use it...

We can still use :/ to mean "directory the rule is defined in, of course.

@kaos
Copy link
Member Author

kaos commented Nov 19, 2022

Since */** already have meaning that is well understood (e.g., in gitignore, which also supports !), maybe we should just make those work? fnmatch is an implementation detail, we don't have to use it...

True. Will look into what there is we can use for this.

We can still use :/ to mean "directory the rule is defined in, of course.

👍🏽

@kaos
Copy link
Member Author

kaos commented Nov 19, 2022

@benjyw are you OK with us introducing a new 3rd party dep for this, rather than rolling our own?
Looking at this one in particular:
PyPi: https://pypi.org/project/pathspec/
GitHub: https://github.com/cpburnz/python-pathspec
Docs: https://python-path-specification.readthedocs.io/en/latest/index.html

@kaos
Copy link
Member Author

kaos commented Nov 19, 2022

@benjyw Looking at implementing the gitignore style path globs, I found questionable results from the above lib. Trying to understand if it was a glob feature or a bug, reading this: https://git-scm.com/docs/gitignore#_pattern_format makes me squirmish. It's too much nuances (if's and but's) for being applied as a strict framework for our visibility rules, imo. i.e. too easy to make a mistake.

If the restrictions of the fnmatch library doesn't fit, I actually propose to roll our own glob matching with some basic clearly defined rules.

I'll push what I have thus far, which supports a hybrid implementation where you can select which glob style to use (this to make it easier to compare the two side by side)

@kaos
Copy link
Member Author

kaos commented Nov 19, 2022

maybe we should just make those work?

Ah, maybe this is what you meant already with the above.. to roll our own glob function?

@kaos
Copy link
Member Author

kaos commented Nov 19, 2022

OK, sleeping on it a bit, I've arrived at this idea:

There are two distinct aspects of the rule pattern globs. Anchoring syntax and globbing syntax.

We may have 4 modes of anchoring a rule pattern glob.

  1. Anchored to project root.
  2. Anchored to declaration directory. (i.e. where it is defined)
  3. Anchored to invocation directory. (i.e. where it is applied, which may be != defined due to inheritance to sub directories)
  4. Floating, not anchored to anything.

For globbing syntax, we may start simple, to just support match path segment (*) and match all recursively (**) which I believe will be sufficient for the most common cases. Can easily be expanded upon in the future as the need arise.

  1. * equiv. to regexp: [^/]* (i.e. any path segment)
  2. ** equiv. to regexp; .* (i.e. recursive match all)
    a. I suggest that a recursive pattern when used as /** have the slash be optional, so that /foo/** also matches just /foo. equiv. regexp: /foo(/.*)?$

For the anchoring syntax, I have the following proposal, that a glob pattern is anchored based on the prefix:

  1. // - anchored to project root (same syntax as for the target address)
  2. / - anchored to the declaration directory.
  3. . - anchored to the invocation directory.
  4. `` - floating. (i.e. none of the above)

All globs have an implicit $ at the end of the pattern, so if anchored, must match the whole path when being applied.

Example globs:

# Allow the whole subtree under `src/python/project/`
//src/python/project/**

# Deny all directories two levels below the `3rdparty` tree. 
!//3rdparty/*/*

# Warn if there's a dependency on anything from a `docs/_priv` tree.
# e.g. warn on `src/project/a/docs/_priv/html/index.html` 
# regardless of where the rule was declared and where it was invoked from.
?docs/_priv/**

# Allow all sibling `pub/` files (i.e. when `pub/` is next to the targets BUILD file)
./pub/*

# Allow all files in the current subtree to depend on each other
/**

@benjyw
Copy link
Contributor

benjyw commented Nov 20, 2022

I agree on all counts, but what does a "floating" pattern mean? match anywhere in the path?

And re the glob syntax, I think that sounds exactly right, it's basically the same as gitignore syntax. And I agree that we should probably implement this ourselves (by converting to regex?) since the 3rdparty libs that re-implement gitignore seem janky af.

@kaos
Copy link
Member Author

kaos commented Nov 20, 2022

Yes, with a "floating" glob, I imagine it's not anchored to the beginning with ^ (i.e. re.search rather than re.match), and all globs have an implicit $ at the end to make sure they're "complete".

@kaos kaos changed the title [internal] Support : and :: in visibility rules. [internal] Support * and ** in visibility rules. Nov 20, 2022
@kaos kaos changed the title [internal] Support * and ** in visibility rules. Visibility rule globs * and ** adjustments Nov 20, 2022
@kaos kaos changed the title Visibility rule globs * and ** adjustments Visibility rule glob tweaks for * and ** Nov 20, 2022
@kaos kaos added category:bugfix Bug fixes for released features and removed category:internal CI, fixes for not-yet-released features, etc. labels Nov 20, 2022
@kaos kaos marked this pull request as ready for review November 20, 2022 03:27
@kaos kaos requested a review from benjyw November 21, 2022 14:53
Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proof is in the tests!

@kaos kaos merged commit 7ffd547 into pantsbuild:main Nov 21, 2022
@kaos kaos deleted the dep-rules-path-aliases branch November 21, 2022 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:bugfix Bug fixes for released features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants