Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-73435: Add pathlib.PurePath.full_match() #114350

Merged
merged 9 commits into from
Jan 26, 2024
Merged

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jan 20, 2024

In #101398 we added support for the recursive wildcard ** in pathlib.PurePath.match(). This should allow arbitrary prefix and suffix matching, like p.match('foo/**') or p.match('**/foo'), but there's a problem: for relative patterns only, match() implicitly inserts a ** token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: PurePath('foo/bar').match('bar/**') is true!

This PR reverts the changes to match(), and instead adds a new full_match() method that:

  • Allows empty patterns
  • Supports the recursive wildcard **
  • Matches the entire path when given a relative pattern


📚 Documentation preview 📚: https://cpython-previews--114350.org.readthedocs.build/en/114350/library/pathlib.html#pathlib.PurePath.globmatch

In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`globmatch()` method that:

- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern

As a result, `globmatch()`'s pattern language exactly matches that of
`glob()`.
Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Lib/pathlib/_abc.py Outdated Show resolved Hide resolved
@zooba
Copy link
Member

zooba commented Jan 22, 2024

Does the problem extend beyond "** is allowed to match nothing"? Or can we change the existing algorithm to require that a trailing ** has to match at least one segment?

Presumably .match("foo\\*") has to find something after foo\, so would be a better version of "foo\\**" anyway, but I can't imagine anyone being happy that "foo\\**" matches "foo\\" - you wouldn't specify the ** if you didn't want something to be found there.

@barneygale
Copy link
Contributor Author

** matching nothing is quite normal I think - it's what glob.glob(), pathlib.Path.glob(), zsh, and IIRC bash do. If you want at least one segment after foo/, you can spell that with foo/**/*.

@barneygale
Copy link
Contributor Author

e.g. zsh man page:

A pathname component of the form '(foo/)#' matches a path consisting of zero or more directories matching the pattern foo.

As a shorthand, '**/' is equivalent to '(*/)#'; note that this therefore matches files in the current directory as well as subdirectories. Thus:

ls (*/)#bar

or

ls **/bar

does a recursive directory search for files named 'bar' (potentially including the file 'bar' in the current directory).

@zooba
Copy link
Member

zooba commented Jan 22, 2024

Okay, I definitely like having full in the name somewhere then. It's still a match, and the parallel is fullmatch is way more helpful than to glob (which would just raise questions about "how does match do its matching then?")

@barneygale barneygale changed the title GH-73435: Add pathlib.PurePath.globmatch() GH-73435: Add pathlib.PurePath.full_match() Jan 22, 2024
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Lib/pathlib/_abc.py Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

Thanks for the reviews, both.

@barneygale barneygale enabled auto-merge (squash) January 26, 2024 00:51
@barneygale barneygale merged commit b69548a into python:main Jan 26, 2024
29 checks passed
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news topic-pathlib type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants