GH-73435: Implement recursive wildcards in pathlib.PurePath.match() #101398

Merged: 40 commits, May 30, 2023
Changes from 22 commits
Commits
40 commits
608e917
gh-73435: Implement recursive wildcards in pathlib.PurePath.match()
barneygale Jan 28, 2023
9a43c7f
Simplify code slightly
barneygale Jan 29, 2023
a846279
Fix support for newlines
barneygale Feb 15, 2023
bbd8cd6
Cache translation of individual components
barneygale Feb 15, 2023
b5c002e
Drop 'recursive' argument, make this the only behaviour.
barneygale Feb 15, 2023
0afcd54
Undo modifications to fnmatch.py
barneygale Feb 16, 2023
fe32717
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Feb 17, 2023
7b6f850
Fix Windows support
barneygale Feb 17, 2023
037488a
Tidy up code.
barneygale Feb 17, 2023
0741950
Add news blurb.
barneygale Feb 17, 2023
e1c9731
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Feb 20, 2023
db6f0ad
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 3, 2023
8dff9e2
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 9, 2023
314679f
Simplify patch; prepare for use in `glob()`
barneygale Apr 9, 2023
90eebcc
Make better use of path object caching.
barneygale Apr 9, 2023
4b5fffd
Add performance tip to docs
barneygale Apr 9, 2023
5e8bc28
Skip re-initialisation of PurePath patterns.
barneygale Apr 20, 2023
e81ab5a
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 29, 2023
afb8047
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 2, 2023
722a1ab
Use `re.IGNORECASE` rather than `os.path.normcase()`
barneygale May 2, 2023
0ccf3df
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 6, 2023
ccea5e1
Add whats new entry
barneygale May 11, 2023
dd04294
Update Doc/whatsnew/3.12.rst
barneygale May 11, 2023
b258641
Apply suggestions from code review
barneygale May 14, 2023
ced8998
Explain _FNMATCH_SLICE
barneygale May 14, 2023
a33c7b6
Accidentally a word.
barneygale May 14, 2023
4b3bddb
Cache pattern compilation
barneygale May 14, 2023
6ad30dd
Remove unneeded `from None` suffix, whoops.
barneygale May 14, 2023
052890f
Tiny performance improvement: avoid accessing path.parts
barneygale May 14, 2023
d789b6d
Typo fix
barneygale May 14, 2023
4fe77c6
Avoid hashing path object when compiling pattern.
barneygale May 14, 2023
4770c13
More performance tweaks
barneygale May 14, 2023
559787d
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 18, 2023
9c09fc4
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 23, 2023
eb35dbc
Re-target to 3.13.
barneygale May 23, 2023
8959dfd
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 27, 2023
fec7702
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 29, 2023
89bc380
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 29, 2023
9211297
Add more comments!
barneygale May 30, 2023
73bb309
Update Lib/pathlib.py
barneygale May 30, 2023
11 changes: 11 additions & 0 deletions Doc/library/pathlib.rst
@@ -568,13 +568,24 @@ Pure paths provide the following methods and properties:
      >>> PurePath('a/b.py').match('/*.py')
      False

   The *pattern* may be another path object; this speeds up matching the same
   pattern against multiple files::

      >>> pattern = PurePath('*.py')
      >>> PurePath('a/b.py').match(pattern)
      True

   As with other methods, case-sensitivity follows platform defaults::

      >>> PurePosixPath('b.py').match('*.PY')
      False
      >>> PureWindowsPath('b.py').match('*.PY')
      True

   .. versionchanged:: 3.12
      Support for the recursive wildcard "``**``" was added. In previous
      versions, it acted like the non-recursive wildcard "``*``".
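The `**` semantics described above can be illustrated with a small backtracking matcher. This is a minimal sketch for explanation only, not the patch's implementation (which compiles patterns to regular expressions); it assumes POSIX-style `/` separators and case-sensitive matching, and `match_parts` and `recursive_match` are hypothetical names. As in the patch's tests, a relative pattern may match any trailing run of components, an anchored pattern must match the whole path, and `**` consumes zero or more whole components.

```python
import fnmatch
from pathlib import PurePosixPath


def match_parts(parts, pattern_parts):
    """Return True if `pattern_parts` consumes exactly the components in `parts`."""
    if not pattern_parts:
        return not parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == '**':
        # '**' consumes zero or more whole components: try every split point.
        return any(match_parts(parts[i:], rest) for i in range(len(parts) + 1))
    # Any other pattern component is an ordinary fnmatch pattern for one component.
    return (bool(parts)
            and fnmatch.fnmatchcase(parts[0], head)
            and match_parts(parts[1:], rest))


def recursive_match(path, pattern):
    """Hypothetical POSIX-only, case-sensitive matcher with a recursive '**'."""
    path, pattern = PurePosixPath(path), PurePosixPath(pattern)
    if not pattern.parts:
        raise ValueError("empty pattern")
    if any('**' in part and part != '**' for part in pattern.parts):
        raise ValueError("'**' can only be an entire path component")
    if pattern.root:
        # Anchored pattern: must match the whole path, root component included.
        return match_parts(path.parts, pattern.parts)
    # Relative pattern: may match any trailing run of components.
    parts = path.parts
    return any(match_parts(parts[i:], pattern.parts) for i in range(len(parts) + 1))


print(recursive_match('/a/b/c.py', '/**/*.py'))  # True: '**' spans 'a' and 'b'
print(recursive_match('c.py', 'c/**'))           # False: no 'c' directory
```

This recursive form re-examines the pattern on every call; the patch instead compiles each pattern once into a regular expression and caches it, which is what makes reusing a pattern object cheaper.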


.. method:: PurePath.relative_to(other, walk_up=False)

3 changes: 3 additions & 0 deletions Doc/whatsnew/3.12.rst
@@ -365,6 +365,9 @@ pathlib
* Add :meth:`pathlib.Path.is_junction` as a proxy to :func:`os.path.isjunction`.
  (Contributed by Charles Machalow in :gh:`99547`.)

* Add support for recursive wildcards in :meth:`pathlib.PurePath.match`.
  (Contributed by Barney Gale in :gh:`101398`.)


dis
---
70 changes: 56 additions & 14 deletions Lib/pathlib.py
@@ -61,6 +61,15 @@ def _is_case_sensitive(flavour):
# Globbing helpers
#


_FNMATCH_PREFIX, _FNMATCH_SUFFIX = fnmatch.translate('_').split('_')
_FNMATCH_SLICE = slice(len(_FNMATCH_PREFIX), -len(_FNMATCH_SUFFIX))
_SWAP_SEP_AND_NEWLINE = {
    '/': str.maketrans({'/': '\n', '\n': '/'}),
    '\\': str.maketrans({'\\': '\n', '\n': '\\'}),
}
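The two helpers above depend on the shape of `fnmatch.translate()` output, which in current CPython releases wraps the translated pattern in an inline `(?s:...)` group followed by `\Z`. Translating the single character `'_'` and splitting on it recovers that wrapper, so `_FNMATCH_SLICE` can strip it from any translation. The sketch below repeats the trick with hypothetical names (`COMPONENT_SLICE`, `component_regex`, `SWAP`) and checks the consequence the patch relies on: once the `(?s:` DOTALL flag is gone, `.` and therefore `*` no longer match a newline, so after separators are swapped for newlines a single `*` cannot cross a path separator.

```python
import fnmatch
import re

# Split off the wrapper that fnmatch.translate() puts around every pattern.
PREFIX, SUFFIX = fnmatch.translate('_').split('_')
COMPONENT_SLICE = slice(len(PREFIX), -len(SUFFIX))

# Swap '/' and '\n' in both directions: separators become line breaks, and a
# literal newline inside a file name becomes an ordinary '/' character.
SWAP = str.maketrans({'/': '\n', '\n': '/'})


def component_regex(pattern):
    """Translate one path component without the DOTALL wrapper or end anchor."""
    return fnmatch.translate(pattern)[COMPONENT_SLICE]


body = component_regex('*.py')
assert re.fullmatch(body, 'b.py') is not None   # a single component matches
assert re.fullmatch(body, 'a\nb.py') is None    # '*' stops at a swapped separator
assert 'a/b\nc.py'.translate(SWAP) == 'a\nb/c.py'
```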


@functools.lru_cache()
def _make_selector(pattern_parts, flavour, case_sensitive):
    pat = pattern_parts[0]
@@ -271,6 +280,13 @@ class PurePath(object):
        # to implement comparison methods like `__lt__()`.
        '_parts_normcase_cached',

        # The `_lines_cached` and `_matcher_cached` slots store the
        # string path with path separators and newlines swapped, and an
        # `re.Pattern` object derived thereof. These are used to implement
        # `match()`.
        '_lines_cached',
        '_matcher_cached',

        # The `_hash` slot stores the hash of the case-normalized string
        # path. It's set when `__hash__()` is called for the first time.
        '_hash',
@@ -430,6 +446,41 @@ def _parts_normcase(self):
        self._parts_normcase_cached = self._str_normcase.split(self._flavour.sep)
        return self._parts_normcase_cached

    @property
    def _lines(self):
        # Path with separators and newlines swapped, for pattern matching.
        try:
            return self._lines_cached
        except AttributeError:
            trans = _SWAP_SEP_AND_NEWLINE[self._flavour.sep]
            self._lines_cached = str(self).translate(trans)
            return self._lines_cached
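To see what `_lines` produces, the following sketch applies the same swap table by hand to `str()` of a `PurePosixPath` (POSIX separators assumed; `SWAP` is a hypothetical stand-in for `_SWAP_SEP_AND_NEWLINE['/']`). An absolute path gains a leading newline where its root was, and `splitlines(keepends=True)` then yields one entry per component, which is how `_matcher` walks a pattern.

```python
from pathlib import PurePosixPath

# Stand-in for _SWAP_SEP_AND_NEWLINE['/'] from the patch.
SWAP = str.maketrans({'/': '\n', '\n': '/'})

absolute = str(PurePosixPath('/a/b/c.py')).translate(SWAP)
relative = str(PurePosixPath('a/b/c.py')).translate(SWAP)
odd = str(PurePosixPath('odd\nname.py')).translate(SWAP)

# The root becomes a leading newline; each remaining separator ends a "line".
print(repr(absolute))                        # '\na\nb\nc.py'
print(absolute.splitlines(keepends=True))    # ['\n', 'a\n', 'b\n', 'c.py']
print(relative.splitlines(keepends=True))    # ['a\n', 'b\n', 'c.py']
# A newline inside a name becomes '/', so the name stays on one line.
print(odd.splitlines())                      # ['odd/name.py']
```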

    @property
    def _matcher(self):
        try:
            return self._matcher_cached
        except AttributeError:
            if not self.parts:
                raise ValueError("empty pattern")
            parts = [r'\A' if self.drive or self.root else '^']
            for part in self._lines.splitlines(keepends=True):
                if part == '**\n':
                    part = r'[\s\S]*^'
                elif part == '**':
                    part = r'[\s\S]*'
                elif '**' in part:
                    raise ValueError("Invalid pattern: '**' can only be an entire path component")
                else:
                    part = fnmatch.translate(part)[_FNMATCH_SLICE]
                parts.append(part)
            parts.append(r'\Z')
            flags = re.MULTILINE
            if not _is_case_sensitive(self._flavour):
                flags |= re.IGNORECASE
            self._matcher_cached = re.compile(''.join(parts), flags=flags)
            return self._matcher_cached
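The `_matcher` property above can be sketched as a standalone function. This is a minimal re-creation under stated assumptions, not the patch's code: it handles POSIX-style patterns only, uses `functools.lru_cache` in place of the per-object `_matcher_cached` slot, and the names `compile_pattern` and `path_matches` are hypothetical. The key idea carries over: with `re.MULTILINE`, `^` matches at the start of any line, that is, at any component boundary, so a relative pattern can match any trailing run of components, while an anchored pattern starts with `\A` and must match from the very beginning. A non-final `**` becomes `[\s\S]*^`, which skips any number of whole components and resumes at a line start.

```python
import fnmatch
import functools
import re
from pathlib import PurePosixPath

# Same wrapper-stripping trick as _FNMATCH_SLICE in the patch.
_PREFIX, _SUFFIX = fnmatch.translate('_').split('_')
_COMPONENT = slice(len(_PREFIX), -len(_SUFFIX))
# Same table as _SWAP_SEP_AND_NEWLINE['/'].
_SWAP = str.maketrans({'/': '\n', '\n': '/'})


@functools.lru_cache(maxsize=256)
def compile_pattern(pattern, case_sensitive=True):
    """Compile a POSIX glob pattern, including '**', into a regex."""
    pat = PurePosixPath(pattern)  # normalises '.' parts and a trailing '/'
    if not pat.parts:
        raise ValueError("empty pattern")
    # Anchored patterns start at the beginning of the string; relative
    # patterns may start at any line, i.e. any component boundary.
    parts = [r'\A' if pat.root else '^']
    for part in str(pat).translate(_SWAP).splitlines(keepends=True):
        if part == '**\n':
            # Zero or more whole components, then resume at a line start.
            part = r'[\s\S]*^'
        elif part == '**':
            # Trailing '**': consume everything that is left.
            part = r'[\s\S]*'
        elif '**' in part:
            raise ValueError("'**' can only be an entire path component")
        else:
            part = fnmatch.translate(part)[_COMPONENT]
        parts.append(part)
    parts.append(r'\Z')
    flags = re.MULTILINE if case_sensitive else re.MULTILINE | re.IGNORECASE
    return re.compile(''.join(parts), flags)


def path_matches(path, pattern, case_sensitive=True):
    """Hypothetical stand-in for the patched PurePath.match()."""
    lines = str(PurePosixPath(path)).translate(_SWAP)
    return compile_pattern(pattern, case_sensitive).search(lines) is not None
```

Because compilation is cached, matching one pattern string against many paths compiles it only once; the patch gets the same effect by storing the compiled matcher on the pattern object.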

    def __eq__(self, other):
        if not isinstance(other, PurePath):
            return NotImplemented
@@ -686,20 +737,11 @@ def match(self, path_pattern):
"""
Return True if this path matches the given pattern.
"""
pat = self.with_segments(path_pattern)
if not pat.parts:
raise ValueError("empty pattern")
pat_parts = pat._parts_normcase
parts = self._parts_normcase
if pat.drive or pat.root:
if len(pat_parts) != len(parts):
return False
elif len(pat_parts) > len(parts):
return False
for part, pat in zip(reversed(parts), reversed(pat_parts)):
if not fnmatch.fnmatchcase(part, pat):
return False
return True
if not isinstance(path_pattern, PurePath) or self._flavour is not path_pattern._flavour:
path_pattern = self.with_segments(path_pattern)
match = path_pattern._matcher.search(self._lines)
return match is not None
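In the `match()` hunk above, the loop over `_parts_normcase` is the pre-patch algorithm and the `_matcher.search()` call is its replacement. The sketch below re-creates the old logic as a standalone function for contrast (POSIX-only and case-sensitive; `legacy_match` is a hypothetical name, and `PurePosixPath.parts` stands in for the private `_parts_normcase`). It shows why `**` used to act like `*`: each pattern component is compared against exactly one path component with `fnmatch.fnmatchcase()`, so `**` can never span more than one component.

```python
import fnmatch
from pathlib import PurePosixPath


def legacy_match(path, pattern):
    """Hypothetical re-creation of the pre-patch PurePath.match() logic."""
    pat = PurePosixPath(pattern)
    if not pat.parts:
        raise ValueError("empty pattern")
    parts, pat_parts = PurePosixPath(path).parts, pat.parts
    if pat.anchor:
        # Anchored pattern (drive or root): component counts must be identical.
        if len(pat_parts) != len(parts):
            return False
    elif len(pat_parts) > len(parts):
        # A relative pattern cannot be longer than the path it matches.
        return False
    # Compare from the right, one pattern component per path component.
    for part, pat_part in zip(reversed(parts), reversed(pat_parts)):
        if not fnmatch.fnmatchcase(part, pat_part):
            return False
    return True


print(legacy_match('/a/b.py', '/**/*.py'))    # True: '**' matched exactly one component
print(legacy_match('/a/b/c.py', '/**/*.py'))  # False: too many components for the pattern
```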


# Can't subclass os.PathLike from PurePath and keep the constructor
# optimizations in PurePath.__slots__.
24 changes: 23 additions & 1 deletion Lib/test/test_pathlib.py
@@ -310,8 +310,30 @@ def test_match_common(self):
        self.assertFalse(P('/ab.py').match('/a/*.py'))
        self.assertFalse(P('/a/b/c.py').match('/a/*.py'))
        # Multi-part glob-style pattern.
        self.assertFalse(P('/a/b/c.py').match('/**/*.py'))
        self.assertTrue(P('a').match('**'))
        self.assertTrue(P('c.py').match('**'))
        self.assertTrue(P('a/b/c.py').match('**'))
        self.assertTrue(P('/a/b/c.py').match('**'))
        self.assertTrue(P('/a/b/c.py').match('/**'))
        self.assertTrue(P('/a/b/c.py').match('**/'))
        self.assertTrue(P('/a/b/c.py').match('/a/**'))
        self.assertTrue(P('/a/b/c.py').match('**/*.py'))
        self.assertTrue(P('/a/b/c.py').match('/**/*.py'))
        self.assertTrue(P('/a/b/c.py').match('/a/**/*.py'))
        self.assertTrue(P('/a/b/c.py').match('/a/b/**/*.py'))
        self.assertTrue(P('/a/b/c.py').match('/**/**/**/**/*.py'))
        self.assertFalse(P('c.py').match('**/a.py'))
        self.assertFalse(P('c.py').match('c/**'))
        self.assertFalse(P('a/b/c.py').match('**/a'))
        self.assertFalse(P('a/b/c.py').match('**/a/b'))
        self.assertFalse(P('a/b/c.py').match('**/a/b/c'))
        self.assertFalse(P('a/b/c.py').match('**/a/b/c.'))
        self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
        self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
        self.assertFalse(P('a/b/c.py').match('/a/b/c.py/**'))
        self.assertFalse(P('a/b/c.py').match('/**/a/b/c.py'))
        self.assertRaises(ValueError, P('a').match, '**a/b/c')
        self.assertRaises(ValueError, P('a').match, 'a/b/c**')

    def test_ordering_common(self):
        # Ordering is tuple-alike.
@@ -0,0 +1 @@
Add support for recursive wildcards in :meth:`pathlib.PurePath.match`.