Commit b69548a

GH-73435: Add pathlib.PurePath.full_match() (#114350)
In 49f90ba we added support for the recursive wildcard `**` in `pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a problem: for relative patterns only, `match()` implicitly inserts a `**` token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: `PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new `full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern

1 parent 841eacd commit b69548a

6 files changed: +155 -72 lines changed
Doc/library/glob.rst

+3 -2
@@ -147,8 +147,9 @@ The :mod:`glob` module defines the following functions:

    .. seealso::

-     :meth:`pathlib.PurePath.match` and :meth:`pathlib.Path.glob` methods,
-     which call this function to implement pattern matching and globbing.
+     :meth:`pathlib.PurePath.full_match` and :meth:`pathlib.Path.glob`
+     methods, which call this function to implement pattern matching and
+     globbing.

    .. versionadded:: 3.13

Doc/library/pathlib.rst

+30 -30
@@ -559,55 +559,55 @@ Pure paths provide the following methods and properties:
       PureWindowsPath('c:/Program Files')


-.. method:: PurePath.match(pattern, *, case_sensitive=None)
+.. method:: PurePath.full_match(pattern, *, case_sensitive=None)

    Match this path against the provided glob-style pattern. Return ``True``
-   if matching is successful, ``False`` otherwise.
-
-   If *pattern* is relative, the path can be either relative or absolute,
-   and matching is done from the right::
+   if matching is successful, ``False`` otherwise. For example::

-      >>> PurePath('a/b.py').match('*.py')
-      True
-      >>> PurePath('/a/b/c.py').match('b/*.py')
+      >>> PurePath('a/b.py').full_match('a/*.py')
       True
-      >>> PurePath('/a/b/c.py').match('a/*.py')
+      >>> PurePath('a/b.py').full_match('*.py')
       False
+      >>> PurePath('/a/b/c.py').full_match('/a/**')
+      True
+      >>> PurePath('/a/b/c.py').full_match('**/*.py')
+      True

-   If *pattern* is absolute, the path must be absolute, and the whole path
-   must match::
+   As with other methods, case-sensitivity follows platform defaults::

-      >>> PurePath('/a.py').match('/*.py')
-      True
-      >>> PurePath('a/b.py').match('/*.py')
+      >>> PurePosixPath('b.py').full_match('*.PY')
       False
+      >>> PureWindowsPath('b.py').full_match('*.PY')
+      True

-   The *pattern* may be another path object; this speeds up matching the same
-   pattern against multiple files::
+   Set *case_sensitive* to ``True`` or ``False`` to override this behaviour.

-      >>> pattern = PurePath('*.py')
-      >>> PurePath('a/b.py').match(pattern)
-      True
+   .. versionadded:: 3.13

-   .. versionchanged:: 3.12
-      Accepts an object implementing the :class:`os.PathLike` interface.

-   As with other methods, case-sensitivity follows platform defaults::
+.. method:: PurePath.match(pattern, *, case_sensitive=None)

-      >>> PurePosixPath('b.py').match('*.PY')
-      False
-      >>> PureWindowsPath('b.py').match('*.PY')
+   Match this path against the provided non-recursive glob-style pattern.
+   Return ``True`` if matching is successful, ``False`` otherwise.
+
+   This method is similar to :meth:`~PurePath.full_match`, but empty patterns
+   aren't allowed (:exc:`ValueError` is raised), the recursive wildcard
+   "``**``" isn't supported (it acts like non-recursive "``*``"), and if a
+   relative pattern is provided, then matching is done from the right::
+
+      >>> PurePath('a/b.py').match('*.py')
+      True
+      >>> PurePath('/a/b/c.py').match('b/*.py')
       True
+      >>> PurePath('/a/b/c.py').match('a/*.py')
+      False

-   Set *case_sensitive* to ``True`` or ``False`` to override this behaviour.
+   .. versionchanged:: 3.12
+      The *pattern* parameter accepts a :term:`path-like object`.

    .. versionchanged:: 3.12
       The *case_sensitive* parameter was added.

-   .. versionchanged:: 3.13
-      Support for the recursive wildcard "``**``" was added. In previous
-      versions, it acted like the non-recursive wildcard "``*``".
-

 .. method:: PurePath.relative_to(other, walk_up=False)

Doc/whatsnew/3.13.rst

+2 -1
@@ -336,7 +336,8 @@ pathlib
   object from a 'file' URI (``file:/``).
   (Contributed by Barney Gale in :gh:`107465`.)

-* Add support for recursive wildcards in :meth:`pathlib.PurePath.match`.
+* Add :meth:`pathlib.PurePath.full_match` for matching paths with
+  shell-style wildcards, including the recursive wildcard "``**``".
   (Contributed by Barney Gale in :gh:`73435`.)

 * Add *follow_symlinks* keyword-only argument to :meth:`pathlib.Path.glob`,

Lib/pathlib/__init__.py

+7
@@ -490,6 +490,13 @@ def _pattern_stack(self):
         parts.reverse()
         return parts

+    @property
+    def _pattern_str(self):
+        """The path expressed as a string, for use in pattern-matching."""
+        # The string representation of an empty path is a single dot ('.'). Empty
+        # paths shouldn't match wildcards, so we change it to the empty string.
+        path_str = str(self)
+        return '' if path_str == '.' else path_str

 # Subclassing os.PathLike makes isinstance() checks slower,
 # which in turn makes Path construction slower. Register instead!
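
The effect of the `_pattern_str` override can be tried in isolation. What follows is a minimal sketch rather than the library code: `pattern_str` is a hypothetical standalone helper that mirrors the property added above, and it relies only on the fact that `str()` of an empty pure path is `'.'`.

```python
from pathlib import PurePosixPath


def pattern_str(path):
    """Hypothetical helper mirroring the `_pattern_str` property in this diff.

    An empty path stringifies as '.', which a wildcard such as '*' could
    otherwise match, so it is mapped to the empty string instead.
    """
    path_str = str(path)
    return '' if path_str == '.' else path_str


if __name__ == "__main__":
    for raw in ('', '.', 'a/b.py'):
        p = PurePosixPath(raw)
        print(repr(str(p)), '->', repr(pattern_str(p)))
```

Note that `PurePosixPath('.')` is also treated as empty here, since it has the same string form as `PurePosixPath('')`.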

Lib/pathlib/_abc.py

+38 -16
@@ -47,19 +47,16 @@ def _is_case_sensitive(pathmod):
 re = glob = None


-@functools.lru_cache(maxsize=256)
-def _compile_pattern(pat, sep, case_sensitive):
+@functools.lru_cache(maxsize=512)
+def _compile_pattern(pat, sep, case_sensitive, recursive=True):
     """Compile given glob pattern to a re.Pattern object (observing case
     sensitivity)."""
     global re, glob
     if re is None:
         import re, glob

     flags = re.NOFLAG if case_sensitive else re.IGNORECASE
-    regex = glob.translate(pat, recursive=True, include_hidden=True, seps=sep)
-    # The string representation of an empty path is a single dot ('.'). Empty
-    # paths shouldn't match wildcards, so we consume it with an atomic group.
-    regex = r'(\.\Z)?+' + regex
+    regex = glob.translate(pat, recursive=recursive, include_hidden=True, seps=sep)
     return re.compile(regex, flags=flags).match

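
The caching pattern used by `_compile_pattern` can be illustrated with a small standalone sketch. Because `glob.translate()` only exists in Python 3.13, this version substitutes `fnmatch.translate()`, which has no separator handling and no `recursive` option; `compile_pattern` is a hypothetical name, not part of pathlib.

```python
import fnmatch
import functools
import re


@functools.lru_cache(maxsize=512)
def compile_pattern(pat, case_sensitive):
    """Compile a shell-style pattern once and cache the bound match method.

    Assumption: fnmatch.translate() stands in for glob.translate(), so '*'
    here may also match path separators, unlike the real implementation.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(fnmatch.translate(pat), flags=flags).match


if __name__ == "__main__":
    print(compile_pattern('*.PY', True)('b.py') is not None)
    print(compile_pattern('*.PY', False)('b.py') is not None)
    print(compile_pattern.cache_info())
```

Every distinct argument tuple gets its own cache entry, which may be why the commit doubles `maxsize` when it adds the `recursive` argument.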

@@ -441,23 +438,48 @@ def _pattern_stack(self):
             raise NotImplementedError("Non-relative patterns are unsupported")
         return parts

+    @property
+    def _pattern_str(self):
+        """The path expressed as a string, for use in pattern-matching."""
+        return str(self)
+
     def match(self, path_pattern, *, case_sensitive=None):
         """
-        Return True if this path matches the given pattern.
+        Return True if this path matches the given pattern. If the pattern is
+        relative, matching is done from the right; otherwise, the entire path
+        is matched. The recursive wildcard '**' is *not* supported by this
+        method.
         """
         if not isinstance(path_pattern, PurePathBase):
             path_pattern = self.with_segments(path_pattern)
         if case_sensitive is None:
             case_sensitive = _is_case_sensitive(self.pathmod)
         sep = path_pattern.pathmod.sep
-        if path_pattern.anchor:
-            pattern_str = str(path_pattern)
-        elif path_pattern.parts:
-            pattern_str = str('**' / path_pattern)
-        else:
+        path_parts = self.parts[::-1]
+        pattern_parts = path_pattern.parts[::-1]
+        if not pattern_parts:
             raise ValueError("empty pattern")
-        match = _compile_pattern(pattern_str, sep, case_sensitive)
-        return match(str(self)) is not None
+        if len(path_parts) < len(pattern_parts):
+            return False
+        if len(path_parts) > len(pattern_parts) and path_pattern.anchor:
+            return False
+        for path_part, pattern_part in zip(path_parts, pattern_parts):
+            match = _compile_pattern(pattern_part, sep, case_sensitive, recursive=False)
+            if match(path_part) is None:
+                return False
+        return True
+
+    def full_match(self, pattern, *, case_sensitive=None):
+        """
+        Return True if this path matches the given glob-style pattern. The
+        pattern is matched against the entire path.
+        """
+        if not isinstance(pattern, PurePathBase):
+            pattern = self.with_segments(pattern)
+        if case_sensitive is None:
+            case_sensitive = _is_case_sensitive(self.pathmod)
+        match = _compile_pattern(pattern._pattern_str, pattern.pathmod.sep, case_sensitive)
+        return match(self._pattern_str) is not None


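
The restored `match()` logic compares path parts and pattern parts pairwise, starting from the right. The sketch below reproduces that control flow outside pathlib so it can be run on any recent Python. It is an approximation under stated assumptions: `match_from_right` is a hypothetical function, it is POSIX-only and always case-sensitive, and it uses `fnmatch.fnmatchcase()` per part in place of `_compile_pattern(..., recursive=False)`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def match_from_right(path, pattern):
    """Hypothetical re-statement of the part-by-part matching in match()."""
    path = PurePosixPath(path)
    pattern = PurePosixPath(pattern)
    # Reverse both part tuples so that zip() pairs the rightmost parts first.
    path_parts = path.parts[::-1]
    pattern_parts = pattern.parts[::-1]
    if not pattern_parts:
        raise ValueError("empty pattern")
    if len(path_parts) < len(pattern_parts):
        return False
    # An anchored (absolute) pattern must account for every part of the path.
    if len(path_parts) > len(pattern_parts) and pattern.anchor:
        return False
    # Assumption: fnmatchcase() approximates a non-recursive glob per part,
    # so '**' behaves like a single '*' here.
    return all(
        fnmatchcase(path_part, pattern_part)
        for path_part, pattern_part in zip(path_parts, pattern_parts)
    )


if __name__ == "__main__":
    print(match_from_right('/a/b/c.py', 'b/*.py'))
    print(match_from_right('/a/b/c.py', 'a/*.py'))
    print(match_from_right('/a/b/c.py', '/**/*.py'))
```

Since each pattern part is matched against exactly one path part, `'/**/*.py'` cannot span the two directories in `/a/b/c.py`, which is the behaviour the updated `test_match_common` asserts.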

@@ -781,8 +803,8 @@ def glob(self, pattern, *, case_sensitive=None, follow_symlinks=None):
         if filter_paths:
             # Filter out paths that don't match pattern.
             prefix_len = len(str(self._make_child_relpath('_'))) - 1
-            match = _compile_pattern(str(pattern), sep, case_sensitive)
-            paths = (path for path in paths if match(str(path), prefix_len))
+            match = _compile_pattern(pattern._pattern_str, sep, case_sensitive)
+            paths = (path for path in paths if match(path._pattern_str, prefix_len))
         return paths

     def rglob(self, pattern, *, case_sensitive=None, follow_symlinks=None):

Lib/test/test_pathlib/test_pathlib_abc.py

+75 -23
@@ -249,39 +249,91 @@ def test_match_common(self):
         self.assertFalse(P('/ab.py').match('/a/*.py'))
         self.assertFalse(P('/a/b/c.py').match('/a/*.py'))
         # Multi-part glob-style pattern.
-        self.assertTrue(P('a').match('**'))
-        self.assertTrue(P('c.py').match('**'))
-        self.assertTrue(P('a/b/c.py').match('**'))
-        self.assertTrue(P('/a/b/c.py').match('**'))
-        self.assertTrue(P('/a/b/c.py').match('/**'))
-        self.assertTrue(P('/a/b/c.py').match('/a/**'))
-        self.assertTrue(P('/a/b/c.py').match('**/*.py'))
-        self.assertTrue(P('/a/b/c.py').match('/**/*.py'))
+        self.assertFalse(P('/a/b/c.py').match('/**/*.py'))
         self.assertTrue(P('/a/b/c.py').match('/a/**/*.py'))
-        self.assertTrue(P('/a/b/c.py').match('/a/b/**/*.py'))
-        self.assertTrue(P('/a/b/c.py').match('/**/**/**/**/*.py'))
-        self.assertFalse(P('c.py').match('**/a.py'))
-        self.assertFalse(P('c.py').match('c/**'))
-        self.assertFalse(P('a/b/c.py').match('**/a'))
-        self.assertFalse(P('a/b/c.py').match('**/a/b'))
-        self.assertFalse(P('a/b/c.py').match('**/a/b/c'))
-        self.assertFalse(P('a/b/c.py').match('**/a/b/c.'))
-        self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
-        self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
-        self.assertFalse(P('a/b/c.py').match('/a/b/c.py/**'))
-        self.assertFalse(P('a/b/c.py').match('/**/a/b/c.py'))
-        self.assertRaises(ValueError, P('a').match, '**a/b/c')
-        self.assertRaises(ValueError, P('a').match, 'a/b/c**')
         # Case-sensitive flag
         self.assertFalse(P('A.py').match('a.PY', case_sensitive=True))
         self.assertTrue(P('A.py').match('a.PY', case_sensitive=False))
         self.assertFalse(P('c:/a/B.Py').match('C:/A/*.pY', case_sensitive=True))
         self.assertTrue(P('/a/b/c.py').match('/A/*/*.Py', case_sensitive=False))
         # Matching against empty path
         self.assertFalse(P('').match('*'))
-        self.assertTrue(P('').match('**'))
+        self.assertFalse(P('').match('**'))
         self.assertFalse(P('').match('**/*'))

+    def test_full_match_common(self):
+        P = self.cls
+        # Simple relative pattern.
+        self.assertTrue(P('b.py').full_match('b.py'))
+        self.assertFalse(P('a/b.py').full_match('b.py'))
+        self.assertFalse(P('/a/b.py').full_match('b.py'))
+        self.assertFalse(P('a.py').full_match('b.py'))
+        self.assertFalse(P('b/py').full_match('b.py'))
+        self.assertFalse(P('/a.py').full_match('b.py'))
+        self.assertFalse(P('b.py/c').full_match('b.py'))
+        # Wildcard relative pattern.
+        self.assertTrue(P('b.py').full_match('*.py'))
+        self.assertFalse(P('a/b.py').full_match('*.py'))
+        self.assertFalse(P('/a/b.py').full_match('*.py'))
+        self.assertFalse(P('b.pyc').full_match('*.py'))
+        self.assertFalse(P('b./py').full_match('*.py'))
+        self.assertFalse(P('b.py/c').full_match('*.py'))
+        # Multi-part relative pattern.
+        self.assertTrue(P('ab/c.py').full_match('a*/*.py'))
+        self.assertFalse(P('/d/ab/c.py').full_match('a*/*.py'))
+        self.assertFalse(P('a.py').full_match('a*/*.py'))
+        self.assertFalse(P('/dab/c.py').full_match('a*/*.py'))
+        self.assertFalse(P('ab/c.py/d').full_match('a*/*.py'))
+        # Absolute pattern.
+        self.assertTrue(P('/b.py').full_match('/*.py'))
+        self.assertFalse(P('b.py').full_match('/*.py'))
+        self.assertFalse(P('a/b.py').full_match('/*.py'))
+        self.assertFalse(P('/a/b.py').full_match('/*.py'))
+        # Multi-part absolute pattern.
+        self.assertTrue(P('/a/b.py').full_match('/a/*.py'))
+        self.assertFalse(P('/ab.py').full_match('/a/*.py'))
+        self.assertFalse(P('/a/b/c.py').full_match('/a/*.py'))
+        # Multi-part glob-style pattern.
+        self.assertTrue(P('a').full_match('**'))
+        self.assertTrue(P('c.py').full_match('**'))
+        self.assertTrue(P('a/b/c.py').full_match('**'))
+        self.assertTrue(P('/a/b/c.py').full_match('**'))
+        self.assertTrue(P('/a/b/c.py').full_match('/**'))
+        self.assertTrue(P('/a/b/c.py').full_match('/a/**'))
+        self.assertTrue(P('/a/b/c.py').full_match('**/*.py'))
+        self.assertTrue(P('/a/b/c.py').full_match('/**/*.py'))
+        self.assertTrue(P('/a/b/c.py').full_match('/a/**/*.py'))
+        self.assertTrue(P('/a/b/c.py').full_match('/a/b/**/*.py'))
+        self.assertTrue(P('/a/b/c.py').full_match('/**/**/**/**/*.py'))
+        self.assertFalse(P('c.py').full_match('**/a.py'))
+        self.assertFalse(P('c.py').full_match('c/**'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a/b'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a/b/c'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a/b/c.'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a/b/c./**'))
+        self.assertFalse(P('a/b/c.py').full_match('**/a/b/c./**'))
+        self.assertFalse(P('a/b/c.py').full_match('/a/b/c.py/**'))
+        self.assertFalse(P('a/b/c.py').full_match('/**/a/b/c.py'))
+        self.assertRaises(ValueError, P('a').full_match, '**a/b/c')
+        self.assertRaises(ValueError, P('a').full_match, 'a/b/c**')
+        # Case-sensitive flag
+        self.assertFalse(P('A.py').full_match('a.PY', case_sensitive=True))
+        self.assertTrue(P('A.py').full_match('a.PY', case_sensitive=False))
+        self.assertFalse(P('c:/a/B.Py').full_match('C:/A/*.pY', case_sensitive=True))
+        self.assertTrue(P('/a/b/c.py').full_match('/A/*/*.Py', case_sensitive=False))
+        # Matching against empty path
+        self.assertFalse(P('').full_match('*'))
+        self.assertTrue(P('').full_match('**'))
+        self.assertFalse(P('').full_match('**/*'))
+        # Matching with empty pattern
+        self.assertTrue(P('').full_match(''))
+        self.assertTrue(P('.').full_match('.'))
+        self.assertFalse(P('/').full_match(''))
+        self.assertFalse(P('/').full_match('.'))
+        self.assertFalse(P('foo').full_match(''))
+        self.assertFalse(P('foo').full_match('.'))
+
     def test_parts_common(self):
         # `parts` returns a tuple.
         sep = self.sep
