GH-73435: Implement recursive wildcards in pathlib.PurePath.match() #101398

Merged
merged 40 commits into from
May 30, 2023


@barneygale (Contributor) commented Jan 28, 2023

PurePath.match() now handles the ** wildcard as in Path.glob(), i.e. it matches any number of path segments.

We now compile a re.Pattern object for the entire pattern. This is made more difficult by fnmatch not treating directory separators as special when evaluating wildcards (*, ?, etc), and so we arrange the path parts onto separate lines in a string, and ensure we don't set re.DOTALL.
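To illustrate the line-per-segment idea, here is a minimal sketch. It is not the code from this PR: the helper names `_segment_to_regex` and `lines_match` are hypothetical, only `*`, `?` and `**` are handled, and the whole path is matched rather than matching from the right as `match()` does for relative patterns. Each path segment goes on its own line, and because `re.DOTALL` is not set, `.` cannot cross a newline, so `*` stays within one segment.

```python
import re


def _segment_to_regex(segment):
    """Translate one pattern segment into a regex fragment (hypothetical helper)."""
    if segment == "**":
        # Zero or more complete lines, i.e. any number of path segments.
        return r"(?:.+\n)*"
    out = []
    for ch in segment:
        if ch == "*":
            out.append(".*")  # '.' does not match '\n' without re.DOTALL
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    # Every ordinary segment must consume exactly one line.
    return "".join(out) + r"\n"


def lines_match(path, pattern):
    """Return True if the '/'-separated path matches the whole pattern."""
    regex = re.compile("".join(_segment_to_regex(s) for s in pattern.split("/")))
    # Arrange the path parts onto separate lines, one segment per line.
    text = "".join(part + "\n" for part in path.split("/"))
    return regex.fullmatch(text) is not None


print(lines_match("a/b/c.py", "a/**/c.py"))  # True
print(lines_match("a/b/c.py", "a/*.py"))     # False: '*' cannot span segments
```

The newline acts as a stand-in separator that ordinary wildcards can never match, which is what lets a single compiled pattern cover the entire path.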

This improves the performance of match() by around 2-3x for simple patterns, and by more for complex patterns:

$ ./python -m timeit \
    -s 'from pathlib import PureWindowsPath as P; path = P("C:/foo/bar.py"); pattern = P("c:/*/*.py")' \
    'path.match(pattern)'
50000 loops, best of 5: 8.13 usec per loop   # before
1000000 loops, best of 5: 297 nsec per loop  # after

…ch()

Add a new *recursive* argument to `pathlib.PurePath.match()`, defaulting
to `False`. If set to true, `match()` handles the `**` wildcard as in
`Path.glob()`, i.e. it matches any number of path segments.

We now compile a `re.Pattern` object for the entire pattern. This is made
more difficult by `fnmatch` not treating directory separators as special
when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts
onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
@barneygale (Contributor Author) commented Jan 28, 2023

Two big caveats:

@barneygale barneygale marked this pull request as ready for review February 17, 2023 19:24
@barneygale barneygale added the performance Performance or resource usage label Feb 17, 2023
@barneygale barneygale changed the title gh-73435: Implement recursive wildcards in pathlib.PurePath.match() GH-73435: Implement recursive wildcards in pathlib.PurePath.match() May 3, 2023
@zooba (Member) left a comment

Approved, but consider adding a couple of comments (as suggested) so that the next person who has to trace through this code is grateful rather than mad at you ;-)

@zooba (Member) commented May 30, 2023

Perfect! Ship it

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@barneygale barneygale enabled auto-merge (squash) May 30, 2023 19:50
@barneygale (Contributor Author) commented

Thank you for your help Alex, Hugo and Steve!

@barneygale (Contributor Author) commented

Hey, if it interests anyone, I have a follow-up PR that simplifies a bunch of the code added in this PR. It does this by adding a new seps parameter to fnmatch.translate(), and varying the generated regular expression when it's supplied.
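For readers wondering what a separator-aware translation could look like, here is a rough sketch. It is not the follow-up PR's code and not the real `fnmatch.translate()` signature: `translate_with_seps` is a hypothetical stand-alone function that handles only `*` and `?`, and it assumes that supplying `seps` simply swaps "any character" for "any character that is not a separator".

```python
import re


def translate_with_seps(pattern, seps=None):
    """Translate a simple glob pattern to a regex string (hypothetical sketch).

    When seps is given, wildcards refuse to match those separator characters.
    """
    if seps:
        # Character class excluding every separator, e.g. [^/] or [^/\\].
        any_char = "[^" + re.escape(seps) + "]"
    else:
        any_char = "."
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(any_char + "*")
        elif ch == "?":
            out.append(any_char)
        else:
            out.append(re.escape(ch))
    # Mirror fnmatch's convention of a DOTALL group anchored at the end.
    return "(?s:" + "".join(out) + r")\Z"


# Without seps, '*' happily swallows the '/' (fnmatch-style behaviour).
print(bool(re.match(translate_with_seps("*.py"), "a/b.py")))            # True
# With seps, '*' stops at the separator.
print(bool(re.match(translate_with_seps("*.py", seps="/"), "a/b.py")))  # False
```

With the separator handled inside the generated expression, the path no longer needs to be rearranged onto separate lines before matching.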

Labels
performance Performance or resource usage topic-pathlib