`glob.glob('**/**', recursive=True)` yields duplicate results #104269
Comments
Reasonably minor bug, I think, as these patterns are pretty rare. A note in the docs might be the best solution.
barneygale added a commit to barneygale/cpython that referenced this issue Mar 4, 2024
The present implementation of `pathlib.Path.glob()` creates a series of 'selectors' that each handle a part of the pattern. The selectors are connected together in `glob()`, without the use of recursion. One very subtle property of this scheme is that each selector is exhausted *before* its successor selector. For example, when globbing `*/*.py`, the selector for `*` is exhausted prior to the selector for `*.py`. This doesn't make any difference when globbing strings, but it does prevent us from adding `dir_fd` support, because there's no good moment to call `os.close(fd)` after opening a directory for scanning.

This patch refactors globbing to work much as it did in 3.12, where each selector is responsible for creating and feeding its own successor. This inverts the order of selector exhaustion, and so will make it much easier to add `dir_fd` support.

There's one behaviour change here: I've removed deduplication of results, and so in some very specific circumstances (multiple non-consecutive `**` segments in the pattern, and either `follow_symlinks=None` or `..` segments separating them), `glob()` can yield the same path more than once. Note that `glob.glob()` can also yield duplicate results - see pythonGH-104269.
This was referenced Mar 4, 2024
barneygale pushed a commit that referenced this issue Apr 11, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 11, 2024
…`**` patterns (pythonGH-105406) (cherry picked from commit c06be6b) Co-authored-by: Tomas R <tomas.roun8@gmail.com>
barneygale pushed a commit that referenced this issue Apr 11, 2024
Thanks for sorting this, @tomasr8!
Calling `glob.glob(pattern, recursive=True)`, where `pattern` contains two or more `**/` segments, can yield the same paths multiple times.
Linked PRs
`glob.glob` duplicates when using multiple `**` patterns #105406
`glob.glob` duplicates when using multiple `**` patterns (GH-105406) #117757
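For anyone wanting to check the behaviour reported in this issue on their own interpreter, here is a minimal sketch. The directory layout (`a/b/c.txt`) and the helper names `glob_counts` and `unique_matches` are made up for illustration and are not part of the `glob` module. The sketch counts how often each path is yielded for a pattern with two `**` segments. On versions without the linked fixes a count may be greater than one; on versions that include them it should be one. The second helper shows one way to drop repeats in caller code regardless of version.

```python
import glob
import os
import tempfile
from collections import Counter


def glob_counts(root, pattern):
    """Map each matched path (relative to root) to how many times glob yielded it.

    Hypothetical helper for illustration only.
    """
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return Counter(os.path.relpath(match, root) for match in matches)


def unique_matches(root, pattern):
    """Return the matches with repeats dropped, in first-seen order.

    Counter is a dict subclass, so its keys keep insertion order.
    """
    return list(glob_counts(root, pattern))


with tempfile.TemporaryDirectory() as root:
    # Assumed layout for the demonstration: root/a/b/c.txt
    os.makedirs(os.path.join(root, "a", "b"))
    with open(os.path.join(root, "a", "b", "c.txt"), "w"):
        pass

    counts = glob_counts(root, "**/**/*.txt")
    repeated = {path: n for path, n in counts.items() if n > 1}
    # 'repeated' may be empty on versions that include the fix.
    print("yielded more than once:", repeated)
    print("unique matches:", unique_matches(root, "**/**/*.txt"))
```

Deduplicating in the caller, as `unique_matches` does, keeps code independent of which Python version it runs on, at the cost of holding all results in memory.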