Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace glob with rglob #283 #284

Merged
merged 2 commits into from
Jul 11, 2023
Merged

replace glob with rglob #283 #284

merged 2 commits into from
Jul 11, 2023

Conversation

Keunmo
Copy link
Contributor

@Keunmo Keunmo commented Jul 5, 2023

PR for fix #283.

Replace Path(some_path).glob('**/*') with Path(some_path).rglob('*/*').

Act same as original code, and it can also find files from symlink.

@sarlinpe
Copy link
Member

sarlinpe commented Jul 7, 2023

Thanks for sending this PR. The doc mentions

This is like calling Path.glob() with “**/” added in front of the given relative pattern

So shouldn't the update be

- paths += list(Path(root).glob('**/'+g))
+ paths += list(Path(root).rglob(g))

instead of

- paths += list(Path(root).glob('**/'+g))
+ paths += list(Path(root).rglob('*/'+g))

?

@Keunmo
Copy link
Contributor Author

Keunmo commented Jul 8, 2023

Thanks for reviewing @skydes !

I've tested with several cases, and I think current update is more suitable for handling various scenarios involving symlinks.

  • Case 1: When dataset path is a symlink, and mapping path and query path are regular directories.
  • Case 2: When dataset path is a regular directory, and mapping path and query path are symlinks.
  • Case 3: When dataset path, mapping path, and query path are all symlinks.

In these cases, paths += list(Path(root).rglob('*/'+g)) worked well for all three cases, but paths += list(Path(root).rglob(g)) only worked for case 1 and failed for cases 2 and 3. I think it's because when rglob traversal, if a child directory is a symlink, it does not explore within that symlink directory.

So, I believe current approach is more appropriate.

@sarlinpe
Copy link
Member

sarlinpe commented Jul 8, 2023

Path(root).rglob('*/'+g) only handles symlinks at the top level, unlike glob.glob which handles arbitrary levels. This is exactly what the cpython issue says. Let's instead use glob, as:

paths += glob.glob((Path(root) / '**' / g).as_posix()), recursive=True)

Can you please confirm that it works and make the changes? Thanks!

@Keunmo
Copy link
Contributor Author

Keunmo commented Jul 9, 2023

You're right, glob.glob looks better. I changed it to glob, but it returns list with str, not pathlib.PosixPath. So I convert it to PosixPath like this:

paths += list(map(Path, glob.glob((Path(root) / '**' / g).as_posix(), recursive=True)))

And it works all good. Thanks!

@sarlinpe sarlinpe merged commit 92b1de7 into cvg:master Jul 11, 2023
@Keunmo Keunmo deleted the fix_pathlib_glob branch August 13, 2023 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pathlib.Path.glob does not follow symlinks
2 participants