-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow pruning dirnames in a topdown walk to improve speed on large filesystems #1240
Conversation
Sorry, this one seems to have passed me by. Please ping me again, and I'll look early next week. |
Here a little reminder as requested.
Le sam. 6 mai 2023 à 02:38, Martin Durant ***@***.***> a
écrit :
… Sorry, this one seems to have passed me by. Please ping me again, and I'll
look early next week.
—
Reply to this email directly, view it on GitHub
<#1240 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAM5ZLHGVWPGWFWEBGJQXETXEWMOTANCNFSM6AAAAAAWVQDZBI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ianthomas23 , I wouldn't mind your eyes on this. I think it looks good, but I have not had time to think about any unintended side effects.
fsspec/spec.py
Outdated
When topdown is True, the caller can modify the dirs list in-place, | ||
and walk() will only recurse into the subdirectories whose names remain in dirs. | ||
Modifying dirs when topdown is False has no effect. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When topdown is True, the caller can modify the dirs list in-place, | |
and walk() will only recurse into the subdirectories whose names remain in dirs. | |
Modifying dirs when topdown is False has no effect. |
Text repeats below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be inclined to remove this text block as the block below is similar and contains more information.
(please fix the linting failure, for which you might choose to run locally:
) |
@@ -302,18 +302,30 @@ def test_find(self, scenario: ArchiveTestScenario): | |||
assert fs.find("deeply/") == fs.find("deeply") | |||
|
|||
@pytest.mark.parametrize("topdown", [True, False]) | |||
def test_walk(self, scenario: ArchiveTestScenario, topdown): | |||
@pytest.mark.parametrize("purne_nested", [True, False]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that purne
should be prune
?
fsspec/spec.py
Outdated
and walk() will only recurse into the subdirectories whose names remain in dirs. | ||
Modifying dirs when topdown is False has no effect. | ||
|
||
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI linting failures are all about these lines being too long. @epizut If you keep each line to 88 characters then the CI will pass.
fsspec/tests/test_api.py
Outdated
list(m.walk(dir0, maxdepth=0, topdown=False)) | ||
list(m.walk(dir1, maxdepth=0, topdown=False)) | ||
|
||
# purne dir111 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
purne
should be prune
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally this looks good to me, and I don't think there are any side-effects. Just a few nitpick comments to address from my point of view.
For the record: we were doing a similar thing in our localfs https://github.com/iterative/dvc-objects/blob/8dfef0b07f91c8c91dacbbfe5130d34fbaca68cb/src/dvc_objects/fs/local.py#L73 but didn't get to contributing it back yet. I'm really happy to see this PR not only tackle regular implementation but also async. Thank you @epizut 🙏 |
@epizut , I made fixes, and this closed your PR, not sure why. |
#1263 replaces this |
Solving #1223
Allowing modifying
dirnames
in order to improve browsing large filesystems:os.walk
Related issue: Add topdown kwarg to AbstractFileSystem.walk()