S3FileSystem.find appears not to respect max_depth #378
Comments
Sorry about this; it's a case of choosing one optimisation over another. #360 changed … We could resurrect the old code and use it for the case where the depth is small. What do you think?
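To make the trade-off concrete, here is a minimal sketch. It assumes the two optimisations are a level-by-level walk using a `/` delimiter and a single flat listing of every key; the linked PR text is truncated, so that is an assumption. `FakeBucket`, `find_by_levels`, `find_flat` and `find` are hypothetical names, not s3fs internals, and the in-memory bucket only mimics S3's ListObjectsV2 with and without a delimiter. Either way, the result must be trimmed to `maxdepth`. The strategies differ only in how many entries they fetch.

```python
from typing import List, Optional, Set, Tuple


class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket (not s3fs code).

    ``list`` mimics ListObjectsV2: with a delimiter it returns one level only
    (objects plus common prefixes); without one it returns every key below the
    prefix. The counters record how much work each strategy does.
    """

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)
        self.list_calls = 0
        self.entries_returned = 0

    def list(
        self, prefix: str, delimiter: Optional[str] = None
    ) -> Tuple[List[str], List[str]]:
        self.list_calls += 1
        files: List[str] = []
        prefixes: Set[str] = set()
        for key in self.keys:
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse everything below the next delimiter into one prefix.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                files.append(key)
        self.entries_returned += len(files) + len(prefixes)
        return files, sorted(prefixes)


def find_by_levels(bucket: FakeBucket, root: str, maxdepth: int) -> Set[str]:
    """One delimiter listing per directory; cheap when maxdepth is small."""
    out: Set[str] = set()
    frontier = [root.rstrip("/") + "/"]
    for _ in range(maxdepth):
        next_frontier: List[str] = []
        for prefix in frontier:
            files, dirs = bucket.list(prefix, delimiter="/")
            out.update(files)
            out.update(d.rstrip("/") for d in dirs)
            next_frontier.extend(dirs)
        frontier = next_frontier
    return out


def find_flat(bucket: FakeBucket, root: str, maxdepth: Optional[int]) -> Set[str]:
    """One listing of the whole subtree, trimmed to maxdepth afterwards."""
    prefix = root.rstrip("/") + "/"
    files, _ = bucket.list(prefix)  # no delimiter: every key below root
    out: Set[str] = set()
    for key in files:
        parts = key[len(prefix):].split("/")
        limit = len(parts) if maxdepth is None else maxdepth
        if len(parts) <= limit:
            out.add(key)
        # Implied directories (ancestors of the key) within the depth limit.
        for depth in range(1, min(len(parts) - 1, limit) + 1):
            out.add(prefix + "/".join(parts[:depth]))
    return out


def find(bucket: FakeBucket, root: str, maxdepth: Optional[int] = None,
         small_depth: int = 2) -> Set[str]:
    """Choose a strategy by depth; both return files and directories."""
    if maxdepth is not None and maxdepth <= small_depth:
        return find_by_levels(bucket, root, maxdepth)
    return find_flat(bucket, root, maxdepth)
```

Consider 24 subdirectories holding 3 files each. With `maxdepth=1`, the level-by-level walk gets its 24 entries from a single request. The flat listing pulls back all 72 keys before trimming. On a bucket with millions of keys, that gap is what makes shallow listings slow. The `small_depth=2` cutoff is an arbitrary assumption.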
I haven't followed the history, so it's difficult for me to comment on the optimisation of `find` (and I don't understand the linked PR; from its diff it looks like the method didn't exist at all before?), but the current situation, in which there exists a …
Oops, I just realised that the …
I can confirm that in v0.5.0 the MCVE (adding `import s3fs`) …

```python
import s3fs

# Anonymous access to the public NOAA GOES-16 bucket
fs = s3fs.S3FileSystem(anon=True)
d = "noaa-goes16/ABI-L1b-RadF/2020/045/"

# Each call should report only the entries directly under d
print(len(fs.ls(d)))
print(len(fs.find(d, maxdepth=1, withdirs=True)))
print(len(fs.glob(d + "*")))
```

With 0.5.0:
With 0.5.1:
I.e. it's not just trading one optimisation against another; I think we can call this broken.
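All three calls in the MCVE should describe the same set: the entries directly under `d`. Below is a minimal sketch of that invariant as a reusable check. It assumes an fsspec-style object with `ls(path, detail=False)`, `glob(pattern)` and `find(path, maxdepth=..., withdirs=...)`. `check_maxdepth_consistency` and the in-memory `ToyFS` are hypothetical helpers for illustration, not part of s3fs. `ToyFS(..., honour_maxdepth=False)` mimics the reported behaviour of returning everything below `d`.

```python
from typing import Dict, Iterable, List, Optional, Set


def _normalise(paths: Iterable[str]) -> Set[str]:
    """Drop trailing slashes so directory entries compare equal."""
    return {p.rstrip("/") for p in paths}


def check_maxdepth_consistency(fs, path: str) -> Set[str]:
    """Raise if ls, glob and find(maxdepth=1) disagree; else return the entries.

    The root itself is discarded in case an implementation includes it in
    the find(withdirs=True) output.
    """
    root = path.rstrip("/")
    listed = _normalise(fs.ls(path, detail=False)) - {root}
    globbed = _normalise(fs.glob(root + "/*")) - {root}
    found = _normalise(fs.find(path, maxdepth=1, withdirs=True)) - {root}
    if not (listed == globbed == found):
        raise AssertionError(
            f"ls={len(listed)}, glob={len(globbed)}, find={len(found)} entries"
        )
    return listed


class ToyFS:
    """Hypothetical in-memory filesystem; directories are implied by files."""

    def __init__(self, files: Iterable[str], honour_maxdepth: bool = True) -> None:
        self.files = set(files)
        self.honour_maxdepth = honour_maxdepth

    def _entries(self, root: str) -> Dict[str, int]:
        """Map every file and implied directory below root to its depth."""
        prefix = root.rstrip("/") + "/"
        depths: Dict[str, int] = {}
        for f in self.files:
            if f.startswith(prefix):
                parts = f[len(prefix):].split("/")
                for d in range(1, len(parts) + 1):
                    depths[prefix + "/".join(parts[:d])] = d
        return depths

    def ls(self, path: str, detail: bool = False) -> List[str]:
        # ``detail`` is accepted for signature compatibility and ignored.
        return sorted(p for p, d in self._entries(path).items() if d == 1)

    def glob(self, pattern: str) -> List[str]:
        # Only "<dir>/*" is supported, which is all the check needs.
        return self.ls(pattern[: -len("/*")])

    def find(self, path: str, maxdepth: Optional[int] = None,
             withdirs: bool = False) -> List[str]:
        depths = self._entries(path)
        if self.honour_maxdepth and maxdepth is not None:
            depths = {p: d for p, d in depths.items() if d <= maxdepth}
        return sorted(p for p in depths if withdirs or p in self.files)
```

To test the real bucket, one would call `check_maxdepth_consistency(s3fs.S3FileSystem(anon=True), d)`, which needs network access.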
I would appreciate it if you would form this into a (failing) test function with expectations in a PR, so that a fix is easier to make.
Note that with current master the last result is also 24, so something got fixed along the way.
Added a test to verify that the `find` method respects the `maxdepth` parameter. With ``maxdepth=1``, the results of ``find`` should be the same as those of ``ls``, without returning subdirectories. See also issue 378.
See #379 for a unit test exposing the problem. I need to focus on something else right now.
Test for #378 to verify find respects maxdepth.
What happened:
When I call `find(d, maxdepth=1)` on a directory with 24 subdirectories, it returns not only the 24 subdirectories but also all the files contained in all those directories. That means the output is:
What you expected to happen:
Due to the `maxdepth=1` parameter, I expected it to return only the 24 subdirectories. That means I expect the output to be:
Minimal Complete Verifiable Example:
Anything else we need to know?:
This makes globbing in higher-level directories so slow as to be unusable. For example, a glob of `"noaa-goes16/ABI-L1b-RadF/*"` will call `.find("noaa-goes16/ABI-L1b-RadF", maxdepth=1)`, which (ignoring the `maxdepth` parameter) will list the entire contents of the directory, millions of files, and take a very long time to complete.
Environment:
`pip install` from git master
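The glob example above works because a glob only needs to look as deep as its pattern has components. Below is a minimal sketch of that mapping. It assumes a glob implementation that delegates to `find(root, maxdepth=depth)` as the issue describes. `glob_root_and_depth` is a hypothetical helper, not fsspec's code. For `"noaa-goes16/ABI-L1b-RadF/*"` it yields `maxdepth=1`. A `find` that ignores that limit therefore turns a one-level listing into a walk of the whole tree below the root.

```python
import re
from typing import Optional, Tuple


def glob_root_and_depth(pattern: str) -> Tuple[str, Optional[int]]:
    """Split a glob pattern into (directory to search, maxdepth for find).

    Hypothetical helper: the root is the literal part of the pattern up to the
    last "/" before the first wildcard; the depth is the number of path
    components after it, or None (unlimited) when the pattern contains "**".
    """
    first_wildcard = re.search(r"[*?\[]", pattern)
    if first_wildcard is None:
        # No wildcard: only the path itself needs to be checked.
        return pattern.rstrip("/"), 0
    literal = pattern[: first_wildcard.start()]
    root = literal[: literal.rfind("/")] if "/" in literal else ""
    rest = pattern[len(root) + 1:] if root else pattern
    depth = None if "**" in rest else rest.count("/") + 1
    return root, depth


# The pattern from this issue: one wildcard component, so maxdepth=1.
print(glob_root_and_depth("noaa-goes16/ABI-L1b-RadF/*"))
# ('noaa-goes16/ABI-L1b-RadF', 1)
```

A pattern with two wildcard components, such as `".../2020/*/OR_*.nc"`, would need `maxdepth=2`. Only `**` should trigger an unlimited walk.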