FileNotFoundError when using zarr with dask/s3fs version >= 0.5.0 #649
Hi @AliceBalfanz, this sounds like it might be related to an issue I was seeing as well: fsspec/filesystem_spec#342
Hi @joshmoore, fsspec/filesystem_spec#342 is closed, but I cannot see how its resolution fixes the actual issue described here. Currently, I'm not sure whether the root cause originates in Zarr or whether we should implement or use another S3-capable
We really need "missing chunks" on S3, as we have to deal with large, sparse data cubes comprising a large number of NaN chunks. In fact, we use the tool xcube prune to erase NaN chunks from datasets, as this drastically reduces the number of files (and uploads to S3).
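To make the "missing chunks" workflow concrete, here is a minimal sketch of what pruning NaN chunks amounts to. It assumes a hypothetical dict-based store that maps chunk keys to flat lists of floats; `prune_nan_chunks` is an illustrative helper, not xcube's actual implementation.

```python
import math
from typing import Dict, List


def prune_nan_chunks(store: Dict[str, List[float]]) -> List[str]:
    """Hypothetical helper: delete every chunk whose values are all NaN.

    Returns the removed keys. Keys are collected first so the dict is not
    mutated while it is being iterated.
    """
    removed = [
        key
        for key, values in store.items()
        if values and all(math.isnan(v) for v in values)
    ]
    for key in removed:
        del store[key]
    return removed


nan = float("nan")
store = {"0.0": [1.0, 2.0], "0.1": [nan, nan], "1.0": [nan, 3.0]}
print(prune_nan_chunks(store))  # ['0.1']
print(sorted(store))            # ['0.0', '1.0']
```

After pruning, the key "0.1" no longer exists at all, which is exactly the situation a reader must handle without raising.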
Hi @forman and @AliceBalfanz. Sorry for the friction you have experienced here! The transition to the fsspec-based storage backends brings some major benefits (e.g. async, see #536 (comment)), but clearly there are also some transitional challenges to overcome. I agree that we must preserve the earlier behavior of correctly filling empty chunks rather than raising an error. I urge you not to give up on s3fs and instead hold out for a bug fix. We appreciate your support and patience. I'm wondering if @martindurant can weigh in on this.
It seems that this "indirect route" of the old usage may have a hole. The simplified version appears to work.
Obviously, both old and new ought to work. I'll look into it.
Could you clarify/define the following terms so that we are better able to participate in the discussion?
Thanks!
I mean that in my version of the call to zarr, I'm passing a URL, so this gets routed to the new FSStore rather than a bare S3Map. There is an optional argument to getitems that controls what to do with errored/missing keys; from zarr's point of view, the argument should be "omit", as FSStore does, but the default is "raise". Note that there is a further problem: something is turning the paths lower-case, but I think this must be in zarr. Bear with me.
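A minimal sketch of the two error modes, assuming a plain dict as the backing store. `getitems` here is a hypothetical stand-in modelled on the `on_error` argument described above, not fsspec's code; it raises FileNotFoundError only to mirror the error in this report, and the exception type raised by the real library may differ between versions.

```python
from typing import Dict, Iterable, Mapping


def getitems(
    store: Mapping[str, bytes], keys: Iterable[str], on_error: str = "raise"
) -> Dict[str, bytes]:
    """Hypothetical bulk fetch with a configurable policy for missing keys.

    on_error="raise": the first missing key aborts the whole fetch.
    on_error="omit":  missing keys are left out of the result, so the caller
                      (zarr) can fill those chunks with fill_value itself.
    """
    if on_error not in ("raise", "omit"):
        raise ValueError(f"unsupported on_error: {on_error!r}")
    out: Dict[str, bytes] = {}
    for key in keys:
        try:
            out[key] = store[key]
        except KeyError:
            if on_error == "raise":
                raise FileNotFoundError(key) from None
            # "omit": silently skip the missing chunk
    return out


store = {"cube/0.0": b"chunk-0", "cube/0.2": b"chunk-2"}  # chunk 0.1 was pruned
keys = ["cube/0.0", "cube/0.1", "cube/0.2"]
print(sorted(getitems(store, keys, on_error="omit")))  # ['cube/0.0', 'cube/0.2']
try:
    getitems(store, keys)  # default "raise"
except FileNotFoundError as err:
    print("missing:", err)  # missing: cube/0.1
```

With "omit", the absent chunk simply does not appear in the result, which leaves the decision about how to fill it to the array layer.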
Thanks, Martin, for your fast reply!
So, apparently lower-case paths are the canonical norm in zarr; this is probably documented somewhere. To open without consolidated metadata, you need to do
I need to fix the fact that this doesn't work with
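To illustrate why lower-casing breaks lookups on S3: object keys there are case-sensitive, so a store that lower-cases keys before requesting them will miss objects written with mixed-case names. The sketch assumes a hypothetical bucket dict and a `lookup` helper; as far as we can tell, the relevant switch in zarr 2.x is the `normalize_keys` option of `FSStore`, but check the version you are using.

```python
from typing import Dict

# Hypothetical objects in an S3 bucket; note the mixed-case prefix.
s3_objects: Dict[str, bytes] = {
    "MyCube.zarr/.zarray": b"{...}",
    "MyCube.zarr/0.0": b"chunk",
}


def lookup(key: str, normalize_keys: bool) -> bool:
    """Return True if `key` (optionally lower-cased first) exists in the bucket."""
    if normalize_keys:
        key = key.lower()  # what a lower-casing store would actually request
    return key in s3_objects


print(lookup("MyCube.zarr/.zarray", normalize_keys=False))  # True
print(lookup("MyCube.zarr/.zarray", normalize_keys=True))   # False
```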
Can we make
I would need to investigate where this comes from; it appears to be default True elsewhere.
@joshmoore, @rabernat, @martindurant, thanks so much for your immediate responses; it looks like this will be resolved soon. Don't hesitate to tell us how we can best support you!
FYI: pydata/xarray#5028 is (likely) a related issue due to
When using a zarr store in an S3 bucket that does not physically store chunks in an uninitialized state, as permitted by the zarr specification (https://zarr.readthedocs.io/en/stable/spec/v2.html#chunks), a FileNotFoundError occurs. This behavior is new with s3fs version >= 0.5.0.
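The spec's rule can be sketched as follows: an uninitialized chunk is simply absent from the store, and a reader must treat it as uniformly filled with the array's fill_value instead of failing. This sketch assumes a hypothetical 1-D array whose chunks are stored as flat lists under keys "0", "1", ...; `read_1d` is illustrative only and is not zarr's read path.

```python
from typing import Dict, List


def read_1d(
    stored: Dict[str, List[float]],
    chunk_keys: List[str],
    chunk_len: int,
    fill_value: float,
) -> List[float]:
    """Concatenate chunks in order, synthesizing any chunk that is not stored."""
    result: List[float] = []
    for key in chunk_keys:
        chunk = stored.get(key)
        if chunk is None:
            # Uninitialized chunk: absent by design, so fill it instead of raising.
            chunk = [fill_value] * chunk_len
        result.extend(chunk)
    return result


stored = {"0": [1.0, 2.0], "2": [5.0, 6.0]}  # chunk "1" was never written
print(read_1d(stored, ["0", "1", "2"], chunk_len=2, fill_value=0.0))
# [1.0, 2.0, 0.0, 0.0, 5.0, 6.0]
```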
Minimal, reproducible code sample, a copy-pastable example if possible
This results in
FileNotFoundError: The specified key does not exist.
Version and installation information
Please provide the following:
Failing with the following versions:
zarr.__version__: 2.5.0
s3fs.__version__: 0.5.1
fsspec.__version__: 0.8.0
Working as expected with the following versions:
zarr.__version__: 2.5.0
s3fs.__version__: 0.4.0
fsspec.__version__: 0.6.2