Writing to zarr fails with message "specified zarr chunks would overlap multiple dask chunks" #347
Hi Tonio, I am facing a similar issue with the same error message when executing the example Jupyter notebook: xarray does not adjust the encoding of variables when rechunking a dataset. This problem is already known. Maybe we could implement in xcube that the encoding is adjusted as well whenever a dataset is rechunked?
We already did: https://github.com/dcs4cop/xcube/blob/bc4cdb4aa5e88557d920d71b7dff4100015c3512/xcube/core/chunk.py#L10 (make sure you set the format when you use this, otherwise the encodings won't be updated). However, I still couldn't run your notebook cell successfully, but that seems to be due to another problem.
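For reference, a minimal usage sketch of that xcube helper; the parameter names (`chunk_sizes`, `format_name`), the chunk sizes, and the file names are assumptions for illustration, not taken from the thread.

```python
import xarray as xr
from xcube.core.chunk import chunk_dataset  # helper linked above

# Hypothetical input and chunk sizes; format_name="zarr" is assumed to be
# the switch that makes chunk_dataset update the per-variable "chunks"
# encodings so that a later to_zarr() call succeeds.
dataset = xr.open_dataset("input.nc")
rechunked = chunk_dataset(
    dataset,
    chunk_sizes={"time": 1, "lat": 512, "lon": 512},
    format_name="zarr",
)
rechunked.to_zarr("output.zarr")
```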
Thanks for your suggestion, I changed the notebook to use xcube's chunk_dataset instead of xarray's way of chunking a dataset. It now works like a dream :)
This can and should be fixed in xarray. But it's also very easy to just delete the encoding, e.g. `del new_dataset.c2rcc_flags.encoding['chunks']`, as a simple workaround.
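For completeness, a minimal sketch of that workaround applied to a whole dataset before writing; the file names and chunk sizes are hypothetical:

```python
import xarray as xr

# Hypothetical: a Zarr-backed dataset carries a "chunks" encoding from the
# source store; rechunking with dask does not update that encoding.
new_dataset = xr.open_zarr("input.zarr").chunk({"lat": 200})

for var in new_dataset.variables.values():
    # Drop the stale zarr chunk hint so to_zarr() derives chunks from
    # the current dask chunking instead of the old encoding.
    var.encoding.pop("chunks", None)

new_dataset.to_zarr("output.zarr")
```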
@rabernat thanks! The xcube function
We should make sure that any xr.Dataset returned from higher-level xcube functions always has a chunking compatible with Zarr, as this is our standard I/O format. Users should not be forced to rechunk just for the purpose of writing to Zarr; this is counter-intuitive. I suggest we provide a utility function that ensures "valid" chunking (including possible deletion of the "chunks" encoding property, as suggested by @rabernat). I guess there are good reasons why the encoding is not adjusted in xarray.
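As a rough illustration of what such a utility could look like, here is a sketch of a hypothetical helper; its name, signature, and the (conservative) compatibility rule are mine, not xcube's:

```python
import xarray as xr

def ensure_zarr_compatible_chunking(dataset: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: drop a stale "chunks" encoding wherever it no
    longer matches a variable's actual dask chunking, so that
    dataset.to_zarr() does not refuse to write."""
    for name, var in dataset.variables.items():
        enc_chunks = var.encoding.get("chunks")
        if enc_chunks is None or var.chunks is None:
            continue
        # var.chunks is a tuple of per-dimension chunk-size tuples.
        # Conservative rule: keep the encoding only if every dask chunk
        # equals the encoded zarr chunk size (the last one may be smaller).
        compatible = all(
            all(c == zc for c in dim_chunks[:-1]) and dim_chunks[-1] <= zc
            for zc, dim_chunks in zip(enc_chunks, var.chunks)
        )
        if not compatible:
            del var.encoding["chunks"]
    return dataset
```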
🙌 Could I convince you to submit this as a PR to xarray itself? 😁
I'm convinced. Hope to find a little time next week. Maybe there is a related issue already?
pydata/xarray#2300 is the main one.
Workaround for some cases that end up in #347
FYI I have started a PR to fix this upstream in Xarray. Your review there would be helpful. |
Sure @rabernat. Thanks!
Some data needs to be rechunked in order to work with it efficiently. This rechunking may result in the final chunk of a dimension being smaller than the previous ones. For example, a dimension of size 500 that was originally split into ten chunks of size 50 can be rechunked into chunks of 200, 200, and 100. This is perfectly valid and supported by xarray.
However, when the affected dimension/variable is latitude and the latitude is ascending, the current implementation of the normalization in xcube reverses the order of the chunks, so the smaller chunk ends up at the start. This is not supported by xarray and may result in an error like the one in the description, for example when writing to Zarr.
For the time being, I will work around this by ensuring that all chunks have the same size after rechunking, which will make this issue a bit harder to reproduce. I am opening this issue to document that the current way of dealing with ascending latitudes is not optimal. The best solution would probably be to remove the part of the normalization where latitudes are reverted and to support ascending latitudes in the data.
See also #251 and #327.
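A minimal sketch that reproduces the symptom under these conditions; the data, chunk sizes, and store name are hypothetical:

```python
import numpy as np
import xarray as xr

# 500 ascending latitudes, dask-chunked as (200, 200, 100).
ds = xr.Dataset(
    {"var": ("lat", np.random.rand(500))},
    coords={"lat": np.linspace(-50.0, 50.0, 500)},
).chunk({"lat": 200})

# Reverting the latitude order (as the normalization does) also reverses
# the dask chunk order to (100, 200, 200): the small chunk is now first.
ds_rev = ds.isel(lat=slice(None, None, -1))

# With a zarr chunk size of 200 (e.g. left over in .encoding, here forced
# explicitly), xarray refuses to write because the first zarr chunk would
# overlap two dask chunks.
ds_rev.to_zarr("example.zarr", encoding={"var": {"chunks": (200,)}})
```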