Zarr encoding attributes persist after slicing data, raising error on to_zarr
#5219
Comments
Thanks for the clear error report. On master you should be able to do [...]
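Presumably this refers to the `safe_chunks` keyword that #5065 added to `to_zarr`. A minimal sketch under that assumption (store paths and dimension names are hypothetical):

```python
import xarray as xr

ds = xr.open_zarr("original.zarr")       # hypothetical existing store
subset = ds.isel(time=slice(1, None))    # slicing keeps the stale encoding["chunks"]

# With the #5065 changes, the chunk-alignment check can be bypassed explicitly.
# Only safe if the same region is not being written from multiple dask workers.
subset.to_zarr("subset.zarr", mode="w", safe_chunks=False)
```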
Thanks for the pointer @mathause, that is super helpful. And thanks for #5065 @rabernat. If I'm understanding the PR correctly (looks like it evolved a lot!), in most cases matching the example above we probably would NOT want to use [...]. Does that sound right? I feel like, if I'm reading through the PR comments correctly, this was one of the controversial parts that didn't end up in the merged PR.
Correct. The problem in this issue is that the dataset is carrying around its original chunks in `encoding['chunks']`. The workaround is to delete that encoding:

```python
for var in ds:
    del ds[var].encoding['chunks']
```

Originally part of #5056 was a change that would have xarray automatically do this deletion after some operations (such as calling [...]).
Yup, this all makes sense, thanks for the explanation @rabernat. It does seem like it would be good to drop `encoding['chunks']` in cases like this. Anyways, we'll continue with the manual deletion for now, but I'm inclined to keep this issue open, as I do think it would be helpful to eventually figure out how to automatically do this.
Somewhat inevitably, I finally hit this issue this week, too :) I am increasingly coming to the conclusion that there are no "safe" manipulations in xarray that should preserve `encoding`.
It occurs to me that another possible fix would be to ignore `encoding['chunks']` [...]. Probably better to remove it.
This could also break existing workflows, though. For example, pangeo-forge is using the `encoding.chunks` attribute to specify target dataset chunks.
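For reference, that is the intended use of the attribute: a `chunks` entry in a variable's encoding tells the zarr backend what on-disk chunk layout to write, independent of the in-memory (dask) chunks. A small sketch (names, sizes, and store path are illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": (("time", "x"), np.zeros((365, 1000)))})

# Request a specific on-disk zarr chunk layout for this variable.
ds["temp"].encoding["chunks"] = (30, 250)
ds.to_zarr("target.zarr", mode="w")
```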
Is this still an issue after merging PR #5065?
@vedal Hi, I'm still facing this issue using xarray 2022.11.0 and Python 3.10. Is this PR included in the latest xarray version?
What happened:
Opened a dataset using `open_zarr`, sliced the dataset, and then tried to resave to a zarr store using `to_zarr`.

What you expected to happen:
The file would save without needing to explicitly modify any `encoding` dictionary values.

Minimal Complete Verifiable Example:
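The original snippet is not reproduced here, so the following is a reconstruction of the reported pattern rather than the reporter's exact code (store paths, variable name, and chunk sizes are illustrative):

```python
import numpy as np
import xarray as xr

# Write a dask-chunked dataset; the chunk layout is recorded in each variable's encoding.
ds = xr.Dataset({"var": (("x", "y"), np.random.rand(10, 10))}).chunk({"x": 5, "y": 5})
ds.to_zarr("test.zarr", mode="w")

# Re-open, slice so the dask chunks no longer align with the stored encoding["chunks"],
# and try to write the subset to a new store.
ds2 = xr.open_zarr("test.zarr")
subset = ds2.isel(x=slice(1, None))
subset.to_zarr("subset.zarr", mode="w")   # fails: encoding["chunks"] == (5, 5) persists
```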
This raises:
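In xarray versions around this report, the failure is a `NotImplementedError` raised by the zarr backend, along these lines (paraphrased, values illustrative):

```
NotImplementedError: Specified zarr chunks encoding['chunks']=(5, 5) for variable named 'var'
would overlap multiple dask chunks ((4, 5), (5, 5)). This is not implemented in xarray yet.
Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
```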
Anything else we need to know?:
Not sure if there is a good way around this (or perhaps this is even desired behavior?), but figured I would flag it as it seemed unexpected and took us a second to diagnose. Once you've loaded the data from a zarr store, I feel like the default behavior should probably be to forget the encodings used to save that zarr, treating the in-memory dataset object just like any other in-memory dataset object that could have been loaded from any source. But maybe I'm in the minority or missing some nuance about why you'd want the encoding to hang around.
Environment: