Flexible backends - Harmonise zarr chunking with other backends chunking #4496
Comments
That's not completely true. With no dask installed [...]
With regards to [...]
I think we can keep discussing the xarray chunking interface here. The interface for chunking seems to be a tricky problem in xarray; several different interfaces are already implemented:
They are similar, but there are some inconsistencies. The allowed values in the dictionary are:
- dask: [...]
- xarray: [...]
- xarray: [...]
- xarray: [...]
- xr.open_dataset(engine="zarr"): [...]
Points to be discussed:
@shoyer, @alexamici, @jhamman, @dcherian, @weiji14 suggestions are welcome.
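To make the comparison between the interfaces concrete, here is a small pure-Python sketch of how a unified `chunks` argument could be normalized against a dataset's dimension sizes. This is purely illustrative (the function name and the exact semantics are hypothetical, not xarray's actual implementation), assuming the semantics discussed in this thread: `None` means no chunking, `{}`/`"auto"` default every dimension, an int applies everywhere, and a dict fills in unspecified dimensions.

```python
def normalize_chunks(chunks, dim_sizes):
    """Normalize a user-supplied ``chunks`` argument (hypothetical sketch).

    chunks may be:
      - None          -> no dask chunking at all (eager load)
      - {} or "auto"  -> chunk, defaulting each dimension to its full size
      - int           -> that chunk size along every dimension
      - dict          -> given sizes, defaulting unspecified dimensions
    """
    if chunks is None:
        return None  # no chunking: load eagerly as plain numpy
    if chunks == "auto" or chunks == {}:
        # default: one chunk spanning each full dimension
        return dict(dim_sizes)
    if isinstance(chunks, int):
        # same chunk size on every dimension, capped at the dimension size
        return {dim: min(chunks, size) for dim, size in dim_sizes.items()}
    # dict: fill unspecified dimensions with the full dimension size
    return {dim: min(chunks.get(dim, size), size)
            for dim, size in dim_sizes.items()}
```

A sketch like this makes the redundancy noted below explicit: `{}` and `"auto"` take the same branch.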
Just a general comment on the [...]. For those who are confused, this is the current state of [...]
Sample code to test (run in a Jupyter notebook to see the dask chunk visual):

```python
import xarray as xr
import fsspec

# Opening NetCDF
dataset: xr.Dataset = xr.open_dataset(
    "http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/Best", chunks={}
)
dataset.Temperature_height_above_ground.data

# Opening Zarr
zstore = fsspec.get_mapper(
    url="gs://cmip6/CMIP/NCAR/CESM2/historical/r9i1p1f1/Amon/tas/gn/"
)
dataset: xr.Dataset = xr.open_dataset(
    filename_or_obj=zstore,
    engine="zarr",
    chunks={},
    backend_kwargs=dict(consolidated=True),
)
dataset.tas.data
```
@weiji14 Thank you very much for your feedback. I think we should also align [...]. Maybe for the future we should evaluate integrating/using the dask function [...]
Hi. I'm trying to find the issue closest to the problem I have, and this seems to be the best and most related one. Say I have a zarr dataset with multiple variables [...]. Note that [...]. Let me know if you would prefer me to open a completely new issue for this.
@ravwojdyla I think that currently there is no way to do this. But it would be nice to have an interface that allows defining different chunks for each variable.
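One possible shape for such a per-variable interface is a mapping from variable name to a per-dimension chunks dict, which a backend could resolve variable by variable. The sketch below is purely hypothetical (the function name and the spec format are not part of xarray's API), just to illustrate the idea:

```python
def resolve_per_variable_chunks(var_dims, chunks_per_var):
    """Resolve a per-variable chunks spec (hypothetical sketch).

    var_dims:       maps each variable name to its {dim: size} mapping.
    chunks_per_var: maps variable names to {dim: chunk} dicts, e.g.
                    {"temp": {"time": 10}, "salt": {"time": 50}}.
    Unspecified variables/dimensions default to the full dimension size.
    """
    resolved = {}
    for name, dims in var_dims.items():
        spec = chunks_per_var.get(name, {})
        # cap requested chunks at the dimension size, default to full size
        resolved[name] = {dim: min(spec.get(dim, size), size)
                          for dim, size in dims.items()}
    return resolved
```

This keeps today's dimension-keyed behavior as the degenerate case (an empty spec for every variable) while allowing variables sharing a dimension to be chunked differently.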
* modify get_chunks to align zarr chunking as described in issue #4496
* fix: maintain old open_zarr chunking interface
* add and fix tests
* black
* bugfix
* add some documentation on open_dataset chunking
* in test: re-add xfails for negative steps without dask
* specify in reason that only zarr is expected to fail
* unify backend test negative_step with dask and without dask
* add comment on has_dask usage

Co-authored-by: Alessandro Amici <a.amici@bopen.eu>
Is your feature request related to a problem? Please describe.
In #4309 we proposed to separate xarray and backend tasks, more or less in this way:
With the changes in open_dataset to also support zarr (#4187), we introduced slightly different behavior for zarr chunking with respect to the other backends.
Behavior of all the backends except zarr:
Zarr's chunking behavior is very similar, but it has a different default when the user doesn't choose the chunk size along some dimensions, i.e. [...]
Describe the solution you'd like
We could easily extend the zarr behavior to all the backends (which, for now, don't use the field variable.encoding['chunks']):
if no chunks are defined in the encoding, we use the dimension size as the default; otherwise, we use the encoded chunks. So for now we would not change any external behavior, but the other backends can use this interface if needed.
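The proposed default can be sketched as a small pure-Python helper (the function name is hypothetical; this is not the actual get_chunks implementation, just the rule described above):

```python
def get_default_chunks(dim_sizes, encoded_chunks=None):
    """Default chunking rule proposed above (hypothetical sketch).

    If the variable's encoding carries no chunk information, fall back
    to one chunk spanning each full dimension; otherwise use the
    encoded chunks, defaulting any dimension they don't cover.
    """
    if not encoded_chunks:
        # no encoded chunks: one chunk per full dimension
        return dict(dim_sizes)
    # encoded chunks win where present; other dims get their full size
    return {dim: encoded_chunks.get(dim, size)
            for dim, size in dim_sizes.items()}
```

Under this rule, backends without on-disk chunking (most of them today) see no behavior change, since their encoding carries no chunks and the fallback reproduces the current default.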
I have some additional notes:
auto is redundant because it has the same behavior as {}; we could remove one of them.

One last question: overwrite_encoded_chunks. Is it really needed? Why do we support overwriting the encoded chunks at read time? This operation can easily be done afterwards, or at write time.