Say I have a zarr dataset with multiple variables `Foo`, `Bar` and `Baz` (and potentially many more) and 2 dimensions, `x` and `y` (potentially more). `Foo` and `Bar` are large 2D arrays with dims `x, y`; `Baz` is a relatively small 1D array with dim `y`. I would like to read that dataset with xarray and increase the chunk size from the native zarr chunk size for `x` and `y`, but only for `Foo` and `Bar`; I would like to keep the native chunking for `Baz`. AFAIU, I would currently do that with the `chunks` parameter to `open_dataset`/`open_zarr`, but if I pass, say, `dict(x=N, y=M)`, that changes the chunking for all variables that use those dimensions, which isn't what I need: I need it changed only for `Foo` and `Bar`. Is there a way to do that? Should that be part of the "harmonisation"? One could imagine xarray accepting a dict of dicts akin to `{var: {dim: chunk_spec}}` to specify chunking for specific variables.
Note that rechunking after reading is not what I want; I would like to specify the chunking at read time.
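To make the problem concrete, here is a minimal pure-Python model of the behaviour described above. It does not call xarray; the `variable_dims` layout, the helper `apply_dim_chunks`, and the placeholder sizes for `N` and `M` are all hypothetical and exist only for illustration.

```python
# Hypothetical layout mirroring the example: Foo and Bar are 2D, Baz is 1D.
variable_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}


def apply_dim_chunks(variable_dims, chunks):
    """Model a {dim: chunk_spec} mapping: each entry applies to every
    variable that has that dimension (illustrative helper, not xarray API)."""
    return {
        name: {dim: chunks[dim] for dim in dims if dim in chunks}
        for name, dims in variable_dims.items()
    }


N, M = 1000, 2000  # placeholder chunk sizes
resolved = apply_dim_chunks(variable_dims, dict(x=N, y=M))

# Baz picks up the y chunk size as well, which is the unwanted part.
print(resolved["Baz"])
```

In this model there is no way to express "chunk `y` for `Foo` and `Bar` but leave `Baz` alone", because the mapping is keyed by dimension only.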
This seems like a totally reasonable feature to add. The main tricky part would be figuring out the syntax, since we already use dictionaries like `{dim: chunk_spec}`. It's not obvious to me whether a nested dict would mean `{var: {dim: chunk_spec}}` or `{dim: {var: chunk_spec}}`. Perhaps we should try to come up with another, more explicit option?
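As a rough sketch of the variable-first reading, `{var: {dim: chunk_spec}}`, the following pure-Python helper resolves such a mapping into per-variable chunks. Nothing here is xarray API: `resolve_var_first`, the `variable_dims` layout, and the convention that an empty dict means "keep native chunking" are assumptions of the sketch.

```python
# Hypothetical layout: Foo and Bar are 2D (x, y), Baz is 1D (y).
variable_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}


def resolve_var_first(variable_dims, chunks):
    """Resolve a {var: {dim: chunk_spec}} mapping into per-variable chunks.

    Variables that are not listed get an empty dict, which this sketch
    treats as "keep native chunking" (an assumption, not xarray behaviour).
    """
    unknown_vars = set(chunks) - set(variable_dims)
    if unknown_vars:
        raise KeyError(f"unknown variables: {sorted(unknown_vars)}")

    resolved = {}
    for name, dims in variable_dims.items():
        spec = chunks.get(name, {})
        unknown_dims = set(spec) - set(dims)
        if unknown_dims:
            raise ValueError(f"{name!r} has no dims {sorted(unknown_dims)}")
        resolved[name] = dict(spec)
    return resolved


per_var = resolve_var_first(
    variable_dims,
    {"Foo": {"x": 1000, "y": 2000}, "Bar": {"x": 1000, "y": 2000}},
)
print(per_var)
```

The ambiguity is easy to hit in practice: a dimension coordinate in xarray usually shares its dimension's name, so a top-level key such as `x` would not by itself reveal which of the two nested layouts was intended.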
I thought through a couple of options, including simple value classes, but in the end they did not fit the current API. If we stick with the current style, it makes a bit more sense to go in the direction of `{dim: {var: chunk_spec}}`, since `{dim: x}` already exists. A user who wants variable-specific chunking would adjust that to `{dim: {var: y, ...: x}}`, with `...`/`Ellipsis` standing for "all other variables" with that dim. WDYT @shoyer?
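A minimal sketch of how the dimension-first proposal could be resolved, assuming a plain value keeps today's meaning and a nested dict maps variable names (or `...`) to chunk specs. The helper `resolve_dim_first` and the `variable_dims` layout are hypothetical, and leaving a variable without an entry is taken here to mean "keep native chunking", which is an assumption of the sketch rather than a settled part of the proposal.

```python
# Hypothetical layout: Foo and Bar are 2D (x, y), Baz is 1D (y).
variable_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}


def resolve_dim_first(variable_dims, chunks):
    """Resolve {dim: spec} or {dim: {var: spec, ...: spec}} per variable.

    A plain spec applies to every variable with that dim. A nested dict
    applies the variable's own entry if present, else the Ellipsis entry,
    else nothing (treated as native chunking in this sketch).
    """
    resolved = {name: {} for name in variable_dims}
    for dim, spec in chunks.items():
        for name, dims in variable_dims.items():
            if dim not in dims:
                continue
            if not isinstance(spec, dict):
                resolved[name][dim] = spec
            elif name in spec:
                resolved[name][dim] = spec[name]
            elif ... in spec:
                resolved[name][dim] = spec[...]
    return resolved


# Larger chunks for Foo and Bar only; Baz has no entry for y.
explicit = resolve_dim_first(
    variable_dims, {"x": 1000, "y": {"Foo": 2000, "Bar": 2000}}
)

# Same idea with Ellipsis as "all other variables" along y.
with_ellipsis = resolve_dim_first(variable_dims, {"y": {"Baz": 10, ...: 2000}})
print(explicit, with_ellipsis)
```

One design point this surfaces: with the dimension-first layout, a variable such as `Foo` has its chunking spread across one nested dict per dimension, whereas the variable-first layout keeps it in one place.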
Originally posted by @ravwojdyla in #4496 (comment)