Allow chunk spec per variable #4623

ravwojdyla · 2020-11-30T10:56:39Z

Say I have a zarr dataset with multiple variables Foo, Bar and Baz (and potentially many more) and two dimensions, x and y (potentially more). Foo and Bar are large 2D arrays with dims (x, y); Baz is a relatively small 1D array with dim y. I would like to read that dataset with xarray and increase the chunk size from the native zarr chunk size for x and y, but only for Foo and Bar; I would like to keep the native chunking for Baz. As far as I understand, I would currently do that with the chunks parameter of open_dataset/open_zarr, but if I pass, say, dict(x=N, y=M), that changes the chunking for all variables that use those dimensions, which isn't what I need: I need it changed only for Foo and Bar.

Is there a way to do that? Should that be part of the "harmonisation"? One could imagine xarray accepting a dict of dicts akin to {var: {dim: chunk_spec}} to specify chunking for specific variables.

Note that rechunking after reading is not what I want; I would like to specify the chunking at read time.
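
To make the {var: {dim: chunk_spec}} idea concrete, here is a minimal pure-Python sketch of how such a mapping could be interpreted. resolve_var_first is a hypothetical helper, not an xarray API, and the variable names and chunk sizes are illustrative assumptions. An empty result for a variable stands for "keep the native chunking".

```python
# Hypothetical helper, NOT part of xarray: interprets a
# {var: {dim: chunk_spec}} mapping as per-variable chunk requests.
def resolve_var_first(spec, var_dims):
    """Return {var: {dim: chunk}}; a missing dim means 'keep native chunks'."""
    resolved = {}
    for var, dims in var_dims.items():
        per_var = spec.get(var, {})
        # Keep only the dimensions this variable actually has.
        resolved[var] = {dim: per_var[dim] for dim in dims if dim in per_var}
    return resolved


# Illustrative layout from the issue: Foo and Bar are 2D, Baz is 1D.
var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}

# Assumed chunk sizes, chosen only for the example.
spec = {"Foo": {"x": 1000, "y": 500}, "Bar": {"x": 1000, "y": 500}}

chunks = resolve_var_first(spec, var_dims)
print(chunks)
```

Here Baz is absent from the spec, so it resolves to an empty dict and would keep its native chunks.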

Originally posted by @ravwojdyla in #4496 (comment)

shoyer · 2020-12-17T16:28:10Z

This seems like a totally reasonable feature to add. The main tricky part would be figuring out the syntax, since we already use dictionaries like {dim: chunk_spec}. It's not obvious to me whether a nested dict would mean {var: {dim: chunk_spec}} or {dim: {var: chunk_spec}}. Perhaps we should try to come up with another, more explicit option?

ravwojdyla · 2020-12-18T12:12:04Z

I thought through a couple of options, including simple value classes, but in the end they did not fit the current API. If we try to stick with the current style, it makes a bit more sense to go in the direction of {dim: {var: chunk_spec}}, since there is already {dim: x}. A user who wants variable-specific chunking would then adjust it to {dim: {var: y, ...: x}}, with .../Ellipsis standing for "all other variables" with dim. wdyt @shoyer?
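
A rough sketch of how the {dim: {var: y, ...: x}} form could be resolved, written in plain Python so it does not depend on any xarray behaviour. resolve_dim_first is a hypothetical name, and the chunk sizes are assumptions made up for the example. A plain value under a dimension applies to every variable with that dimension (today's {dim: x} meaning), while a nested dict is looked up by variable name first and falls back to the Ellipsis entry.

```python
# Hypothetical helper, NOT part of xarray: interprets the proposed
# {dim: chunk_spec} / {dim: {var: chunk_spec, ...: default}} syntax.
def resolve_dim_first(spec, var_dims):
    """Return {var: {dim: chunk}}; a missing dim means 'keep native chunks'."""
    resolved = {}
    for var, dims in var_dims.items():
        per_var = {}
        for dim in dims:
            if dim not in spec:
                continue
            entry = spec[dim]
            if isinstance(entry, dict):
                if var in entry:
                    # Variable-specific chunk for this dimension.
                    per_var[dim] = entry[var]
                elif Ellipsis in entry:
                    # "All other variables" fallback.
                    per_var[dim] = entry[Ellipsis]
            else:
                # Existing behaviour: one chunk spec for the whole dimension.
                per_var[dim] = entry
        resolved[var] = per_var
    return resolved


var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}

# Assumed sizes: x is chunked uniformly; y is 100 for Baz and 500 for the rest.
spec = {"x": 1000, "y": {"Baz": 100, ...: 500}}

chunks = resolve_dim_first(spec, var_dims)
print(chunks)
```

If the nested dict has no Ellipsis key, variables not named in it get no entry for that dimension, which this sketch treats as "leave native chunking alone".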

keewis · 2020-12-19T17:17:22Z

We could also allow special cases: {dim: x, (dim, var): y}, where dim: x has the same effect as (dim,): x.
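
The tuple-key variant can be sketched the same way. resolve_tuple_keys is a hypothetical helper, not an existing xarray function, and the sizes are illustrative. Bare dimension keys are first normalised to 1-tuples, matching the remark that dim: x has the same effect as (dim,): x; a (dim, var) key then overrides the dimension-wide value for that one variable.

```python
# Hypothetical helper, NOT part of xarray: interprets the proposed flat
# {dim: chunk_spec, (dim, var): chunk_spec} syntax.
def resolve_tuple_keys(spec, var_dims):
    """Return {var: {dim: chunk}}; a missing dim means 'keep native chunks'."""
    # Normalise bare dimension names to 1-tuples: "x" -> ("x",).
    normalized = {}
    for key, value in spec.items():
        normalized[key if isinstance(key, tuple) else (key,)] = value

    resolved = {}
    for var, dims in var_dims.items():
        per_var = {}
        for dim in dims:
            if (dim, var) in normalized:
                # The most specific key wins.
                per_var[dim] = normalized[(dim, var)]
            elif (dim,) in normalized:
                # Dimension-wide default.
                per_var[dim] = normalized[(dim,)]
        resolved[var] = per_var
    return resolved


var_dims = {"Foo": ("x", "y"), "Bar": ("x", "y"), "Baz": ("y",)}

# Assumed sizes: y defaults to 500 but is 100 for Baz.
spec = {"x": 1000, "y": 500, ("y", "Baz"): 100}

chunks = resolve_tuple_keys(spec, var_dims)
print(chunks)
```

One caveat of this sketch: a dimension whose name is itself a tuple would be indistinguishable from a (dim, var) key, so a real implementation would need to decide how to handle that case.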

shoyer added the topic-backends label Dec 17, 2020

aurghs mentioned this issue Dec 27, 2020

Error when rechunking from Zarr store #4380

Closed
