Per-node DataTree chunking #9634
Comments
Really cool to see you using xarray for radio astronomy data! I didn't know we had users in that field.
Good idea! We would be happy to take a PR if you want to generalize this.
I think we should avoid the temptation to make this overly clever, at least initially, because the |
Yes, this makes a lot of sense to me. Quite often dimension sizes will differ per node, so it does not make sense to use a single shared set of chunks.
Yes, in principle I'd like to submit a PR. Apologies for not replying, I need to devote more time to thinking about the change: In particular, the Lines 859 to 864 in 863184d
Lines 896 to 901 in 863184d
which seems to imply that it's the backend's responsibility to interpret the Perhaps the full chunking schema/strategy could be passed to the Neither of the above seems appealing -- I'll try to find some more time to think about this.
I'm not sure I didn't miss anything, but I don't think The missing |
Is your feature request related to a problem?
In the radio astronomy domain-specific xarray-ms, we construct a DataTree representing partitions of a legacy data format where each partition contains regular data cubes. As currently implemented, the custom backend supports a partition_chunks kwarg in the BackendEntrypoint.open_datatree method so that it is possible to specify different chunking schemas per partition:
The chunking specification above is specific to a radio astronomy legacy format, but it may be more generally useful to be able to specify per-DataTree node chunking.
Describe the solution you'd like
Currently, BackendEntrypoint.open_datatree passes its chunks kwarg to each Dataset constructor in the DataTree. This is quite coarse-grained as it applies the same chunking schema to all Datasets in the DataTree.
I propose that the chunks kwarg in BackendEntrypoint.open_datatree support a chunking dictionary per path (i.e. DataTree node). For example:
Then, when constructing Datasets in the DataTree, the chunking schema appropriate to the node can be applied.
An entry in the above dictionary does not necessarily need to apply only to a single node. It could also apply the chunking schema to each subtree below the node. But it may be better to make this more explicit.
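To make the per-path idea concrete, here is a minimal sketch of how a chunk spec might be looked up for a given node. It is not xarray's implementation or API: the resolve_chunks helper, its inherit flag, the node paths and the dimension names are all hypothetical and only illustrate the two behaviours discussed above (exact-node matching versus applying an entry to the subtree below it).

```python
from pathlib import PurePosixPath
from typing import Dict, Mapping, Optional

# Hypothetical type: maps a dimension name to a chunk size.
ChunkSpec = Mapping[str, int]


def resolve_chunks(
    chunks: Mapping[str, ChunkSpec],
    path: str,
    inherit: bool = False,
) -> Optional[Dict[str, int]]:
    """Return the chunk spec for the node at `path`, or None if none applies.

    With inherit=False only an exact path match is used. With inherit=True
    the nearest ancestor that has an entry supplies the spec, so an entry
    effectively applies to the whole subtree below its node.
    """
    # Normalise keys so that "/a/b/" and "/a/b" refer to the same node.
    normalised = {str(PurePosixPath(key)): spec for key, spec in chunks.items()}
    node = PurePosixPath(path)
    # Search the node itself first, then (optionally) its ancestors up to "/".
    candidates = [node, *node.parents] if inherit else [node]
    for candidate in candidates:
        spec = normalised.get(str(candidate))
        if spec is not None:
            return dict(spec)
    return None


# Hypothetical per-node chunking dictionary (paths and dimensions are made up).
chunks = {
    "/": {"time": 100},
    "/partition_000": {"time": 20, "frequency": 16},
}

exact = resolve_chunks(chunks, "/partition_000")
missing = resolve_chunks(chunks, "/partition_001")
inherited = resolve_chunks(chunks, "/partition_001", inherit=True)
subtree = resolve_chunks(chunks, "/partition_000/antenna", inherit=True)
```

Making inheritance an explicit opt-in keeps the default behaviour simple, in line with the concern about it being better to make subtree application explicit. The resolved dictionary could then be applied to that node's Dataset, for instance via Dataset.chunk.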
Describe alternatives you've considered
We've implemented a custom partition_chunks kwarg in the BackendEntrypoint.open_datatree method for our legacy data format.
Additional context
No response