Support specifying chunk sizes using labels (e.g. frequency string) #7559
Comments
On the API question: is there anywhere else in xarray where we have made some choice about how to let the user choose between specifying via indexes or labels? Apart from just
I explored this idea in this tutorial. I think it may be a fundamental concept for labelled array analysis: you need to pick whether you're working in "index space", like unlabelled arrays, or in "label space". This also came up in this issue. Another example: alignment is in "label space", while broadcasting seems like "index space" (you just change shapes, but it does use dimension names to do that, so maybe 50/50).
Now I think the way to generalize is to eventually support I think overloading the existing I put up #9109 which allows specifying frequency strings. |
Responding to @shoyer's comment:
The table here doesn't seem to overlap with I see at least two ways to proceed with more explicit API:
I like this option.
* Support rechunking to a frequency. Closes #7559
* Updates
* Fix typing
* More typing fixes.
* Switch to TimeResampler objects
* small fix
* Add whats-new
* More test
* fix docs
* fix
* Update doc/user-guide/dask.rst
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
Is your feature request related to a problem?
dask.dataframe supports repartitioning or rechunking using a frequency string (freq kwarg). I think this would be a useful addition to .chunk. It would help with some groupby problems (as suggested in this comment) and generally make a few problems amenable to blockwise/map_blocks solutions.
Describe the solution you'd like
.chunk(lon=5, time="MS"). There is some ugliness in that this syntax mixes up integer index values (lon=5) and a label-based frequency string (time="MS").
chunk_by_labels would be useful, where chunk_by_labels(lon=5, time="MS") would rechunk the data so that a single chunk contains 5° of longitude points and a month of time. Alternatively, this could be .chunk(lon=5, time="MS", by="labels").
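To make the index-versus-label distinction concrete, here is a rough sketch in plain Python (no xarray or dask needed) of how lon=5 could be read either way. The helper names index_chunk_sizes and label_chunk_sizes are hypothetical and only for illustration; they are not part of xarray, and the binning rule (fixed-width bins anchored at the first coordinate value) is an assumption, not something the proposal specifies.

```python
import math
from itertools import groupby


def index_chunk_sizes(n, size):
    """Index space: every chunk holds `size` points (the last may be shorter)."""
    full, rest = divmod(n, size)
    return (size,) * full + ((rest,) if rest else ())


def label_chunk_sizes(coords, width):
    """Label space: every chunk spans `width` coordinate units.

    Assumes `coords` is sorted ascending; bins are anchored at coords[0].
    """
    origin = coords[0]
    return tuple(
        sum(1 for _ in group)
        for _, group in groupby(
            coords, key=lambda x: math.floor((x - origin) / width)
        )
    )


# Irregularly spaced longitudes: 1 degree apart at first, then 2.5 degrees.
lon = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0, 12.5]

by_index = index_chunk_sizes(len(lon), 5)  # 5 points per chunk
by_label = label_chunk_sizes(lon, 5.0)     # 5 degrees per chunk

print(by_index)
print(by_label)
```

On a regular grid the two readings differ only by a constant factor, but with irregular spacing like this they give different chunk boundaries, which is why mixing them in one call signature feels ambiguous.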
Describe alternatives you've considered
Have the user do this manually but that's kind of annoying, and a bit advanced.
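For reference, the manual route might look roughly like the sketch below: count how many consecutive timestamps fall in each calendar month and use those counts as explicit chunk sizes. This is a standard-library illustration only; month_chunk_sizes is a made-up helper name, and it assumes the time coordinate is sorted.

```python
from datetime import date, timedelta
from itertools import groupby


def month_chunk_sizes(times):
    """Return one chunk length per calendar month, assuming `times` is sorted."""
    # Consecutive timestamps sharing the same (year, month) form one chunk.
    return tuple(
        sum(1 for _ in group)
        for _, group in groupby(times, key=lambda t: (t.year, t.month))
    )


# Daily timestamps covering January through March 2001 (not a leap year).
start = date(2001, 1, 1)
times = [start + timedelta(days=i) for i in range(90)]

chunks = month_chunk_sizes(times)
print(chunks)
```

The resulting tuple of sizes is the kind of value that can be passed as explicit per-dimension chunks, e.g. .chunk(time=chunks), which is the step a frequency string would save the user from doing by hand.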
Additional context
No response