Add a Cumulative aggregation, similar to Rolling #5215

Closed
max-sixty opened this issue Apr 24, 2021 · 6 comments · Fixed by #8512

Comments

@max-sixty
Collaborator

Is your feature request related to a problem? Please describe.

Pandas has a .expanding aggregation, which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this is some nice sugar for that.
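
For illustration, a minimal sketch of the equivalence described above, using a small made-up Series (the data is an assumption, not from the issue). In pandas, `expanding()` should give the same values as `rolling()` with the window set to the full length and `min_periods=1`:

```python
import pandas as pd

# Illustrative data (not from the issue).
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Expanding window: each output uses every value up to and including that position.
expanding_sum = s.expanding().sum()

# The workaround: a rolling window as long as the whole series.
# min_periods=1 lets the first, still-partial windows produce a value.
rolling_sum = s.rolling(window=len(s), min_periods=1).sum()

print(expanding_sum.tolist())
print(rolling_sum.tolist())
```

Both calls produce the running total of the series; the expanding form just avoids having to pass the length.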

Describe the solution you'd like
Basically the same as pandas — a .expanding method that returns an Expanding class, which implements the same methods as a Rolling class.

Describe alternatives you've considered
Some options:
– This
– Don't add anything, the sugar isn't worth the additional API.
– Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.

@mathause
Collaborator

I don't entirely get what the function is supposed to do (even after looking at the pandas docstring) - can you give an example or two?

@dcherian
Contributor

IIUC da.expanding(dim=dim).sum() is da.cumsum(dim) with support for min_periods and center like rolling.

I guess all expanding reductions are basically numpy.ufunc.accumulate(...). The dask versions will be "interesting" to write.
https://github.com/dask/dask/blob/f1f37cae96d5e98f8043ea430539a4fffbe62661/dask/array/reductions.py#L1389-L1413
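
As a rough sketch of the `numpy.ufunc.accumulate` idea on a plain NumPy array (the array is made up for illustration, and this does not cover `min_periods`, `center`, or dask):

```python
import numpy as np

# Illustrative data (not from the issue).
a = np.array([3, 1, 4, 1, 5])

# Each binary ufunc exposes .accumulate, which applies the operation
# cumulatively along an axis.
running_sum = np.add.accumulate(a)      # same values as np.cumsum(a)
running_max = np.maximum.accumulate(a)  # running maximum
running_min = np.minimum.accumulate(a)  # running minimum

print(running_sum, running_max, running_min)
```

The sum case matches `cumsum`, while the max and min cases show reductions that have no dedicated `cum*` function in NumPy.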

Like @mathause I find "expanding" confusing. .accumulate().sum() or .cumulative().sum() sounds much better to me.

@max-sixty
Collaborator Author

.cumulative is great! Much better.

The benefit is that the API surface is reduced — e.g. we can have a .cumulative().integrate(), rather than a separate .cumulative_integrate (and so on, for each aggregation), from #5153.

The implementation could be as simple as da.rolling(dim=da.sizes[dim]). How compatible would dask be with that? How does it compare to the numpy.ufunc.accumulate(...) suggestion?
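
A minimal sketch of that rolling-based idea, assuming a small in-memory DataArray with a dimension named `x` (both are illustrative assumptions). It spells the window out as a mapping and adds `min_periods=1`, which the shorthand above leaves implicit, and it says nothing about how this behaves with dask:

```python
import numpy as np
import xarray as xr

# Illustrative data (not from the issue).
da = xr.DataArray(np.array([1.0, 2.0, 3.0, 4.0]), dims="x")
dim = "x"

# A rolling window spanning the whole dimension; min_periods=1 allows
# the leading, partially filled windows to produce values instead of NaN.
full_window = da.rolling({dim: da.sizes[dim]}, min_periods=1).sum()

# The existing cumulative sum, for comparison.
via_cumsum = da.cumsum(dim)

np.testing.assert_allclose(full_window.values, via_cumsum.values)
print(full_window.values)
```

For this small example the two results agree, which is the sense in which a cumulative aggregation is "rolling with a full lookback".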

max-sixty changed the title from "Add an Expanding aggregation, similar to Rolling" to "Add a Cumulative aggregation, similar to Rolling" on Sep 20, 2023
@max-sixty
Collaborator Author

I'd be up for doing .cumulative if I continue my recent contribution burst...

@mathause
Collaborator

Go for it - we can hardly keep up with reviewing ;-)

max-sixty added a commit to max-sixty/xarray that referenced this issue Dec 2, 2023
@shoyer
Member

shoyer commented Dec 8, 2023

I am pretty stoked about the idea of supporting cumulative integrals with the same syntax as other cumulative operations :)

max-sixty added a commit to max-sixty/xarray that referenced this issue Dec 8, 2023
max-sixty added a commit that referenced this issue Dec 8, 2023
* Add Cumulative aggregation

Closes #5215

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* whatsnew

* Update xarray/core/dataarray.py

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

* Update xarray/core/dataset.py

* min_periods defaults to 1

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>