fast weighted sum #1224
Comments
Interesting -- thanks for sharing! I am interested in performance improvements, but also a little reluctant to add specialized optimizations directly into xarray. You write that this is equivalent to sum(a * w for a, w in zip(arrays, weights)). Using vectorized operations feels a bit more idiomatic (though also maybe more verbose). It also may be more performant. Note that the builtin sum simply adds the arrays one pair at a time; dask.array.sum() may be worth trying instead.
In contrast, there has also been discussion in #422 about adding a dedicated method for weighted mean.
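A minimal sketch of the two approaches being contrasted here, assuming a small list of equally shaped DataArrays with one scalar weight each (names and sizes are illustrative, not taken from the gist):

```python
import numpy as np
import xarray as xr

# Illustrative inputs: a few equally shaped DataArrays and one scalar weight each.
arrays = [xr.DataArray(np.random.rand(4, 5), dims=("x", "y")) for _ in range(3)]
weights = [0.2, 0.3, 0.5]

# Loop/builtin-sum version, i.e. what the gist is said to be equivalent to.
loop_result = sum(a * w for a, w in zip(arrays, weights))

# Vectorized version: concatenate along a new "stacked" dimension, then reduce.
stacked = xr.concat(arrays, dim="stacked")
w = xr.DataArray(weights, dims="stacked")
vectorized_result = (stacked * w).sum("stacked")

xr.testing.assert_allclose(loop_result, vectorized_result)
```

The vectorized form keeps the whole reduction in a single xarray operation, which is what makes it feel more idiomatic.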
(arrays * weights).sum('stacked') was my first attempt. It performed considerably worse than sum(a * w for a, w in zip(arrays, weights)) - mostly because xarray.concat() is not terribly performant (I did not look deeper into it). I did not try dask.array.sum() - worth some playing with.
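A possible sketch of the dask.array route mentioned above, assuming dask-backed DataArrays with identical dims and shape (the setup and names are illustrative, not from the gist): stack the underlying dask arrays and let dask do the reduction, bypassing xarray.concat() entirely.

```python
import dask.array as dsa
import numpy as np
import xarray as xr

# Assumed setup: dask-backed DataArrays with identical dims/shape, scalar weights.
arrays = [
    xr.DataArray(dsa.from_array(np.random.rand(4, 5), chunks=(2, 5)), dims=("x", "y"))
    for _ in range(3)
]
weights = [0.2, 0.3, 0.5]

# Stack the underlying dask arrays along a new axis, reduce with dask,
# then wrap the result back into a DataArray with the original metadata.
stacked = dsa.stack([a.data * w for a, w in zip(arrays, weights)])
result = xr.DataArray(stacked.sum(axis=0), dims=arrays[0].dims, coords=arrays[0].coords)

print(result.compute())
```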
Was concat slow at graph construction or compute time?
…On Sun, Jan 22, 2017 at 6:02 PM crusaderky wrote:
(arrays * weights).sum('stacked') was my first attempt. It performed
considerably worse than sum(a * w for a, w in zip(arrays, weights)) -
mostly because xarray.concat() is not terribly performant (I did not look
deeper into it).
I did not try dask.array.sum() - worth some playing with.
Both. One of the biggest problems is that the data of my interest is a mix of dask-backed and numpy-backed arrays. Even when keeping the two lots separate (which is what fastwsum does), it performed considerably slower. However, this was over a year ago and well before xarray.dot() and dask.array.einsum(), so I'll need to tinker with it again.
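For anyone revisiting this, a sketch of what the xarray.dot() route could look like, assuming equally shaped DataArrays stacked along a new dimension (names and sizes are illustrative):

```python
import numpy as np
import xarray as xr

arrays = [xr.DataArray(np.random.rand(4, 5), dims=("x", "y")) for _ in range(3)]
weights = xr.DataArray([0.2, 0.3, 0.5], dims="stacked")

stacked = xr.concat(arrays, dim="stacked")

# xr.dot contracts the shared "stacked" dimension with an einsum-based
# implementation, instead of an explicit multiply-then-sum.
dot_result = xr.dot(stacked, weights, dims="stacked")

xr.testing.assert_allclose(dot_result, (stacked * weights).sum("stacked"))
```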
Retiring this as it is way too specialized for the main xarray library.
In my project I'm struggling with weighted sums of 2000-4000 dask-based xarrays. The time to reach the final dask-based array, the size of the final dask dict, and the time to compute the actual result are horrendous.
So I wrote the code below, which - as laborious as it may look - gives a performance boost that is nothing short of miraculous. You'll find some benchmarks at the bottom as well.
https://gist.github.com/crusaderky/62832a5ffc72ccb3e0954021b0996fdf
In my project, this shrank the final dask dict from 5.2 million keys to 3.3 million and cut 30% from the time required to define it.
I think it's generic enough to be a good addition to the core xarray module. Impressions?
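For context, a rough sketch of the kind of baseline the gist is competing against; the array count and sizes here are made up, and the real benchmarks are in the gist:

```python
import dask.array as dsa
import numpy as np
import xarray as xr

N = 50  # illustrative; the real use case involves 2000-4000 arrays

# Many small dask-backed DataArrays, one scalar weight per array.
arrays = [
    xr.DataArray(dsa.from_array(np.random.rand(100), chunks=100), dims="x")
    for _ in range(N)
]
weights = np.random.rand(N)

# Naive weighted sum: the chained additions make the final dask graph, the time
# to build it, and the time to compute it all grow with N.
result = sum(a * w for a, w in zip(arrays, weights))
print(len(dict(result.__dask_graph__())), "keys in the resulting dask graph")
print(float(result.sum().compute()))
```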