Best practice for applying function over 3D rolling windows Xarray with Dask backend #7508
-
Apply custom function over 3D rolling window with dask backend taking extremely long time, using very little CPU and a lot of RAM

I'm attempting to run a 3D rolling window operation on a 3D DataArray (time, lat, lon), with a custom function applied to each rolling window. An example can be seen below:
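(The original snippet did not survive extraction. Below is a minimal, hypothetical sketch of the kind of setup being described: a dask-backed (time, lat, lon) DataArray with a custom reduction applied over a rolling lat/lon window. The array sizes, chunk sizes, and `my_func` are illustrative assumptions, not the poster's actual code.)

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the poster's data: a dask-backed
# (time, lat, lon) DataArray with small chunks along lat/lon.
da = xr.DataArray(
    np.random.rand(8, 30, 30),
    dims=("time", "lat", "lon"),
).chunk({"time": 4, "lat": 10, "lon": 10})

def my_func(arr, axis):
    # Placeholder for the custom function: reduce over the window
    # axes, yielding one value per central (lat, lon) point.
    return np.nanmean(arr, axis=axis)

# rolling(...).reduce(...) passes each windowed view to my_func along
# with the axis numbers of the window dimensions.
result = da.rolling(lat=5, lon=5, center=True).reduce(my_func)
```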
The function outputs a single value at the central lat/lon point of each rolling window, hence I'm reducing over the window dimensions. I've tried the rolling().reduce() combination with little success, but I could be doing something wrong there. I'm running this on a dask LocalCluster and I'm seeing very low CPU usage from the workers (<10%), along with high disk read/write activity, suggesting a lot of data reordering. Processing the array takes a very long time even on a large cluster with plenty of memory, whereas it runs more efficiently (and eventually completes) when the input is a plain numpy array rather than dask-backed. Eventually the code produces the following error:
I'd like to leverage dask's parallelisation to speed this task up. Has anyone else hit this problem and overcome it? Would love some suggestions! Thanks in advance!
-
Your chunk sizes along the rolling dimension are small, so there's a lot of communication. Try increasing them if you can.
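To illustrate the suggestion, here is a hedged sketch of rechunking so the rolling dimensions aren't split across many small chunks (the array and chunk sizes are made-up examples, not taken from the thread):

```python
import numpy as np
import xarray as xr

# Many small chunks along the rolling (lat/lon) dimensions: each
# rolling window that straddles a chunk boundary forces dask to
# exchange halo regions between tasks.
da = xr.DataArray(
    np.random.rand(8, 30, 30),
    dims=("time", "lat", "lon"),
).chunk({"time": 4, "lat": 5, "lon": 5})

# chunk size -1 merges each rolling dimension into a single chunk,
# trading memory per task for far less inter-worker communication.
da = da.chunk({"time": 2, "lat": -1, "lon": -1})

result = da.rolling(lat=5, lon=5, center=True).mean()
```

With lat and lon each in one chunk, the rolling operation only needs to communicate along time, which is not a rolling dimension here.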