This is basically the vorticity example from dask/distributed#6560 (comment). (Ideally downsized a bit, so it's faster and cheaper to run frequently.)
Add rechunks before the result operation for bonus complexity. (Note that the data and chunk sizes are just representative here and would need to be set to something larger than the cluster.)
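As a rough illustration of the shape of such a workload, here is a minimal sketch using `dask.array`. It is not the code from the linked comment: the array names (`u`, `v`, `dx`, `dy`), the sizes, the helper `make_workload`, and the elementwise expression are all assumptions, and the arrays are deliberately tiny so it runs locally. A real benchmark would need data larger than the cluster's memory.

```python
import dask.array as da

# Hypothetical toy sizes; a real benchmark would be far larger than cluster memory.
NX, NY, NT = 32, 32, 12


def make_workload(rechunk_time=None):
    """Build a vorticity-like elementwise expression (assumed, not the original code)."""
    # Two 3-D fields chunked one slice at a time along the last axis.
    u = da.random.random((NX, NY, NT), chunks=(NX, NY, 1))
    v = da.random.random((NX, NY, NT), chunks=(NX, NY, 1))
    # Two 2-D grids, each a single chunk that every output block depends on.
    dx = da.random.random((NX, NY), chunks=(NX, NY))
    dy = da.random.random((NX, NY), chunks=(NX, NY))

    if rechunk_time is not None:
        # Optional rechunk before the result operation, for extra graph complexity.
        u = u.rechunk((NX, NY, rechunk_time))
        v = v.rechunk((NX, NY, rechunk_time))

    # Broadcast the 2-D grids against the 3-D fields.
    return dx[..., None] * u - dy[..., None] * v


result = make_workload(rechunk_time=4)
# Reduce to a scalar so nothing large is held in memory; single-threaded local scheduler.
mean = float(result.mean().compute(scheduler="synchronous"))
```

The single-chunk `dx` and `dy` arrays are there because every output block depends on them, which is the kind of widely shared input this sort of workload stresses.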
This is expected to perform poorly right now (use a ton of memory, spill a ton to disk, and be really slow) because of:
I used the original vorticity example rather than the further-simplified one here, but I think they're equivalent in terms of the problems they exercise.
One big difference is that I did wait(arr_to_devnull(result)) instead of wait(result). In @TomNicholas's original workload, he wanted to write the array to zarr (I think, right?). In my benchmarks in dask/distributed#6560 (comment), the "something else is going on here" thing that confused me was that memory just went 📈 and ended at the same level in every case.
I realized later that I was persisting the entire result, so dask had to keep all the output in memory, so of course it used a lot of memory. In #243 I instead simulated writing it to zarr, which theoretically should be able to work with a much-larger-than-memory array, but didn't due to all the problems listed.
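To make the difference between persisting and a simulated write concrete, here is a minimal sketch of the "store to nowhere" idea. It is not the actual `arr_to_devnull` helper. The `DevNull` class, its `writes` counter, and the toy array are assumptions made for illustration. It relies only on `dask.array.store`, which assigns each computed block into a target that supports `__setitem__`.

```python
import dask
import dask.array as da


class DevNull:
    """Hypothetical write target: accepts block assignments and discards the data."""

    def __init__(self):
        self.writes = 0  # counts block writes, for illustration only

    def __setitem__(self, key, value):
        # A real zarr target would persist `value` at `key`; here it is dropped.
        self.writes += 1


# Toy array with four blocks; sizes are illustrative only.
arr = da.random.random((8, 8), chunks=(4, 4))
result = arr + 1

sink = DevNull()
# Use the single-threaded local scheduler so the counter is updated in-process.
with dask.config.set(scheduler="synchronous"):
    da.store(result, sink, lock=False)
```

Because each block is dropped as soon as it is "written", nothing in this pattern requires the whole output to be held in memory at once, unlike persisting `result`.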
dask/distributed#6571
I believe this could be simplified to something like
a + b
dask/distributed#6597
As we work on fixes to the issues above, we should hopefully see performance improve.