Managing Worker Memory #196
-
Upgrading Dask Gateway is the primary thing on our roadmap. Unfortunately, it is part of a larger rewrite that @costrouc is doing to refactor the codebase and remove the helm dependency, and that rewrite has stalled because we got busy recently. Hopefully, we can get this done soon.
-
Closing due to time in an effort to clean up discussions. Feel free to re-open if needed.
-
TL;DR: The new async capabilities in fsspec and zarr are awesome but can create more demand for worker memory. It would be nice to be able to specify worker memory when creating a cluster (possible with the current version of Dask Gateway).
Yesterday I upgraded the "pangeo" environment on the ESIP qhub on AWS (https://jupyter.qhub.esipfed.org) to use the latest s3fs, and previously working Dask Gateway workflows reading Zarr started dying with "killed worker" messages. In the experience of @martindurant, 80% of "killed worker" messages are due to package mismatches and 20% are due to memory issues, but I was using the latest packages from conda-forge, so I took a closer look at the worker memory.
It turned out that because of the new ability to read chunks of data from Zarr asynchronously, the worker memory can spike if some of the chunk reads take a long time.
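To get a rough feel for the size of such a spike, here is a back-of-the-envelope sketch. It assumes every chunk being read at the same time is held in memory uncompressed, which is a simplification and not how fsspec or zarr actually account for memory. The function name and the numbers are hypothetical.

```python
import math


def estimate_inflight_bytes(chunk_shape, itemsize, concurrent_reads):
    """Rough upper bound on uncompressed bytes held by chunks read at once.

    Assumption: each in-flight read holds one full uncompressed chunk.
    """
    chunk_bytes = math.prod(chunk_shape) * itemsize
    return chunk_bytes * concurrent_reads


# Hypothetical example: float32 chunks of shape (10, 1000, 1000), 16 reads in flight.
peak = estimate_inflight_bytes((10, 1000, 1000), itemsize=4, concurrent_reads=16)
print(f"{peak / 2**30:.2f} GiB held by in-flight chunks")
```

If a few slow reads keep many chunks in flight on one worker, this number gets added on top of whatever the task itself needs, which would be consistent with workers getting killed.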
The solution was to boost the worker memory, which I did by updating the qhub config yaml, modifying my "Pangeo Worker" profile and relaunching my server. I think this is currently the only way to do this with qhub, as it's not possible with this version of Dask Gateway to specify the worker memory when you create a cluster. Currently the only options are "environment" and "profile", right?
With the version of Dask Gateway we are using on the NASA Pangeo Access project (0.9), we are able to pass in `worker_memory` when we create the cluster, which of course would be super useful. @dharhas you said upgrading Dask Gateway was on the roadmap, so good! Looking forward to that!
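For anyone reading later, this is roughly what passing `worker_memory` looks like from the client side. It is only a sketch: which option names a gateway accepts depends on how the server's cluster options are configured, the unit (GiB here) is an assumption, and the two helper functions are hypothetical names for illustration.

```python
def build_cluster_options(worker_memory_gib, environment=None, profile=None):
    """Collect keyword options for Gateway.new_cluster, skipping unset ones."""
    # Assumption: the gateway exposes an option named "worker_memory" in GiB.
    options = {"worker_memory": worker_memory_gib}
    if environment is not None:
        options["environment"] = environment
    if profile is not None:
        options["profile"] = profile
    return options


def new_cluster_with_memory(worker_memory_gib, **extra):
    """Create a cluster, requesting a specific amount of memory per worker."""
    # Imported lazily so the helper above works without dask-gateway installed.
    from dask_gateway import Gateway

    gateway = Gateway()  # address is taken from the Dask Gateway configuration
    return gateway.new_cluster(**build_cluster_options(worker_memory_gib, **extra))
```

On a gateway that exposes the option, something like `cluster = new_cluster_with_memory(8, environment="pangeo")` would then request workers with more memory without touching the profile in the qhub config.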