Restart given worker(s) using client [help wanted] #1823

Currently client.restart() restarts all workers and the entire cluster. Is there a way to stop, start, or restart a single worker or a list of workers using the client APIs? This would allow cleanup of specific workers without affecting others on which tasks may still be running.
No, there is not currently an easy way to do this.
Is it possible to restart the scheduler? One thing I noticed is that if you have published datasets and then call client.restart, the datasets are still published in the scheduler. If another client then asks for the result from one of those datasets, the scheduler does not reissue the work to a worker. It seems like client.restart should also restart the scheduler, or at least unpublish all datasets.
Unpublishing all datasets when performing a restart would be easy to implement, e.g., by adding a
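In the meantime this can be done from the client side before restarting; a minimal sketch, assuming a connected Client (the scheduler address is a placeholder):

```python
from distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Drop every published dataset before restarting, so stale names do not
# linger on the scheduler after the workers lose their data.
for name in client.list_datasets():
    client.unpublish_dataset(name)

client.restart()
```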
I recommend raising the published datasets topic as a separate issue.
The "Add retire_workers" PR looks like it helps with shutting down workers without removing them completely, but is there a way to start these workers again using an API?
You're right. That PR doesn't solve your issue.
Any ideas or suggestions about this issue? I want to use a worker restart to refresh the local packages cache. Does it make sense to make a PR for this?
I would like to restart the worker after every task is run, as there seems to be a memory leak with dask and my tasks (which does not occur when the task is run locally). Any ideas, or should I make a new PR?
I came across this requirement as well, and solved it with the approach below. This assumes your dask-worker(s) are overseen by nanny processes. It will throw an error on the client side (the worker dies before it can reply), which can be ignored.
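A minimal sketch of this kind of approach, assuming nanny-supervised workers; the scheduler and worker addresses below are placeholders:

```python
import os
from distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

def _kill_worker():
    # Exit the worker process immediately; the supervising nanny notices
    # the dead process and starts a fresh worker in its place.
    os._exit(0)

workers_to_restart = ["tcp://10.0.0.5:39517"]  # placeholder worker addresses
try:
    client.run(_kill_worker, workers=workers_to_restart)
except Exception:
    # The worker dies before it can reply, so the call errors out on the
    # client side; that is expected and safe to ignore.
    pass
```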
I'm investigating this as a potential solution to #391 (comment), in which we occasionally see unresponsive workers during network-heavy aggregation operations. Looking through the existing restart code, it appears that something akin to retire_workers could be implemented that first moves the worker's data elsewhere (see distributed/scheduler.py, lines 5446 to 5462 at 1297b18), then issues a command to the nanny (if present) to restart the worker process. @mrocklin, would this make sense as an additional scheduler API?
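Roughly, I imagine something like the following hypothetical scheduler handler; the handler name and internals are assumptions based on the existing retire_workers and restart code paths, not a current API:

```python
from distributed.core import rpc

# Hypothetical addition to distributed.scheduler.Scheduler; not an
# existing method, and attribute names are assumed.
async def restart_worker(self, comm=None, worker=None, timeout=30):
    # Grab the nanny address up front, since retiring the worker removes
    # its entry from self.workers.
    nanny_address = self.workers[worker].nanny

    # Move the worker's data to its peers first, as retire_workers does.
    await self.retire_workers(workers=[worker], close_workers=False)

    # Then ask the supervising nanny, if there is one, to restart the
    # worker process so it rejoins the cluster with a clean slate.
    # (Connection cleanup is omitted in this sketch.)
    if nanny_address:
        nanny = rpc(nanny_address, connection_args=self.connection_args)
        await nanny.restart(timeout=timeout)
```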
I have looked at retire_workers in the past and I would agree that it makes sense in the restart-a-particular-worker scenario. However, if the worker is unresponsive, what hope is there of being able to copy its data off first?
I tried the retire_workers API via client.retire_workers. With close_workers=False nothing happens, and with the default settings the worker shuts down completely but doesn't come back up. Is there any other way to restart a particular worker, maybe via the nanny or some other method?