If the network connection between the worker and scheduler was broken, workers used to try to re-connect and negotiate their state with the scheduler.
It turned out that the logic around re-establishing the network connection (#5481), re-negotiating the state (#6341), and handling the disconnect on the scheduler side (#6354) was all buggy and a source of deadlocks. Though disruptive, for short-term stability, we opted to remove the reconnection option entirely (#6350).
However, in the long term, we do want workers to be resilient to temporary network failures. We'll want to add worker reconnection back in once contracts around BatchedSend and worker disconnection are tightened up.
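For illustration only, here is a minimal sketch of the simplest part of this: retrying a dropped connection with exponential backoff. `connect_with_backoff` and its parameters are hypothetical names, not part of distributed, and the `connect` argument stands in for whatever coroutine opens the connection. It does not cover re-negotiating state with the scheduler, which is where the deadlocks came from.

```python
import asyncio
import random


async def connect_with_backoff(connect, *, attempts=5, base_delay=0.1, max_delay=5.0):
    """Await ``connect()`` until it succeeds, sleeping between failed attempts.

    ``connect`` is a hypothetical zero-argument coroutine function that
    raises OSError (or a subclass such as ConnectionError) on failure.
    """
    for attempt in range(attempts):
        try:
            return await connect()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to do
            # Exponential backoff, capped at max_delay, with jitter so that
            # many workers do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2**attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
```

A caller would still need to decide what happens once the retries are exhausted, for example shutting the worker down as it does today.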
Note that I'm intentionally not tracking this in #6384, since those are only meant to be short-term tasks. This is likely not something we'll tackle for a bit.
Requires: BatchedSend and convert to asyncio #6389