Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vine: inconsistency with idle-disconnect-request and recovery task creation #4058

Open
colinthomas-z80 opened this issue Feb 7, 2025 · 2 comments
Labels

Comments

@colinthomas-z80
Copy link
Contributor

TaskVine workers are started with an idle-disconnect parameter, where if they are connected to the manager but do not receive work within x seconds it will perform an idle-disconnect-request.

The manager is supposed to consider the worker and its files. If the worker has a temporary file in local storage that is needed by a future task the manager should refuse the request, as allowing it to disconnect would cause the file to be lost.

However, when running a workflow I found a worker requesting to idle-disconnect, which the manager allowed, yet a recovery task was immediately created for the lost file. Described by the section of the log attached.

Therefore, either we should not have allowed this disconnect to occur and there is a flaw in handle_worker_timeout , or the file lost was not needed and we should not have created a recovery task for it.

Image

@colinthomas-z80 colinthomas-z80 added good-first-issue A simpler issue that would be a good starting point for a new contributor TaskVine and removed good-first-issue A simpler issue that would be a good starting point for a new contributor labels Feb 7, 2025
@btovar
Copy link
Member

btovar commented Feb 7, 2025

Given recent experiences with dynamic map reduce, I would prefer if the disconnect requests are ignored if the worker has a unique copy of a temporary file, regardless of whether the manager knows if the file would be needed in the future. The file may be needed by a task that has not been created yet.

@dthain
Copy link
Member

dthain commented Feb 7, 2025

The code to do exactly that appears to be present, but Colin's example shows that it may not be working as intended...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants