Zombie tasks able to acquire locks after failure #1715
This is against 0.8.0.
Spoke with @drcrallen and @nishantmonu51 about this. We thought it would make sense for the TaskLockbox to hold a Set of active task ids, which the TaskQueue is responsible for keeping up to date. The TaskLockbox could then reject lock requests from tasks that are not active. The TaskQueue already has a reference to the lockbox, so this is perhaps not too much of a stretch.
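A rough sketch of that idea, for illustration only. The class name `ActiveTaskLockboxSketch`, its method names, and the string-based lock handle are all hypothetical and are not Druid's actual TaskLockbox API. The sketch uses the standard `IllegalStateException` for the rejection.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a lockbox that only serves active tasks.
// Names and signatures are assumptions, not Druid's real classes.
class ActiveTaskLockboxSketch {
    // Ids of tasks the queue currently considers active.
    private final Set<String> activeTasks = new HashSet<>();

    // The queue would call this when it starts managing a task.
    public synchronized void add(String taskId) {
        activeTasks.add(taskId);
    }

    // The queue would call this once the task is marked failed or finished.
    public synchronized void remove(String taskId) {
        activeTasks.remove(taskId);
    }

    // Grants a lock only if the requesting task is still active.
    // The returned string is a placeholder for a real lock object.
    public synchronized String lock(String taskId, String interval) {
        if (!activeTasks.contains(taskId)) {
            throw new IllegalStateException(
                "Unable to grant lock to inactive task [" + taskId + "]");
        }
        return taskId + "@" + interval;
    }

    public static void main(String[] args) {
        ActiveTaskLockboxSketch lockbox = new ActiveTaskLockboxSketch();
        lockbox.add("task-1");
        System.out.println(lockbox.lock("task-1", "hour-1"));

        // The overlord marks the task failed; the queue removes it.
        lockbox.remove("task-1");
        try {
            // A zombie task asking for its next lock is now rejected.
            lockbox.lock("task-1", "hour-2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, a task that was failed during a disconnect cannot acquire further locks, because the queue removed its id before the late request arrived.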
nishantmonu51 added a commit to metamx/druid that referenced this issue on Sep 24, 2015:
fixes apache#1715
- TaskLockBox has a set of active tasks
- lock requests throws exception for if they are from a task not in active task set.
- TaskQueue is responsible for updating the active task set on tasklockbox
review comment
remove duplicate line
use ISE instead
organise imports
drcrallen added a commit that referenced this issue on Sep 29, 2015.
We had an issue where a disconnect somewhere lasted long enough for the overlord to log a task as failed while the task was still running. After 45 minutes or so, the task submitted a lock acquire request and it was granted, but since the task had already been marked as failed, it was never properly cleaned up.
These are from the overlord
Suffice it to say, this is a long-running task that acquires multiple 1-hr locks during the course of its execution.