Priority based task locking #1679
Conversation
```
@@ -31,19 +31,24 @@
private final String dataSource;
private final Interval interval;
private final String version;
private final Integer priority;
private boolean exclusiveLock;
```
Should this be `volatile`? I would prefer making it part of the constructor args. Also, I think serde should preserve whether the lock was exclusive or not.
Yes, making it `volatile` makes sense. Serde is preserving the exclusive lock state; I can make it more explicit by putting a `@JsonProperty` annotation on `setExclusiveLock()`. Making it part of the constructor can help in testing; is there any other reason for your preference? In any case, tasks have to use `LockUpgradeAction` to set it to `true`.
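As a minimal sketch of the change being discussed (class, field, and method names here are hypothetical, not the actual Druid code), a volatile flag that is also a constructor argument could look like this:

```java
// Sketch only: illustrates a volatile exclusivity flag taken as a
// constructor argument, so it is easy to set in tests and easy to
// round-trip through serde (the real class would add @JsonProperty
// annotations on the constructor parameters and getter).
class TaskLockSketch
{
  private final Integer priority;
  // volatile: an upgrade performed on one thread is visible to all others
  private volatile boolean exclusiveLock;

  TaskLockSketch(Integer priority, boolean exclusiveLock)
  {
    this.priority = priority;
    this.exclusiveLock = exclusiveLock;
  }

  Integer getPriority()
  {
    return priority;
  }

  boolean isExclusiveLock()
  {
    return exclusiveLock;
  }

  // in the PR's design this is driven via LockUpgradeAction
  void setExclusiveLock(boolean exclusiveLock)
  {
    this.exclusiveLock = exclusiveLock;
  }
}
```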
#1604 adds a taskContext,
@nishantmonu51 FWIW, I was skeptical about having the priority come from the input json, as I couldn't think of any case where a user should mess around with task priority.
@nishantmonu51 I guess it can be done. How do you want to collaborate on it? Should I wait for your PR to be merged?
I think it's ok to let people set their own priorities. I don't think it's necessary but I think it's ok if we want to go that way. About exclusivity: I don't see why we need non-exclusive locks. Why not invalidate all lower priority locks when a higher priority request comes in? There still needs to be some method that upgrades a lock to non-preemptible, but I think it would be okay to have them always be exclusive. That should make life simpler in the lockbox. Btw, I get confused talking about "higher" and "lower" priority when smaller-numbered priorities are higher…
@gianm `LockUpgradeAction` is the mechanism by which a lock can become non-preemptible or "exclusive". And yes, when a higher priority task comes in and all the other conflicting tasks have lower priority non-exclusive (non-preemptible) locks, then all the lower priority locks will be revoked, see this. Thus, the lower priority tasks will eventually fail. So, if I understood your comment correctly, it is doing exactly what you have described. Am I missing something?
@pjain1 Ok, I see. I was probably just confused by the names then. In my mind all locks in this PR are "exclusive" (in that there will never be two different shared locks for an interval) but some are preemptible and some are not. So IMO "non-preemptible" or "uninterruptible" is a better word than "exclusive". It's also kind of confusing that `canAcquireLock` actually makes changes to the locks. IMO either that method should not make changes, or, if it can make changes, it should have a different name. For the context stuff, I don't have a strong preference. I think it's ok to let people set their own priorities; it's also something that could be added later, because tasks could have a default priority that is overridden by the json.
@gianm I took care of your comments. I have a question though: the integration test that I wrote submits 3 Index tasks with different priorities, in order, such that only the last one, with the highest priority, succeeds, as it will revoke the task locks of the first two. However, sometimes after the last task succeeds, the ExecutorLifeCycle for one of the first two tasks will start and it calls
I have removed the integration test from this PR for now.
@gianm I thought about it and discussed with @himanshug. Actually, it will be hard to write a fully deterministic integration test for lock overriding unless changes are made to the Druid code, which is not worth it. So I will just skip the integration test for now. Anyway, I am still curious about the call to
@pjain1 The comment about "local mode" means just running a peon by itself, with no overlord and no middle manager. At one point we thought that was something that would be good to make possible, and we used it somewhat often for testing things. I think it is required for that, since nothing else is going to call isReady in that mode (there's no overlord and no task queue). IMO since there is no real guarantee about what order tasks will attempt to run in, the sequencing you described is actually totally fine as long as "acquire the lock again" means the lower priority task actually got a new lock with a higher version than the previously-run higher-priority task. The lock versions absolutely must be increasing over time as tasks run, even if they don't run in the order in which they were originally submitted.
@gianm Ah, I see, that makes sense. Yes, the newly acquired lock will have a higher version.
Found some problems in handling error scenarios when the overlord restarts or leadership changes. Closing the PR; will reopen once done.
@pjain1 Task priority was addressed in #984 as well to be used as part of threading priority. I went back and forth with @xvrl a few times on what priority means, and here's what we originally came up with:
As such, I think there are two issues here. One is about preemption at all, and the other is about preemption rules. As per #1513, preemption at all seems to be something that is desired. So the outstanding question is how to define the rules for preemption. There are a few other task engines that have "can I be preempted" and "what is my priority" as two independent constructs, and I think that would make sense to consider here. In such a case, a lock can only be preempted if it is set to be ABLE to be preempted, and if the preempting task has a higher priority on the preemption scale. If a task is NOT flagged as preempt-able, OR is flagged as preempt-able but the other task is at a lower rank on the priority scale, then no preemption would occur. If two tasks are vying for resources (be it a lock or an execution slot in the cluster), it would be nice if we had a unified "priority" that was independent of preemption.
@drcrallen Just to recap: right now "can I be preempted" is a function of priority. So a task is preempt-able if a higher priority task comes in, except when the task is working in a critical section and has upgraded its lock to be non-preempt-able.
@pjain1 To maintain current behavior, the default would be that a task is not preempt-able unless specified that it can be.
@drcrallen Changing the notion of what is high/low priority sounds good. Did you want the "can I be preempted" check only for things to be backwards compatible? If that is the case then we can merge this PR in druid-0.9 only. Or do you believe tasks should always be non-preemptable by default?
@himanshug (thinking out loud here) It seems to me that taking all reasonable steps necessary to ensure a task runs is a reasonable thing for an executor service to do. As such, it is not immediately obvious to me that the actual preemption should be part of the executing service, as opposed to part of some external monitoring and coordination, whereas task priority under general resource contention (number 1 from the list) makes sense to have as part of the task executor. For example, I can see a scenario where the coordinator evaluates running tasks, kills inferior priority locked tasks, then requests the newer higher priority tasks be run, but operates independently of the actual executing service. In this scenario the coordinator can either 1. requeue the task or 2. fail the task and submit a new one later. I still think "can I be preempted" is a separate concept from general task priority, because lock preemption is a very specific type of cluster resource contention. Does that make sense?
In another scenario, imagine you have a limited set of cluster resources as workers, and some set of tasks that are running and more that need to be run. Should preemption based on arbitrary cluster resources follow the same rules as lock preemption? Since we operate on the Lambda architecture, it makes sense to me that you can have a hadoop task that is SLA critical, and thus should not be terminated, but you don't want realtime tasks preempting it. In such a case I think it would make sense for new real-time tasks to not preempt the SLA critical hadoop tasks, or at least to have a way to specify that the task is not preempt-able under normal preemption conditions.
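The two-construct rule being discussed, a preempt-able flag independent of a priority rank, could be sketched roughly as follows. All names here are illustrative assumptions, not Druid APIs, and the larger-number-wins convention is just one possible choice:

```java
// Sketch of preemption as two independent constructs: a lock is revoked
// only if it is flagged preempt-able AND the challenger outranks it.
class PreemptionRule
{
  static boolean canPreempt(
      boolean holderPreemptible,
      int holderPriority,
      int challengerPriority
  )
  {
    // a non-preempt-able lock is never revoked, regardless of priority
    if (!holderPreemptible) {
      return false;
    }
    // convention assumed here: larger number means higher priority
    return challengerPriority > holderPriority;
  }
}
```

Under this sketch, an SLA-critical hadoop task could simply be submitted with its lock flagged non-preempt-able, instead of relying on it having a high enough priority number.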
@drcrallen For the specific case of an SLA critical hadoop task, the user is allowed to submit it with a high enough priority (by configuring priority in the submitted json) so that realtime tasks cannot preempt it. That said, just to correct my understanding, is this what you are proposing?
We had a bit of a chat with @cheddar as well over this, and it appears the existing way is ok; we believe even priority shouldn't be user configurable (that is, a realtime task will always preempt a batch indexing task).
Reopening the PR -
```java
lockReleaseCondition.await();
}

tasksWaitingForLock.remove(task);
```
Should this be in a try/finally? It's possible for `lockReleaseCondition.await()` or `tryLock` to throw exceptions while a task is waiting, and in that case `tasksWaitingForLock` won't get updated. I think the task should still get removed.
Yes, you are correct; moved it to finally.
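The fix being agreed on here, removing the waiting task even when the wait throws, follows the standard try/finally pattern. A simplified, self-contained sketch (the exception is simulated; in the real code it would come from `await()` or `tryLock`):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: tasksWaitingForLock must be cleaned up on both the
// normal path and the exceptional path, hence the finally block.
class LockWaitSketch
{
  private final Set<String> tasksWaitingForLock = new HashSet<>();

  boolean lock(String taskId, boolean simulateFailure)
  {
    tasksWaitingForLock.add(taskId);
    try {
      if (simulateFailure) {
        // stands in for lockReleaseCondition.await() or tryLock throwing
        throw new RuntimeException("interrupted while waiting for lock");
      }
      return true;
    } finally {
      // runs whether we returned normally or an exception propagated
      tasksWaitingForLock.remove(taskId);
    }
  }

  boolean isWaiting(String taskId)
  {
    return tasksWaitingForLock.contains(taskId);
  }
}
```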
@pjain1 Just to follow up here, did we ever decide how to create an enable/disable flag for this feature?
@fjy Sorry, I have been busy with other things. I will think about it, discuss with others, and update the thread. This PR does not need to be a blocker for 0.9.2.
The priority locking feature is configurable now; by default it is off and can be enabled by setting the runtime property
- Task priority is used to acquire a lock on an interval for a datasource.
- Tasks with higher priority can preempt lower-priority tasks for the same datasource and interval if run concurrently.
Transient failure - restarting the build.
@pjain1 @himanshug What is the status of this PR?
@pjain1 @himanshug Do you all still need/want this? Should we direct attention back here for 0.9.3?
Revived and implemented as part of #4550.
This PR corresponds to issue #1513.
Design details -
The flow for acquiring and upgrading Locks by a Task during TaskLifeCycle would be like -
Tasks with no priority specified will get the respective default priorities as per the task type.
For example, if a Hadoop Index task is running and a Realtime Index task starts that wants to publish a segment for the same (or overlapping) interval for the same datasource, then it will override the task locks of the Hadoop Index task. Consequently, the Hadoop Index task will fail before publishing the segment.
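The default-priority-per-task-type rule could be sketched as follows. The numbers and type names are made up for illustration (they are not the actual Druid defaults); the only property taken from the description above is that realtime tasks outrank batch indexing tasks:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each task type has a default lock priority, and an
// explicit priority supplied with the task (if any) wins over the default.
class DefaultLockPriorities
{
  private static final Map<String, Integer> DEFAULTS = new HashMap<>();

  static {
    DEFAULTS.put("realtime", 75);     // realtime outranks batch indexing
    DEFAULTS.put("index_hadoop", 50);
    DEFAULTS.put("index", 50);
  }

  static int lockPriorityFor(String taskType, Integer explicitPriority)
  {
    if (explicitPriority != null) {
      return explicitPriority;
    }
    return DEFAULTS.getOrDefault(taskType, 0);
  }
}
```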
Note - There is no need to set this property; a task automatically gets a default priority as per its type. However, if one wants to override the default priority, it can be done by setting `lockPriority` inside the `context` property of the task spec.

Major Implementation details -
Possible Future Enhancements - Proactively shut down tasks whose TaskLock has been revoked, instead of waiting for them to fail eventually.
Edit 08/05/2016 -
The priority locking feature is configurable now; by default it is off and can be enabled by setting the runtime property `druid.indexer.taskLockboxVersion` to `v2`.

- `TaskLockbox` is now an interface with two implementations: `TaskLockboxV1`, which is the same as the previous `TaskLockbox`, and `TaskLockboxV2`, which does priority based locking. One of them is injected at runtime in `CliOverlord` and `CliPeon` depending on `druid.indexer.taskLockboxVersion`.
- `TaskLockbox` has a new method `boolean setTaskLockCriticalState(Task task, Interval interval, TaskLockCriticalState taskLockCriticalState)` meant for upgrading locks when priority based locking is used. `TaskLockboxV1` always returns `true` for this method.
- `TaskLock` has two new fields, `priority` and `upgraded`. In the case of `TaskLockboxV1` the corresponding values are always `0` and `true`; for `TaskLockboxV2` they depend on the task.
- Tasks invoke `setTaskLockCriticalState` before publishing segments. In the case of `TaskLockboxV1` the method always returns `true` and the extra overhead is just an HTTP call to the overlord; in the case of `TaskLockboxV2` it does the actual work of setting the TaskLock state.
- The `Task` interface has an `int getLockPriority()` method, which I guess is OK.
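The v1/v2 split described above can be sketched roughly as follows. The signatures are deliberately simplified and the class names suffixed "Sketch" are hypothetical; the real `TaskLockbox`, `TaskLockboxV1`, and `TaskLockboxV2` take `Task`, `Interval`, and `TaskLockCriticalState` arguments:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the TaskLockbox interface split.
interface TaskLockboxSketch
{
  // upgrade (or release) a task lock's critical-section state
  boolean setTaskLockCriticalState(String taskId, String criticalState);
}

// V1: legacy behavior with no priority semantics; always reports success,
// so callers can invoke it unconditionally before publishing segments.
class TaskLockboxV1Sketch implements TaskLockboxSketch
{
  @Override
  public boolean setTaskLockCriticalState(String taskId, String criticalState)
  {
    return true;
  }
}

// V2: actually records the state, and reports failure if the task's lock
// was already revoked by a higher priority task.
class TaskLockboxV2Sketch implements TaskLockboxSketch
{
  private final Map<String, String> lockStates = new HashMap<>();
  private final Set<String> revoked = new HashSet<>();

  void revoke(String taskId)
  {
    revoked.add(taskId);
  }

  @Override
  public boolean setTaskLockCriticalState(String taskId, String criticalState)
  {
    if (revoked.contains(taskId)) {
      return false;
    }
    lockStates.put(taskId, criticalState);
    return true;
  }
}
```

The point of the trivial V1 implementation is that callers in the task lifecycle do not need to know which lockbox version is injected: they always call `setTaskLockCriticalState` before publishing, and under V1 it is simply a no-op that succeeds.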