

@pgandhi999 pgandhi999 commented Mar 12, 2019

…when trying to kill executors either due to dynamic allocation or blacklisting

What changes were proposed in this pull request?

There are two deadlocks caused by the interplay between three different threads:

  • task-result-getter thread
  • spark-dynamic-executor-allocation thread
  • dispatcher-event-loop thread (makeOffers())

The fix enforces a lock-ordering constraint: the lock on TaskSchedulerImpl is acquired before the lock on CoarseGrainedSchedulerBackend in both makeOffers() and killExecutors(). This consistent resource ordering between the threads fixes the deadlocks.
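Concretely, the pattern looks roughly like the sketch below (a simplified illustration of the lock nesting, not the exact diff; the body of the locked region is elided):

private def makeOffers() {
  // SPARK-27112: take the TaskSchedulerImpl lock first, then the backend's own lock,
  // so every thread acquires the two locks in the same order and the circular wait
  // cannot form.
  val taskDescs = scheduler.synchronized {
    CoarseGrainedSchedulerBackend.this.synchronized {
      // build WorkerOffers from the active executors and hand them to
      // scheduler.resourceOffers(...)
    }
  }
  // launchTasks(taskDescs) happens outside the locked region
}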

How was this patch tested?

Manual Tests

…when trying to kill executors either due to dynamic allocation or blacklisting

Ordered synchronization constraint by acquiring lock on Task Scheduler before acquiring lock on CoarseGrainedSchedulerBackend
@pgandhi999
Author

ok to test

force: Boolean): Seq[String] = {
logInfo(s"Requesting to kill executor(s) ${executorIds.mkString(", ")}")

val idleExecutorIds = executorIds.filter { id => force || !scheduler.isExecutorBusy(id) }
Contributor

Nit: I would not use the name idleExecutorIds for this variable, since when the force flag is true it contains more than just idle executors.

Contributor

meh, I'm not sure what else you'd call it ... there is already executorsToKill lower down ... unless you have a better suggestion, idleExecutorIds is probably good enough

but this does leave a small race for SPARK-19757, doesn't it? After this executes, an executor could get a task scheduled on it so it's no longer idle, but you still kill it below? To really prevent that, you'd need to get both locks (in the same order of course), so

val response = scheduler.synchronized { this.synchronized {

it also isn't the worst thing in the world if we occasionally kill an executor which just got a task scheduled on it.

Contributor

@abellina abellina Mar 12, 2019

@squito If you wanted to prevent that race, then you need something like:

val response = scheduler.synchronized {
  val idleExecutorIds = executorIds.filter { id => force || !scheduler.isExecutorBusy(id) }
  this.synchronized {
    ...
  }
}

right (so the lookup inside the scheduler lock)?

it also isn't the worst thing in the world if we occasionally kill an executor which just got a task scheduled on it.

So we don't count this as a task failure right? Not sure where to look to verify that.

Contributor

yes, I meant with the filter happening inside both locks -- more like it was before the current form of the PR, or as you suggested

Contributor

I have a suggestion for naming but I do not insist on that:

  • renaming idleExecutorIds to executorsToKill
  • renaming the old executorsToKill to knownExecutorsToKill

I have also checked the synchronized blocks of CoarseGrainedSchedulerBackend and its derived classes and have not found any other place where the scheduler is used for locking (within a synchronized block).

Contributor

@squito squito left a comment

I think this makes sense, but there are more instances of the lock-order inversion. I noticed at least CoarseGrainedSchedulerBackend.disableExecutor() also reverses the lock order.

Some(executorData.executorAddress.hostPort))
}.toIndexedSeq
scheduler.resourceOffers(workOffers)
val taskDescs = scheduler.synchronized {
Contributor

there should be a comment here about why we need both of these locks.

@abellina
Contributor

@squito Regarding CoarseGrainedSchedulerBackend.disableExecutor(): I don't think it inverts the order. The scheduler.executorLost call happens outside of the CoarseGrainedSchedulerBackend lock, and it gets called from a driver event loop. Let me know if I've missed something.

@vanzin
Contributor

vanzin commented Mar 12, 2019

Your PR title and description are basically copies of the bug. Could you instead describe the change?

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103386 has finished for PR 24072 at commit e649900.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Mar 13, 2019

@abellina you're right about disableExecutors, thanks for taking a closer look, sorry that was just from a really quick scan. But we should be sure to take a close look at all places the lock is used.

@pgandhi999 pgandhi999 changed the title [SPARK-27112] : Spark Scheduler encounters two independent Deadlocks … [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … Mar 13, 2019
@pgandhi999
Author

but this does leave a small race for SPARK-19757, doesn't it? After this executes, then an executor gets a task scheduled on it so its no longer idle, but you still kill it below? To really prevent that, you'd need to get both locks (in the same order of course) so

val response = scheduler.synchronized { this.synchronized {

@squito I did think about this yesterday and tried it out as well; the deadlock issue gets fixed along with the race, but I was not sure whether doing this might cause a performance degradation, as a bunch of threads might end up busy waiting a lot of the time. I can do some perf tests with the above change and, if it looks good, update the PR with the fix. Will let you know. Thank you.

}

// If an executor is already pending to be removed, do not kill it again (SPARK-9795)
// If this executor is busy, do not kill it unless we are told to force kill it (SPARK-9552)
Contributor

Remove this comment and add one where we are doing the force check.

Locking the code block in killExecutors() method with TaskSchedulerImpl followed by CoarseGrainedSchedulerBackend to avoid race condition issue and adding comments.
@SparkQA

SparkQA commented Mar 13, 2019

Test build #103453 has started for PR 24072 at commit ed12daf.

@vanzin
Contributor

vanzin commented Mar 13, 2019

Sorry to be a pain about this, but please remove the bug stuff from the PR description. If we want details about the bug, we can look at, ahem, the bug. Focus on describing what the change does and why it fixes the problem.

// SPARK-27112: We need to ensure that there is ordering of lock acquisition
// between TaskSchedulerImpl and CoarseGrainedSchedulerBackend objects in order to fix
// the deadlock issue exposed in SPARK-27112
val taskDescs = scheduler.synchronized {
Contributor

I took a quick look at the code that calls this, and I'm wondering if holding the two locks here is really needed.

For context, all this code is inside the RPC endpoint handler. This is a ThreadSafeRpcEndpoint so there's only one message being processed at a time, meaning that you won't have multiple threads calling makeOffers concurrently.

So it seems to me that it would be possible to:

  • With the CoarseGrainedSchedulerBackend.this lock held, calculate the work offers:
val workOffers = CoarseGrainedSchedulerBackend.this.synchronized {
  ...
}

  • With the scheduler lock held, make the offers:

val taskDesc = scheduler.synchronized {
  scheduler.resourceOffers(workOffers)
}

And as far as I understand that should work and also be easier to understand, right?

I also noticed that later this code calls launchTasks, and that method accesses and modifies data in executorDataMap without the CoarseGrainedSchedulerBackend.this lock, which is very sketchy.

Contributor

Ok, seems both locks are needed because of SPARK-19757. But the launchTasks issue is still there.

Author

@vanzin So in the code, I came across the following comment; I wonder if that answers the launchTasks issue. I do not exactly understand the intention of the comment, though.

// Accessing `executorDataMap` in `DriverEndpoint.receive/receiveAndReply` doesn't need any
// protection. But accessing `executorDataMap` out of `DriverEndpoint.receive/receiveAndReply`
// must be protected by `CoarseGrainedSchedulerBackend.this`. Besides, `executorDataMap` should
// only be modified in `DriverEndpoint.receive/receiveAndReply` with protection by
// `CoarseGrainedSchedulerBackend.this`.
private val executorDataMap = new HashMap[String, ExecutorData]

Contributor

Ok, I think that's fine. I checked and all modifications happen on the endpoint thread, so reading the map from that thread without a lock should be fine. The data being modified (freeCores) is also only used in the endpoint thread, so that looks safe too.

// If this executor is busy, do not kill it unless we are told to force kill it (SPARK-9552)
val executorsToKill = knownExecutors
.filter { id => !executorsPendingToRemove.contains(id) }
.filter { id => force || !scheduler.isExecutorBusy(id) }
Contributor

In a similar vein to my previous comment, although I'm less sure about this one.

This seems to be the only interaction with the scheduler in this method, so could this filtering be done first thing in the method, with the scheduler lock held, and then the rest of the code just needs the CoarseGrainedSchedulerBackend lock?

It seems to me the behavior wouldn't change from the current state (where the internal scheduler state can change while this method is running). And as in the other case, easier to understand things when you're only holding one lock.
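For illustration, the restructuring suggested here might look roughly like this (a sketch only; nonBusyExecutorIds is a made-up name and the backend bookkeeping is abbreviated):

// Filter against scheduler state while holding only the TaskSchedulerImpl lock ...
val nonBusyExecutorIds = scheduler.synchronized {
  executorIds.filter { id => force || !scheduler.isExecutorBusy(id) }
}
// ... then do the backend-side bookkeeping while holding only the backend lock.
synchronized {
  val executorsToKill = nonBusyExecutorIds
    .filter { id => !executorsPendingToRemove.contains(id) }
  // mark these executors as pending removal and ask the cluster manager to kill them
}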

Contributor

(Caught up with the previous discussion and it seems that here both locks are needed to avoid an edge case where you could kill active executors.)

@vanzin
Contributor

vanzin commented Mar 14, 2019

BTW if the "two locks need to be held" thing is really needed in multiple places, might be good to have a helper function, e.g.

def withLock[T](fn: => T): T = lock1.synchronized { lock2.synchronized { fn } }
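Applied to this PR, such a helper could look roughly like the following (a sketch, assuming it lives in CoarseGrainedSchedulerBackend and that the TaskSchedulerImpl lock is the outer one, matching the ordering used elsewhere in this change):

// Acquire the TaskSchedulerImpl lock, then the backend lock, always in that order,
// so the SPARK-27112 ordering rule is encoded in a single place.
private def withLock[T](fn: => T): T = scheduler.synchronized {
  CoarseGrainedSchedulerBackend.this.synchronized { fn }
}

// Usage, e.g. in makeOffers():
val taskDescs = withLock {
  // build offers and call scheduler.resourceOffers(...)
}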

@pgandhi999
Author

@squito @vanzin @attilapiros @abellina Have worked on all the comments and pushed the respective changes.

private def makeOffers() {
// Make sure no executor is killed while some task is launching on it
val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
// SPARK-27112: We need to ensure that there is ordering of lock acquisition
Contributor

This comment would be great in the withLock function, instead of being copy & pasted in a few places.

Author

Done

// SPARK-27112: We need to ensure that there is ordering of lock acquisition
// between TaskSchedulerImpl and CoarseGrainedSchedulerBackend objects in order to fix
// the deadlock issue exposed in SPARK-27112
val taskDescs = withLock({
Contributor

No need for the parentheses.

Author

Done

Author

Sorry for the extra commits, was fixing code indentation.

@SparkQA

SparkQA commented Mar 15, 2019

Test build #103518 has finished for PR 24072 at commit 47448b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 15, 2019

Test build #103514 has finished for PR 24072 at commit 2b4f226.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 15, 2019

Test build #103519 has finished for PR 24072 at commit 09f9b47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros
Contributor

LGTM

@squito
Contributor

squito commented Mar 15, 2019

lgtm

@vanzin vanzin changed the title [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … [SPARK-27112][core] Create a resource ordering between threads to resolve the deadlocks encountered … Mar 15, 2019
Contributor

@abellina abellina left a comment

👍

@pgandhi999 pgandhi999 changed the title [SPARK-27112][core] Create a resource ordering between threads to resolve the deadlocks encountered … [SPARK-27112][CORE] : Create a resource ordering between threads to resolve the deadlocks encountered … Mar 15, 2019
@dhruve
Contributor

dhruve commented Mar 18, 2019

+1
@squito @vanzin Can we merge this PR in? Thanks.

@squito
Contributor

squito commented Mar 18, 2019

merged to master.

@pgandhi999 there was a merge conflict against branch-2.4, would you mind opening another PR against that branch?

@asfgit asfgit closed this in 7043aee Mar 18, 2019
@pgandhi999
Author

Sure @squito will do that. Thank you.

pgandhi999 pushed a commit to pgandhi999/spark that referenced this pull request Mar 18, 2019
…esolve the deadlocks encountered …

…when trying to kill executors either due to dynamic allocation or blacklisting

There are two deadlocks as a result of the interplay between three different threads:

**task-result-getter thread**

**spark-dynamic-executor-allocation thread**

**dispatcher-event-loop thread(makeOffers())**

The fix ensures ordering synchronization constraint by acquiring lock on `TaskSchedulerImpl` before acquiring lock on `CoarseGrainedSchedulerBackend` in `makeOffers()` as well as killExecutors() method. This ensures resource ordering between the threads and thus, fixes the deadlocks.

Manual Tests

Closes apache#24072 from pgandhi999/SPARK-27112-2.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
asfgit pushed a commit that referenced this pull request Mar 19, 2019
…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting

Closes #24072 from pgandhi999/SPARK-27112-2.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>

## What changes were proposed in this pull request?

There are two deadlocks as a result of the interplay between three different threads:

**task-result-getter thread**

**spark-dynamic-executor-allocation thread**

**dispatcher-event-loop thread(makeOffers())**

The fix ensures ordering synchronization constraint by acquiring lock on `TaskSchedulerImpl` before acquiring lock on `CoarseGrainedSchedulerBackend` in `makeOffers()` as well as killExecutors() method. This ensures resource ordering between the threads and thus, fixes the deadlocks.

## How was this patch tested?

Manual Tests

Closes #24134 from pgandhi999/branch-2.4-SPARK-27112.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
asfgit pushed a commit that referenced this pull request Mar 19, 2019
…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting

Closes #24072 from pgandhi999/SPARK-27112-2.
Closes #24134 from pgandhi999/branch-2.4-SPARK-27112.

Authored-by: pgandhi <pgandhi@verizonmedia.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
(cherry picked from commit 95e73b3)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…esolve the deadlocks encountered when trying to kill executors either due to dynamic allocation or blacklisting