[SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks #6817
Conversation
- when a task fails, a new attempt will be appended, so I post a resubmit event here to let the ExecutorAllocationManager know (see the sketch after this list).
- if numFailures is greater than maxTaskFailures, there is no need to append and submit a new task.
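For context, here is a minimal sketch, assuming illustrative names (`pendingTasks`, `runningTasks`, `tasksPerExecutor`) rather than the actual Spark source, of the arithmetic behind `maxNumExecutorsNeeded`: the executor target is derived from the pending plus running task count, so whether a failed task is counted as pending again directly determines whether a new executor gets requested for its retry.

```scala
// Editor's sketch, not the actual Spark code: the executor target is driven by
// how many tasks are pending or running, so a failed task that is never put
// back into the pending count will not trigger a new executor request.
object AllocationMath {
  def maxNumExecutorsNeeded(pendingTasks: Int, runningTasks: Int, tasksPerExecutor: Int): Int = {
    val totalTasks = pendingTasks + runningTasks
    // Ceiling division: one executor can run up to `tasksPerExecutor` tasks concurrently.
    (totalTasks + tasksPerExecutor - 1) / tasksPerExecutor
  }
}
```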
|
@sryza @andrewor14 Can you have a look? |
|
err, sorry @XuTingjun I was looking at an old version of the page, I deleted my comment about the jira |
|
Hi @XuTingjun I'll look at this today if not tomorrow thanks for the ping. |
|
ok to test |
|
Test build #35170 has finished for PR 6817 at commit
|
|
Test build #35207 has finished for PR 6817 at commit
|
|
I think the failed unit tests are unrelated to this patch; please retest. |
|
Jenkins, retest this please. |
|
Test build #35506 has finished for PR 6817 at commit
|
|
@XuTingjun From the description, I'm still having a hard time trying to understand what the symptom is. From the JIRA:
What do you mean by "won't add"? Are the resubmitted tasks not being run on the new executor? Or are we not requesting new executors? Could you give a detailed example of a case where this happens? |
|
Also, echoing @squito's comments, I don't really see why a new |
|
@andrewor14, sorry for my poor English. The problem is: |
|
@andrewor14, have you understood the problem? The new attempt tasks will be counted into |
|
@andrewor14, sorry to bother you again. I think it's really a bug; I wish you would have another look, thanks! |
|
Hi @XuTingjun, sorry for letting this slip. I'll have another look at this tomorrow. |
|
@XuTingjun I dug into the scheduler code a little. When a task is resubmitted, it uses a new task ID but not a new task index. To calculate the number of pending tasks, we use the task index, not the task ID. Therefore, it should handle resubmits correctly, since task indices are the same across multiple attempts of the same task. Could you clarify what the resulting behavior of this bug is? It would be useful to describe the symptoms without referring to the low-level implementation. I just want to know what consequences the issue has for the Spark user who knows nothing about |
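As a rough illustration of the index-based bookkeeping described in the comment above (a sketch with assumed names such as `stageIdToNumTasks` and `stageIdToTaskIndices`, not the literal Spark code), pending tasks per stage are derived from the set of task indices that have started, so a resubmitted attempt that reuses an already-recorded index leaves the pending count unchanged:

```scala
import scala.collection.mutable

// Editor's sketch of the bookkeeping under discussion; field names are assumptions.
class PendingTaskTracker {
  private val stageIdToNumTasks = mutable.Map[Int, Int]()
  private val stageIdToTaskIndices = mutable.Map[Int, mutable.HashSet[Int]]()

  def onStageSubmitted(stageId: Int, numTasks: Int): Unit =
    stageIdToNumTasks(stageId) = numTasks

  def onTaskStart(stageId: Int, taskIndex: Int): Unit = {
    // Keyed by task index, not task ID: a resubmitted attempt reuses its index,
    // so adding it again is a no-op and the pending count does not grow back.
    stageIdToTaskIndices.getOrElseUpdate(stageId, new mutable.HashSet[Int]) += taskIndex
  }

  def totalPendingTasks(): Int =
    stageIdToNumTasks.map { case (stageId, numTasks) =>
      numTasks - stageIdToTaskIndices.get(stageId).map(_.size).getOrElse(0)
    }.sum
}
```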
|
If anything, I would think that we should remove this line. If this task fails, then the next attempt would go to the else case of stageIdNumTasks.getOrElse(stageId, -1), which is not technically correct. It's safe to remove it because we remove it in stageCompleted anyway.
|
Can we delete lines 557-585? I think stageCompleted has also done this.
|
I'm not following... L557-585 refer to adding executors. stageCompleted does not even deal with executors.
|
Let me answer the question, “Could you clarify what the resulting behavior of this bug is?” @andrewor14 I don't agree with your opinion. Let me give an example: a stage has only one task, with task index 1, so one executor is allocated to this task. But when that task fails because of executor loss, a new attempt task will start, and no executor will be allocated, because |
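Plugging the one-task example into the two sketches above (still illustrative, reusing their assumed `PendingTaskTracker` and `AllocationMath` names), once the single task index has been recorded as started, the pending count stays at zero even after the attempt fails, so no executor is requested for the retry:

```scala
// Editor's worked example for the one-task stage described above.
val tracker = new PendingTaskTracker
tracker.onStageSubmitted(stageId = 0, numTasks = 1)
tracker.onTaskStart(stageId = 0, taskIndex = 1)    // first attempt starts on the executor
// ... the executor is lost, the attempt fails, and a new attempt is resubmitted ...
assert(tracker.totalPendingTasks() == 0)           // the retry is invisible to the manager
assert(AllocationMath.maxNumExecutorsNeeded(
  pendingTasks = tracker.totalPendingTasks(),
  runningTasks = 0,
  tasksPerExecutor = 1) == 0)                      // so no new executor is ever requested
```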
|
Yes, so in that case we should just remove the task from … Do you understand my proposal? |
|
Yeah, I got it. I think we can add the code below into |
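The code referred to in the comment above is not preserved in this extract, but the proposal amounts to something like the following sketch (assumed names, not the merged patch verbatim): when an attempt ends unsuccessfully, drop its index from the stage's started-index set so the upcoming retry counts as pending again.

```scala
import scala.collection.mutable

// Editor's sketch of the proposed tweak; names are assumptions, not the exact patch.
def handleFailedTaskEnd(
    stageIdToTaskIndices: mutable.Map[Int, mutable.HashSet[Int]],
    stageId: Int,
    taskIndex: Int,
    succeeded: Boolean): Unit = {
  if (!succeeded) {
    // Release the failed attempt's index so the pending count sees the retry.
    stageIdToTaskIndices.get(stageId).foreach(_.remove(taskIndex))
  }
}
```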
|
Test build #233 has finished for PR 6817 at commit
|
|
Test build #39862 has finished for PR 6817 at commit
|
|
retest this please, this is failing a test that was already fixed in upstream |
|
Test build #40049 has finished for PR 6817 at commit
|
|
Test build #40160 has finished for PR 6817 at commit
|
|
update or remove this comment
|
@XuTingjun The fix here is not correct, because after all tasks have been scheduled we keep asking for executors even though we don't need them. I would just add back the entire code block you deleted in L602-610 and call …
(1) We stop requesting executors once we have started all tasks (not finished).
Does that make sense? |
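One plausible shape of the behavior settled on here, as a hedged sketch rather than the merged diff (the hook names `onSchedulerQueueEmpty` and `onSchedulerBacklogged` and the field names are assumptions): executor requests stop once every task has started, and a failure both frees the task index and marks the scheduler as backlogged again so a request goes out for the retry.

```scala
import scala.collection.mutable

// Editor's sketch only; hook and field names are assumptions, not the merged code.
class AllocationHooks {
  var backlogged = false
  def onSchedulerQueueEmpty(): Unit = backlogged = false  // stop asking for executors
  def onSchedulerBacklogged(): Unit = backlogged = true   // start asking again
}

class StageBookkeeping(numTasks: Int, hooks: AllocationHooks) {
  private val startedIndices = mutable.HashSet[Int]()

  private def pendingTasks: Int = numTasks - startedIndices.size

  def onTaskStart(taskIndex: Int): Unit = {
    startedIndices += taskIndex
    // (1) Stop requesting executors once all tasks have *started*, not finished.
    if (pendingTasks == 0) hooks.onSchedulerQueueEmpty()
  }

  def onTaskEnd(taskIndex: Int, succeeded: Boolean): Unit = {
    if (!succeeded) {
      // A failed attempt will be resubmitted: mark the scheduler as backlogged
      // again if the queue had drained, and free the index so the retry is pending.
      if (pendingTasks == 0) hooks.onSchedulerBacklogged()
      startedIndices -= taskIndex
    }
  }
}
```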
|
@andrewor14 Makes better sense to me. Thanks for the explanation, Andrew. |
|
@andrewor14, I understand what you mean. |
|
Maybe we can change the code below, right? to |
|
@XuTingjun yes I think that looks fine. Would you mind testing this change on a real cluster? This scenario is somewhat tricky and it would be good to verify that it works outside of unit tests. |
|
@andrewor14, I have tested it on a real cluster; it works fine. |
|
retest this please |
|
Latest changes LGTM, will merge once tests pass. Thanks for your persistence @XuTingjun |
|
Test build #40476 timed out for PR 6817 at commit |
|
retest this please |
|
retest this please |
|
Test build #40509 timed out for PR 6817 at commit |
|
Test build #1461 has finished for PR 6817 at commit
|
|
Test build #40526 has finished for PR 6817 at commit
|
|
Alright, merging into master and 1.5. |
Author: xutingjun <xutingjun@huawei.com>
Author: meiyoula <1039320815@qq.com>

Closes #6817 from XuTingjun/SPARK-8366.

(cherry picked from commit b85f9a2)
Signed-off-by: Andrew Or <andrew@databricks.com>