Skip to content
24 changes: 12 additions & 12 deletions core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ package org.apache.spark.ui.exec

import scala.collection.mutable.{LinkedHashMap, ListBuffer}

import org.apache.spark.{ExceptionFailure, Resubmitted, SparkConf, SparkContext}
import org.apache.spark.{Resubmitted, SparkConf, SparkContext}
import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.scheduler._
import org.apache.spark.storage.{StorageStatus, StorageStatusListener}
Expand Down Expand Up @@ -131,17 +131,17 @@ class ExecutorsListener(storageStatusListener: StorageStatusListener, conf: Spar
if (info != null) {
val eid = info.executorId
val taskSummary = executorToTaskSummary.getOrElseUpdate(eid, ExecutorTaskSummary(eid))
taskEnd.reason match {
case Resubmitted =>
// Note: For resubmitted tasks, we continue to use the metrics that belong to the
// first attempt of this task. This may not be 100% accurate because the first attempt
// could have failed half-way through. The correct fix would be to keep track of the
// metrics added by each attempt, but this is much more complicated.
return
case _: ExceptionFailure =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fix the problem by changing this line to case _: TaskFailedReason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I added info.successful check at the suggestion of @zsxwing and @jiangxb1987 . I feel this covers all the failed and killed cases which TaskFailedReason might have overlooked. However, if you feel otherwise, please let me know.

taskSummary.tasksFailed += 1
case _ =>
taskSummary.tasksComplete += 1
// Note: For resubmitted tasks, we continue to use the metrics that belong to the
// first attempt of this task. This may not be 100% accurate because the first attempt
// could have failed half-way through. The correct fix would be to keep track of the
// metrics added by each attempt, but this is much more complicated.
if (taskEnd.reason == Resubmitted) {
return
}
if (info.successful) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the relationship between info.successful and taskEnd.reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def successful: Boolean = finished && !failed && !killed

So if the task state shows finished and it has neither failed nor been killed, then info.successful is true. Now, there could be multiple reasons for a failed or killed task, and taskEnd.reason lists those. However, if task state is SUCCESS, so is the reason and thus, info.successful will be true in case of SUCCESS and false for all the other cases.

taskSummary.tasksComplete += 1
} else {
taskSummary.tasksFailed += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this keeps the previous behavior, cc @zsxwing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the list of possible task failure reasons, and I found info.successful to be updated for all except Resubmitted case. If you think there is a possibility of any other reason that has been overlooked by me, do let me know.

}
if (taskSummary.tasksActive >= 1) {
taskSummary.tasksActive -= 1
Expand Down