-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21503][UI]: Spark UI shows incorrect task status for a killed Executor Process #18707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
172fc20
81422e0
55c6c37
f454c89
6b7d5c6
bc41664
e46126f
9b3cebc
7f03341
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,7 +19,7 @@ package org.apache.spark.ui.exec | |
|
|
||
| import scala.collection.mutable.{LinkedHashMap, ListBuffer} | ||
|
|
||
| import org.apache.spark.{ExceptionFailure, Resubmitted, SparkConf, SparkContext} | ||
| import org.apache.spark.{Resubmitted, SparkConf, SparkContext} | ||
| import org.apache.spark.annotation.DeveloperApi | ||
| import org.apache.spark.scheduler._ | ||
| import org.apache.spark.storage.{StorageStatus, StorageStatusListener} | ||
|
|
@@ -131,17 +131,17 @@ class ExecutorsListener(storageStatusListener: StorageStatusListener, conf: Spar | |
| if (info != null) { | ||
| val eid = info.executorId | ||
| val taskSummary = executorToTaskSummary.getOrElseUpdate(eid, ExecutorTaskSummary(eid)) | ||
| taskEnd.reason match { | ||
| case Resubmitted => | ||
| // Note: For resubmitted tasks, we continue to use the metrics that belong to the | ||
| // first attempt of this task. This may not be 100% accurate because the first attempt | ||
| // could have failed half-way through. The correct fix would be to keep track of the | ||
| // metrics added by each attempt, but this is much more complicated. | ||
| return | ||
| case _: ExceptionFailure => | ||
| taskSummary.tasksFailed += 1 | ||
| case _ => | ||
| taskSummary.tasksComplete += 1 | ||
| // Note: For resubmitted tasks, we continue to use the metrics that belong to the | ||
| // first attempt of this task. This may not be 100% accurate because the first attempt | ||
| // could have failed half-way through. The correct fix would be to keep track of the | ||
| // metrics added by each attempt, but this is much more complicated. | ||
| if (taskEnd.reason == Resubmitted) { | ||
| return | ||
| } | ||
| if (info.successful) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the relationship between
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. def successful: Boolean = finished && !failed && !killed So if the task state shows finished and it has neither failed nor been killed, then info.successful is true. Now, there could be multiple reasons for a failed or killed task, and taskEnd.reason lists those. However, if task state is SUCCESS, so is the reason and thus, info.successful will be true in case of SUCCESS and false for all the other cases. |
||
| taskSummary.tasksComplete += 1 | ||
| } else { | ||
| taskSummary.tasksFailed += 1 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure this keeps the previous behavior, cc @zsxwing
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I went through the list of possible task failure reasons, and I found info.successful to be updated for all except Resubmitted case. If you think there is a possibility of any other reason that has been overlooked by me, do let me know. |
||
| } | ||
| if (taskSummary.tasksActive >= 1) { | ||
| taskSummary.tasksActive -= 1 | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix the problem by changing this line to
case _: TaskFailedReason?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I added info.successful check at the suggestion of @zsxwing and @jiangxb1987 . I feel this covers all the failed and killed cases which TaskFailedReason might have overlooked. However, if you feel otherwise, please let me know.