[SPARK-10640] History server fails to parse TaskCommitDenied #8828
Conversation
|
thanks, hadn't had a chance to finish this. I'll try out your changes on my bad history file to make sure it works. |
|
Great, thanks Tom. |
|
Well, your change fixes that error and it now displays the file on the screen, but it does throw an exception. I think it's due to the missing fields in the history file for the job id: org.json4s.package$MappingException: Did not find value which can be converted into int |
|
+1 pending jenkins. |
|
I see, because in your event log we only logged the name |
|
Test build #42690 has finished for PR 8828 at commit
|
|
Test build #42703 has started for PR 8828 at commit |
|
Yeah, and actually I want to do one more verification to make sure the rest of the history file is useful. Unfortunately my network connection is slow and it's a huge history file. |
|
OK, it finally loaded. The history UI for that task reports it's still RUNNING since it got the error parsing it. I guess that is OK. Ideally it would still show the end and the task commit error even if it couldn't report the job id, etc. Is that something we could do fairly easily? |
|
Hm, unfortunately it appears that we missed two whole minor versions (1.3.0 and 1.4.0). I wonder if we should add some backward compatible handling for those versions. AFAIK they're not really consumed for any other purpose downstream so we can just put |
|
Also, since those values are missing, the duration and completed time do not show up, which makes it difficult for users to debug their job. This particular job I was looking at ran for 7 hours, so I can't just rerun it to get the data again.
For logs that did not have the TaskCommitDenied fields, we should fail gracefully especially since they're not even consumed downstream by the UI. Otherwise we'll see exceptions in the history server when parsing old logs (1.3.x, 1.4.x, 1.5.0).
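The graceful-failure idea can be sketched roughly as follows. This is not the code from the PR: the real parser lives in JsonProtocol.scala and works on json4s values, whereas this stand-in uses a plain `Map[String, Any]` so it runs with the Scala standard library alone. The object name `TaskCommitDeniedSketch`, the helper `intOrDefault`, and the key names `"Job ID"`, `"Partition ID"`, and `"Attempt Number"` are assumptions made for illustration; the `-1` fallback and the message format follow what is reported later in this thread.

```scala
// Hypothetical stand-in for the task end reason discussed in this PR.
final case class TaskCommitDenied(jobId: Int, partitionId: Int, attemptNumber: Int) {
  // Message format copied from the output reported in this conversation.
  def toErrorString: String =
    s"TaskCommitDenied (Driver denied task commit) for job: $jobId, " +
      s"partition: $partitionId, attempt: $attemptNumber"
}

object TaskCommitDeniedSketch {
  // Read an Int field, falling back to a default when the key is absent
  // or holds a non-Int value, instead of throwing.
  def intOrDefault(fields: Map[String, Any], key: String, default: Int = -1): Int =
    fields.get(key) match {
      case Some(i: Int) => i
      case _            => default
    }

  // Build the end reason from a (possibly incomplete) event-log entry.
  // The key names here are assumed, not taken from the PR text.
  def fromFields(fields: Map[String, Any]): TaskCommitDenied =
    TaskCommitDenied(
      intOrDefault(fields, "Job ID"),
      intOrDefault(fields, "Partition ID"),
      intOrDefault(fields, "Attempt Number"))
}
```

With an old-style entry that only carries the reason name, `TaskCommitDeniedSketch.fromFields(Map("Reason" -> "TaskCommitDenied"))` yields an end reason with all three ids set to `-1`, so the rest of the task-end event can still be processed.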
|
The approach looks ok to me if it works (seems this only affects jobs with speculative execution on?). |
|
Yes I believe so. @tgravescs can you give it another try? |
|
yes it only happens with speculation. I'll try it out. |
|
Test build #42760 has finished for PR 8828 at commit
|
|
That is much better. It reports completed times and duration for the entire application, and those tasks show up as failed with the comment: TaskCommitDenied (Driver denied task commit) for job: -1, partition: -1, attempt: -1. +1, thanks Andrew! |
|
Alright, merging into master 1.5. Thanks everyone. |
... simply because the code is missing! Author: Andrew Or <andrew@databricks.com> Closes #8828 from andrewor14/task-end-reason-json. Conflicts: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
... simply because the code is missing! Author: Andrew Or <andrew@databricks.com> Closes apache#8828 from andrewor14/task-end-reason-json. Conflicts: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala (cherry picked from commit 5ffd084)
... simply because the code is missing! Author: Andrew Or <andrew@databricks.com> Closes apache#8828 from andrewor14/task-end-reason-json. Conflicts: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala (cherry picked from commit 26187ab)