[SPARK-10739][Yarn] Add application attempt window for Spark on Yarn #8857
Test build #42792 has finished for PR 8857 at commit
Jenkins, retest this please.
Test build #42841 has finished for PR 8857 at commit
Jenkins, retest this please.
Test build #42860 has finished for PR 8857 at commit
Jenkins, retest this please.
Test build #42911 has finished for PR 8857 at commit
@@ -304,6 +304,14 @@ If you need a reference to the proper location to put log files in the YARN so t
</td>
</tr>
<tr>
<td><code>spark.yarn.attemptFailuresValidityInterval</code></td>
<td>-1</td>
The default value is actually (none) according to the code.
Thanks @vanzin for your comments; I will update the code accordingly.
Test build #43509 has finished for PR 8857 at commit
Test build #43511 has finished for PR 8857 at commit
<td><code>spark.yarn.am.attemptFailuresValidityInterval</code></td>
<td>(none)</td>
<td>
Defines the validity interval (in millisecond) for AM failure tracking.
You shouldn't say "in milliseconds". The user can specify the unit in the value (e.g. "1d" or "4500ms").
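To illustrate the point about units, here is a minimal sketch of how a suffixed duration value could be converted to milliseconds. The `parse_duration_ms` helper and its suffix table are hypothetical and written only for this example; this is not Spark's actual parser, and treating a bare number as milliseconds is an assumption of the sketch.

```python
import re

# Hypothetical suffix table for this sketch; not Spark's actual parser.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(value: str) -> int:
    """Convert a string such as "1d" or "4500ms" to milliseconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    number, suffix = match.groups()
    if suffix == "":
        suffix = "ms"  # assumption: a bare number means milliseconds
    if suffix not in _UNIT_MS:
        raise ValueError(f"unknown unit: {suffix!r}")
    return int(number) * _UNIT_MS[suffix]
```

Because the unit travels with the value, the documentation does not need to name one.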
Code looks ok, docs still need some tweaking.
Test build #43523 has finished for PR 8857 at commit
Ping @vanzin, would you mind reviewing again? Thanks a lot.
<td>(none)</td>
<td>
Defines the validity interval for AM failure tracking.
If the AM has been running for at least defined interval, the AM failure count will be reset.
"at least the defined..."
LGTM, I'll fix the remaining issue on merge.
Thanks @vanzin for your review.
Add an application attempt window for Spark on YARN so that old failures that fall outside the window are ignored. This is useful for long-running applications that need to recover from failures.
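The idea of an attempt window can be sketched in a few lines. This is only an illustration of the windowed counting described above, not the YARN or Spark implementation; the function names, the `max_attempts` parameter, and the rule that a non-positive interval disables the window are assumptions made for the example.

```python
from typing import List


def failures_in_window(failure_times_ms: List[int], now_ms: int,
                       validity_interval_ms: int) -> int:
    """Count the failures that happened within the last validity interval."""
    if validity_interval_ms <= 0:
        # Assumption: a non-positive interval disables the window,
        # so every recorded failure counts.
        return len(failure_times_ms)
    cutoff = now_ms - validity_interval_ms
    return sum(1 for t in failure_times_ms if t > cutoff)


def should_give_up(failure_times_ms: List[int], now_ms: int,
                   validity_interval_ms: int, max_attempts: int) -> bool:
    """Return True once the in-window failures reach the attempt limit."""
    in_window = failures_in_window(failure_times_ms, now_ms, validity_interval_ms)
    return in_window >= max_attempts
```

For example, with failures at 0 ms, 1,000 ms, and 7,200,000 ms, a one-hour window evaluated at the last failure sees a single failure, so an application limited to two attempts keeps running; without a window, all three failures count and the limit is exceeded.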