
Commit f97e932

Authored by jerryshao, committed by Marcelo Vanzin
[SPARK-10739] [YARN] Add application attempt window for Spark on Yarn
Add an application attempt window for Spark on YARN to ignore old, out-of-window failures; this is useful for long-running applications recovering from failures.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#8857 from jerryshao/SPARK-10739 and squashes the following commits:

36eabdc [jerryshao] change the doc
7f9b77d [jerryshao] Style change
1c9afd0 [jerryshao] Address the comments
caca695 [jerryshao] Add application attempt window for Spark on Yarn
1 parent 091c2c3 commit f97e932

File tree

2 files changed (+23, -0 lines)


docs/running-on-yarn.md

Lines changed: 9 additions & 0 deletions
@@ -305,6 +305,15 @@ If you need a reference to the proper location to put log files in the YARN so t
     It should be no larger than the global number of max attempts in the YARN configuration.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.am.attemptFailuresValidityInterval</code></td>
+  <td>(none)</td>
+  <td>
+  Defines the validity interval for AM failure tracking.
+  If the AM has been running for at least the defined interval, the AM failure count will be reset.
+  This feature is not enabled if not configured, and only supported in Hadoop 2.6+.
+  </td>
+</tr>
 <tr>
   <td><code>spark.yarn.submit.waitAppCompletion</code></td>
   <td><code>true</code></td>
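The documented property is set like any other Spark configuration at submit time. A hedged illustration (the application jar, class name, and interval value below are placeholders, not from this commit):

```shell
# Illustrative only: reset the AM failure count once the AM has run for 1 hour.
# Requires Hadoop 2.6+; the jar path and main class are hypothetical.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --class com.example.LongRunningApp \
  long-running-app.jar
```

The value is parsed with Spark's time-string handling, so suffixed forms such as `1h` or `3600s` are accepted.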

yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

Lines changed: 14 additions & 0 deletions
@@ -208,6 +208,20 @@ private[spark] class Client(
       case None => logDebug("spark.yarn.maxAppAttempts is not set. " +
         "Cluster's default value will be used.")
     }
+
+    if (sparkConf.contains("spark.yarn.am.attemptFailuresValidityInterval")) {
+      try {
+        val interval = sparkConf.getTimeAsMs("spark.yarn.am.attemptFailuresValidityInterval")
+        val method = appContext.getClass().getMethod(
+          "setAttemptFailuresValidityInterval", classOf[Long])
+        method.invoke(appContext, interval: java.lang.Long)
+      } catch {
+        case e: NoSuchMethodException =>
+          logWarning("Ignoring spark.yarn.am.attemptFailuresValidityInterval because the version " +
+            "of YARN does not support it")
+      }
+    }
+
     val capability = Records.newRecord(classOf[Resource])
     capability.setMemory(args.amMemory + amMemoryOverhead)
     capability.setVirtualCores(args.amCores)
