Skip to content

Commit

Permalink
[SPARK-10739] [YARN] Add application attempt window for Spark on Yarn
Browse files Browse the repository at this point in the history
Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#8857 from jerryshao/SPARK-10739 and squashes the following commits:

36eabdc [jerryshao] change the doc
7f9b77d [jerryshao] Style change
1c9afd0 [jerryshao] Address the comments
caca695 [jerryshao] Add application attempt window for Spark on Yarn
  • Loading branch information
jerryshao authored and Marcelo Vanzin committed Oct 13, 2015
1 parent 091c2c3 commit f97e932
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/running-on-yarn.md
Original file line number Diff line number Diff line change
Expand Up @@ -305,6 +305,15 @@ If you need a reference to the proper location to put log files in the YARN so t
It should be no larger than the global number of max attempts in the YARN configuration.
</td>
</tr>
<tr>
<td><code>spark.yarn.am.attemptFailuresValidityInterval</code></td>
<td>(none)</td>
<td>
Defines the validity interval for AM failure tracking.
If the AM has been running for at least the defined interval, the AM failure count will be reset.
This feature is not enabled if not configured, and only supported in Hadoop 2.6+.
</td>
</tr>
<tr>
<td><code>spark.yarn.submit.waitAppCompletion</code></td>
<td><code>true</code></td>
Expand Down
14 changes: 14 additions & 0 deletions yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
Original file line number Diff line number Diff line change
Expand Up @@ -208,6 +208,20 @@ private[spark] class Client(
case None => logDebug("spark.yarn.maxAppAttempts is not set. " +
"Cluster's default value will be used.")
}

if (sparkConf.contains("spark.yarn.am.attemptFailuresValidityInterval")) {
try {
val interval = sparkConf.getTimeAsMs("spark.yarn.am.attemptFailuresValidityInterval")
val method = appContext.getClass().getMethod(
"setAttemptFailuresValidityInterval", classOf[Long])
method.invoke(appContext, interval: java.lang.Long)
} catch {
case e: NoSuchMethodException =>
logWarning("Ignoring spark.yarn.am.attemptFailuresValidityInterval because the version " +
"of YARN does not support it")
}
}

val capability = Records.newRecord(classOf[Resource])
capability.setMemory(args.amMemory + amMemoryOverhead)
capability.setVirtualCores(args.amCores)
Expand Down

0 comments on commit f97e932

Please sign in to comment.