Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10739][Yarn] Add application attempt window for Spark on Yarn #8857

Closed
wants to merge 4 commits into from

Conversation

jerryshao
Copy link
Contributor

Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures.

@SparkQA
Copy link

SparkQA commented Sep 22, 2015

Test build #42792 has finished for PR 8857 at commit caca695.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 22, 2015

Test build #42841 has finished for PR 8857 at commit caca695.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

1 similar comment
@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 22, 2015

Test build #42860 has finished for PR 8857 at commit caca695.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 23, 2015

Test build #42911 has finished for PR 8857 at commit caca695.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Hi @vanzin , @sryza any opinion on this PR? Thanks a lot.

@@ -304,6 +304,14 @@ If you need a reference to the proper location to put log files in the YARN so t
</td>
</tr>
<tr>
<td><code>spark.yarn.attemptFailuresValidityInterval</code></td>
<td>-1</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is actually (none) according to the code.

@jerryshao
Copy link
Contributor Author

Thanks @vanzin for your comments, I will update the codes accordingly.

@SparkQA
Copy link

SparkQA commented Oct 10, 2015

Test build #43509 has finished for PR 8857 at commit 1c9afd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2015

Test build #43511 has finished for PR 8857 at commit 7f9b77d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<td><code>spark.yarn.am.attemptFailuresValidityInterval</code></td>
<td>(none)</td>
<td>
Defines the validity interval (in millisecond) for AM failure tracking.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't say "in milliseconds". The user can specify the unit in the value (e.g. "1d" or "4500ms").

@vanzin
Copy link
Contributor

vanzin commented Oct 10, 2015

Code looks ok, docs still need some tweaking.

@SparkQA
Copy link

SparkQA commented Oct 10, 2015

Test build #43523 has finished for PR 8857 at commit 36eabdc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Ping @vanzin , mind reviewing again, thanks a lot.

<td>(none)</td>
<td>
Defines the validity interval for AM failure tracking.
If the AM has been running for at least defined interval, the AM failure count will be reset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"at least the defined..."

@vanzin
Copy link
Contributor

vanzin commented Oct 13, 2015

LGTM, I'll fix the remaining issue on merge.

@asfgit asfgit closed this in f97e932 Oct 13, 2015
@jerryshao
Copy link
Contributor Author

Thanks @vanzin for your review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants