[SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running. #8153

rohitagarwal003 · 2015-08-13T00:21:38Z

No description provided.

andrewor14 · 2015-08-13T02:01:45Z

ok to test @vanzin

squito · 2015-08-13T02:37:45Z

core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

you can avoid all mutability with

    val tasks: Seq[Future[_]] = logInfos.grouped(20).map { batch =>
      replayExecutor.submit(new Runnable {
        override def run(): Unit = mergeApplicationListing(batch)
      })
    }

(I know the change to grouped is unrelated, but every time I look at this code it confuses me for a second why we have overlapping sliding windows -- might as well clean it up while you are messing around here)

Thanks! I am a Scala noob - I did it the Java way. :-)
Your suggestion looks much cleaner - I have updated the PR.
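
For readers following the grouped-versus-sliding point in this thread, here is a minimal sketch on a plain list. The object name `GroupedVsSliding` and the sample data are made up for illustration and are not part of the PR. It suggests that `sliding(n, n)` and `grouped(n)` yield the same non-overlapping batches, while `sliding(n)` with its default step of 1 is the variant that overlaps.

```scala
object GroupedVsSliding {
  // Hypothetical stand-in for the list of log entries that gets batched.
  val logs: Seq[Int] = (1 to 7).toList

  // Non-overlapping batches of 3; the last batch holds the remainder.
  def groupedBatches: List[Seq[Int]] = logs.grouped(3).toList

  // A window of 3 that advances by 3 at a time: same batches as grouped(3).
  def slidingSameStep: List[Seq[Int]] = logs.sliding(3, 3).toList

  // A window of 3 that advances by 1 (the default step): windows overlap.
  def slidingDefaultStep: List[Seq[Int]] = logs.sliding(3).toList
}
```

Since both forms produce the same batches here, `grouped(n)` mainly states the intent more directly.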

squito · 2015-08-13T02:38:17Z

minor style comment, otherwise makes sense to me

SparkQA · 2015-08-13T04:56:54Z

Test build #40716 has finished for PR 8153 at commit bdb5955.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class QRDecomposition[QType, RType](Q: QType, R: RType)

vanzin · 2015-08-13T17:47:30Z

core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

Since we're talking about cleaning this up, you could do this also:

    logInfos.grouped(20)
      .map { batch =>
        replayExecutor.submit(new Runnable {
          override def run(): Unit = mergeApplicationListing(batch)
        })
      }
      .foreach { task =>
        // Wait for all tasks to finish. This makes sure that checkForLogs is
        // not scheduled again while some tasks are already running in the
        // replayExecutor.
        try {
          task.get()
        } catch {
          case e: InterruptedException =>
            throw e
          case e: Exception =>
            logWarning("Error replaying logs.", e)
        }
      }

Note I added some exception handling that was missing. Without it, an error in any task would cause you to revert to the old behavior of piling up executions.

Thanks! I have updated the PR to add exception handling. I have modified the log message because errors while replaying event logs are already caught in the mergeApplicationListing method.
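
The submit-then-wait pattern discussed in this thread can be tried outside Spark with a rough, self-contained sketch. Everything named here (`WaitForBatches`, `runBatches`, the `process` callback, the pool size of 2) is an assumption for illustration; `process` merely stands in for the per-batch work that `mergeApplicationListing` does in the PR. The sketch blocks until every submitted task has finished and counts failures instead of letting one failed task abort the wait.

```scala
import java.util.concurrent.{ExecutionException, Executors, Future}

object WaitForBatches {
  // Submits one task per batch, then blocks until every task has finished.
  // Returns how many tasks failed. `process` is a hypothetical stand-in
  // for the per-batch work.
  def runBatches[A](items: Seq[A], batchSize: Int)(process: Seq[A] => Unit): Int = {
    val pool = Executors.newFixedThreadPool(2) // pool size chosen arbitrarily
    try {
      // grouped returns an iterator and map on an iterator is lazy, so
      // toList forces every batch to be submitted before we start waiting.
      val tasks: List[Future[_]] = items.grouped(batchSize).map { batch =>
        pool.submit(new Runnable {
          override def run(): Unit = process(batch)
        })
      }.toList

      tasks.count { task =>
        try {
          task.get() // blocks until this task completes
          false
        } catch {
          case e: InterruptedException => throw e
          // A failure inside the task surfaces here; keep waiting on the rest.
          case _: ExecutionException => true
        }
      }
    } finally {
      pool.shutdown()
    }
  }
}
```

A caller that runs this from a periodically scheduled job does not return until all batches are done, which is the property the thread is after. Catching the `ExecutionException` per task is what keeps a single bad batch from ending the wait early.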

vanzin · 2015-08-13T17:49:03Z

In a way I think the underlying issue is more a problem with the aggressive default polling interval (10 seconds?). But this is a way to make it better.

I think in the (not so distant) future we should investigate using the recently added inotify-like API in HDFS, to see whether it helps us avoid polling altogether.

SparkQA · 2015-08-13T20:36:06Z

Test build #40782 has finished for PR 8153 at commit 249f4ef.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2015-08-13T21:25:50Z

Jenkins, retest this please

SparkQA · 2015-08-14T00:29:37Z

Test build #40805 timed out for PR 8153 at commit 249f4ef after a configured wait of 175m.

SparkQA · 2015-08-14T11:47:39Z

Test build #40857 has finished for PR 8153 at commit 3e22b6c.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2015-08-14T18:09:24Z

Jenkins, retest this please

vanzin · 2015-08-14T18:13:22Z

LGTM.

SparkQA · 2015-08-14T21:14:15Z

Test build #40900 timed out for PR 8153 at commit cd1ef90 after a configured wait of 175m.

vanzin · 2015-08-14T23:38:14Z

retest this please

SparkQA · 2015-08-15T02:31:33Z

Test build #40933 has finished for PR 8153 at commit cd1ef90.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rohitagarwal003 · 2015-08-15T15:32:33Z

Can we retest this please?
The failures are unrelated.

SparkQA · 2015-08-15T21:56:00Z

Test build #1625 timed out for PR 8153 at commit cd1ef90 after a configured wait of 175m.

vanzin · 2015-08-15T22:33:19Z

retest this please

SparkQA · 2015-08-16T01:40:17Z

Test build #40975 timed out for PR 8153 at commit cd1ef90 after a configured wait of 175m.

vanzin · 2015-08-16T01:49:30Z

As usual, flaky tests in other unrelated modules. I'll just give up on Jenkins and merge this Monday morning.

vanzin · 2015-08-17T17:32:17Z

Merged to master, thanks!

…are already running. Author: Rohit Agarwal <rohita@qubole.com> Closes apache#8153 from mindprince/SPARK-9924.

bdb5955 · [SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running.

squito reviewed Aug 13, 2015

249f4ef · [SPARK-9924] [WEB UI] Avoid mutability. Use grouped(x) instead of sliding(x, x).

vanzin reviewed Aug 13, 2015

Rohit Agarwal added 2 commits August 14, 2015 01:40

3e22b6c · [SPARK-9924] [WEB UI] Catch exceptions to avoid reverting to the old behavior of piling up executions.

cd1ef90 · [SPARK-9924] [WEB UI] Remove unneeded import.

asfgit closed this in ed092a0 Aug 17, 2015

tgravescs pushed a commit to tgravescs/spark that referenced this pull request Sep 10, 2015

16e1c5f · [SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running. Author: Rohit Agarwal <rohita@qubole.com> Closes apache#8153 from mindprince/SPARK-9924.
