Skip to content

Conversation

@Sephiroth-Lin
Copy link
Contributor

  1. YarnClientSchedulerBack.asyncMonitorApplication use Client.monitorApplication so that commonize the monitor logic
  2. Support changing the yarn client monitor interval, see [SPARK-3596][YARN]Support changing the yarn client monitor interval #5292
  3. More details see discussion on [SPARK-4282][YARN] Stopping flag in YarnClientSchedulerBackend should be volatile #3143

@srowen
Copy link
Member

srowen commented Apr 1, 2015

Jenkins, add to whitelist

@srowen
Copy link
Member

srowen commented Apr 1, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29536 has finished for PR 5305 at commit 6b47ff7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@Sephiroth-Lin
Copy link
Contributor Author

Jenkins, retest please

@Sephiroth-Lin
Copy link
Contributor Author

@srowen unit tests failed at run Python app on yarn-cluster mode, I think this didn't cause by this PR, please ask jenkins to retest, thank you.

@srowen
Copy link
Member

srowen commented Apr 2, 2015

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 2, 2015

Test build #29604 has finished for PR 5305 at commit 6b47ff7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@srowen
Copy link
Member

srowen commented Apr 2, 2015

CC a few people who have touched this bit of the code: @kasjain @witgo @andrewor14

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process doesn't look the same. ApplicationNotFoundException is not handled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can just add it in Client#monitorApplication as well

@andrewor14
Copy link
Contributor

Hi @Sephiroth-Lin this looks good for the most part. The only reason why I didn't merge these code paths initially is because the loop here checks on stopping, and I didn't want to pass the boolean into monitorApplication. However, on second examination this is a daemon thread anyway, so it will eventually exit regardless of whether the backend has been stopped.

Once you address @witgo's comment I will merge this thanks.

unknown and others added 2 commits April 3, 2015 11:15
@SparkQA
Copy link

SparkQA commented Apr 3, 2015

Test build #29643 has finished for PR 5305 at commit 6483a2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 3, 2015

Test build #29644 has finished for PR 5305 at commit ee2b2fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably don't need to wrap this entire block in the try. You could for instance do the following instead:

val report: ApplicationReport =
  try {
    getApplicationReport(appId)
  } catch {
    case e: ApplicationNotFoundException =>
      return (..., ...)
  }
val state = report.getYarnApplicationState

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thank you!!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop() can be called from other areas of the code (like SparkContext.stop()). Now that the loop isn't checking for it we wouldn't interrupt this thread if that happens and I think we need to handle that case. See the discussions on #3143

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. We might need to put this in a private var monitorThread or something and interrupt it in stop(). (@Sephiroth-Lin I would make this method return the thread and set monitorThread = asyncMonitorApplication() in start())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We need to interrupt the thread in stop().

unknown added 2 commits April 8, 2015 10:55
Conflicts:
	yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@SparkQA
Copy link

SparkQA commented Apr 8, 2015

Test build #29827 has finished for PR 5305 at commit 47c0014.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@andrewor14
Copy link
Contributor

Nice, LGTM merging into master thanks @Sephiroth-Lin @tgravescs @witgo

@asfgit asfgit closed this in 55a92ef Apr 8, 2015
asfgit pushed a commit that referenced this pull request Apr 13, 2015
…tate monitor thread been interrupted

On PR #5305 we interrupt the monitor thread but forget to catch the InterruptedException, then in the log will print the stack info, so we need to catch it.

Author: linweizhong <linweizhong@huawei.com>

Closes #5479 from Sephiroth-Lin/SPARK-6870 and squashes the following commits:

f775f93 [linweizhong] Update, don't need to call Thread.currentThread() on monitor thread
0e2ef1f [linweizhong] Update
0d8958a [linweizhong] Update
3513fdb [linweizhong] Catch InterruptedException
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants