-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-3591][YARN]fire and forget for YARN cluster mode #5297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #29482 has started for PR 5297 at commit |
|
Test build #29482 has finished for PR 5297 at commit
|
|
Test FAILed. |
|
I will look up into the test cases and fix them. |
|
If somebody has an existing script that runs spark-submit and then follows it with an action that expects the app to have completed, this would break it, right? I think we might need to expose this as a config option instead of changing the default behavior. |
|
Is this that different from just backgrounding the process? which can still be monitored, produces logs, etc. |
|
Test build #29519 has started for PR 5297 at commit |
|
Test build #29519 has finished for PR 5297 at commit
|
|
Test FAILed. |
|
Jenkins, test this please. |
|
Test build #29520 has started for PR 5297 at commit |
|
Test build #29520 has finished for PR 5297 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather see this combined with monitorapplication. monitorapplication already supports a returnOnRunning option that I think we can use. There is a waitForApplication in the yarnClientSchedulerBackend.
|
Test build #29547 has started for PR 5297 at commit |
|
@tgravescs Thanks for comments. Now I reuse monitorapplication and the client process will exist after application begin running, which is a little different with fire-and-forget and I think the difference is tiny enough to be give another thought. I'd rather leave the option for spark-submit to be added by the people who is eager to. In case the comments by @sryza (about naming) might be ignored as the code change. I paste them here: sryza:
WangTaoTheTonic:
|
|
That is a good point, using monitorApplication actually waits for running and that isn't necessarily what we want here. For instance it could be blocked on running waiting on resources. So I apologize, what you had before is better. So a few comments on the old version.
|
|
Test build #29547 has finished for PR 5297 at commit
|
|
Test PASSed. |
591b752 to
e1a4013
Compare
|
Test build #29580 has started for PR 5297 at commit |
|
Test build #29580 has finished for PR 5297 at commit
|
|
Test PASSed. |
|
Test build #29611 has finished for PR 5297 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out in a few cases and I noticed if we failed to submit there is no message telling the user why. we need print the report in the case the state failed/killed too. Make sure the diagnostics message is there.
9106da8 to
fea390d
Compare
|
@tgravescs IIUC the output diagnostics message should be:
Note: In order to produce the killed case, I add Thead.sleep(10000) in test code. Hope the interval in log will not cause any mislead. |
|
Test build #29642 has started for PR 5297 at commit |
|
Test build #29642 has finished for PR 5297 at commit
|
|
Test PASSed. |
|
@tgravescs @WangTaoTheTonic I wonder whether we should just use An argument against doing |
|
I prefer the current way (not waiting for running). The YarnClient we are using makes sure it gets submitted before returning. Beyond that we shouldn't really have to wait for it to actually run. If your cluster is busy that could be an indefinite amount of time. In many cases I just want to start an application and I don't really care if it actually starts running immediately or in 10 minutes. The first application report is useful if it fails. For instance I submit to a queue that doesn't exist or have some bad config. Not that this matters to much, but MapReduce also has a submit and no wait option and it doesn't wait for running, just submitted. I only mention it because if people are coming from that they might have expectations on the behavior. |
|
I see, then this LGTM pending @tgravescs' other comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kind of a nit but we could just move line 633-634 up before this if statement and we wouldn't need this second logInfo(formatReportDetails(report))
|
LGTM as well |
|
since my only other comment was a nit we can check this in without that change. I'll leave it til tomorrow and commit then unless @andrewor14 beats me to it. |
|
Test build #29775 has started for PR 5297 at commit |
|
Test build #29777 has started for PR 5297 at commit |
|
Test build #29775 has finished for PR 5297 at commit
|
|
Test PASSed. |
|
Test build #29777 has finished for PR 5297 at commit
|
|
Test PASSed. |
https://issues.apache.org/jira/browse/SPARK-3591
The output after this patch:
/cc @andrewor14