[SPARK-9519][Yarn] Confirm stop sc successfully when application was killed #7846
Conversation
Test build #39318 has finished for PR 7846 at commit
This feels too hacky to be a good solution, relying on a flag to pass around who should interrupt a thread. Why not close the
This is cleaner if it's entirely local to the monitor thread. The backend doesn't need a new field for this. The thread can have a "stop" method that interrupts it only if it's blocked in
Yes, this change doesn't stop this sequence from happening. Since the monitor thread is a daemon thread, we don't need to call interrupt after sc.stop().
If you're asking what I mean, I mean that the monitor thread itself can have the flag, like
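The suggestion can be sketched as follows. This is an illustrative, self-contained rendering, not the actual patch: the class and member names (`MonitorThread`, `allowInterrupt`, `stopMonitor`) and the two callbacks standing in for `Client.monitorApplication()` and `sc.stop()` are assumptions.

```scala
// Illustrative sketch: the monitor thread itself owns the interrupt flag,
// so the scheduler backend does not need a new field. The callbacks stand
// in for Client.monitorApplication() and sc.stop().
class MonitorThread(monitor: () => Unit, onAppFinished: () => Unit) extends Thread {
  @volatile private var allowInterrupt = true

  // Name and daemon status are set by the class itself, per the review.
  setName("YARN application state monitor")
  setDaemon(true)

  override def run(): Unit = {
    monitor()               // blocks until the YARN app reaches a final state
    allowInterrupt = false  // from here on sc.stop() runs; do not interrupt
    onAppFinished()         // e.g. sc.stop()
  }

  // Called from the backend's stop(): interrupt only while still monitoring.
  def stopMonitor(): Unit = {
    if (allowInterrupt) {
      interrupt()
    }
  }
}
```

In the normal path `run()` falls through and stops the context itself; in the user-initiated-shutdown path the backend calls `stopMonitor()` while the thread is still blocked in `monitor()`, so the interrupt is delivered only when it is safe.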
Test build #39354 has finished for PR 7846 at commit
Yeah, I think that's tidier. Now that it's its own named class, the name and daemon status can be set by the class itself.
Test build #1278 has finished for PR 7846 at commit
I'd call this allowInterrupt.
It took me a bit to understand why this code is like this. Basically, when this thread is interrupted it's because the SparkContext is being shut down (sc.stop() called by user code), and you do not want sc.stop() to be called again here. Conversely, if monitorApplication() returns, it means the YARN app finished before sc.stop() was called, which means this code should call sc.stop(). Could you write a small comment explaining that, so that in the future people know what's going on here?
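A sketch of what such a comment could look like in context. This is stubbed and simplified, not the exact patch: `MonitorRunSketch`, `monitorApplication`, and `stopSparkContext` are illustrative names replacing the real YARN and Spark calls.

```scala
// Simplified, self-contained sketch of the monitor thread's run() with the
// explanatory comment the reviewer asked for. Stubs replace YARN/Spark calls.
object MonitorRunSketch {
  @volatile var allowInterrupt = true
  var contextStopped = false

  def monitorApplication(): Unit = ()                      // stub: returns when the app finishes
  def stopSparkContext(): Unit = { contextStopped = true } // stub for sc.stop()

  def run(): Unit = {
    monitorApplication()
    // Reaching this point means the YARN application finished *before* user
    // code called sc.stop(), so this thread must stop the SparkContext.
    // In the other sequence -- user code calls sc.stop() first -- the backend
    // interrupts this thread while it is blocked in monitorApplication(),
    // and we never get here. Clearing allowInterrupt prevents this thread
    // from being interrupted again while sc.stop() is in progress.
    allowInterrupt = false
    stopSparkContext()
  }
}
```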
Looks good, I just think we need a comment explaining the code for future readers.
nit: "for SPARK-9519".
LGTM, I'll leave it here to see if anyone else has comments, otherwise I'll merge in the morning.
Test build #39798 has finished for PR 7846 at commit
Test build #39809 has finished for PR 7846 at commit
… killed

Currently, when we kill an application on YARN, sc.stop() will be called from the YARN application state monitor thread; YarnClientSchedulerBackend.stop() will then call interrupt, which causes the SparkContext not to stop fully because we are still waiting for the executors to exit.

Author: linweizhong <linweizhong@huawei.com>

Closes #7846 from Sephiroth-Lin/SPARK-9519 and squashes the following commits:

1ae736d [linweizhong] Update comments
2e8e365 [linweizhong] Add comment explaining the code
ad0e23b [linweizhong] Update
243d2c7 [linweizhong] Confirm stop sc successfully when application was killed

(cherry picked from commit 7a969a6)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Merged to master and 1.5, thanks!
Currently, when we kill an application on YARN, sc.stop() will be called from the YARN application state monitor thread; YarnClientSchedulerBackend.stop() will then call interrupt on that thread, which causes the SparkContext not to stop fully because we are still waiting for the executors to exit.
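The failure mode described above can be reproduced in miniature. This is illustrative code, not Spark's: the `Backend` class, the `waitedForExecutors` flag, and the `Thread.sleep` standing in for the executor-exit wait are all assumptions made for the demo.

```scala
// Minimal repro of the bug: the monitor thread calls stop(), stop()
// interrupts the monitor thread itself, and the subsequent wait for
// executors to exit is cut short by the pending interrupt.
class Backend {
  @volatile var monitorThread: Thread = _
  @volatile var waitedForExecutors = false

  def stop(): Unit = {
    // Interrupts the *caller* when stop() runs on the monitor thread itself.
    monitorThread.interrupt()
    try {
      Thread.sleep(100)         // stands in for waiting for executors to exit
      waitedForExecutors = true
    } catch {
      case _: InterruptedException => () // the wait exits early
    }
  }
}
```

Running `stop()` on the monitor thread leaves `waitedForExecutors` false, mirroring how the SparkContext failed to stop fully; the fix is to skip the interrupt when the shutdown was initiated by the monitor thread itself.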