[SPARK-15359] [Mesos] Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() #13143
…tatus from mesosDriver.run()
ok to test
Test build #59856 has finished for PR 13143 at commit
error = Some(new SparkException("Error starting driver, DRIVER_ABORTED"))
markErr()
val ex = new SparkException("Error starting driver, DRIVER_ABORTED")
// if the driver gets aborted after the successful registration
s/after the successful registration/after registration/g
Also, to simplify the code, can we just throw SparkException here? Then the catch will handle all cases.
Is it because MesosDriver actually threw an exception?
MesosDriver doesn't throw an exception; it simply returns Status.DRIVER_ABORTED. The existing code handles exceptions and throws if it gets Status.DRIVER_ABORTED during registration, but once registration completes there is no code to handle that status, so it is ignored and the thread dies.
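To make the point above concrete, here is a minimal sketch, not the code from this PR. It assumes a stand-in `Status` enumeration and a hypothetical `FakeDriver` trait in place of the real Mesos classes, and uses `IllegalStateException` where the patch uses `SparkException`. It only shows the idea of converting a returned abort status into an exception so the caller cannot silently ignore it.

```scala
// Stand-in for the Mesos status enum; only the values needed for this sketch.
object Status extends Enumeration {
  val DRIVER_STOPPED, DRIVER_ABORTED = Value
}

// Hypothetical stand-in for a scheduler driver whose run() blocks and then
// returns a status instead of throwing.
trait FakeDriver {
  def run(): Status.Value
}

object AbortedStatusSketch {
  // Runs the driver and turns an aborted status into an exception.
  // IllegalStateException is an assumption standing in for SparkException.
  def runAndCheck(driver: FakeDriver): Status.Value = {
    val ret = driver.run()
    if (ret == Status.DRIVER_ABORTED) {
      throw new IllegalStateException("Error starting driver, DRIVER_ABORTED")
    }
    ret
  }
}
```

Without the check in `runAndCheck`, the returned status would be dropped and the calling thread would simply finish, which is the behaviour described above.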
Test build #70648 has finished for PR 13143 at commit
@tnachen Can you check this?
This whole function is designed poorly. We need to totally change it instead of tacking this on. We shouldn't be calling
Thanks @mgummelt for the feedback, will update the PR with the function rewrite.
Test build #75907 has finished for PR 13143 at commit
Test build #75908 has finished for PR 13143 at commit
@ArtRand @susanxhuynh could you please review this before we call for a merge?
@skonto @susanxhuynh @devaraj-kavali are people still interested in this? I was just playing around with this code to clean up ZK state.. Would be happy to try this when I have a few cycles.
@ArtRand I think this is still an issue which needs to be merged, do you have any observations with this PR?
Hello @devaraj-kavali. Yes. I've been playing around with this because it's inconvenient to clean up ZK whenever you uninstall/reinstall the Dispatcher. The problem is that the only signal of a re-install vs. a failover is when Mesos gives you a
Can one of the admins verify this patch?
ping @devaraj-kavali for @ArtRand's comment above.
Closes apache#17422 Closes apache#17619 Closes apache#18034 Closes apache#18229 Closes apache#18268 Closes apache#17973 Closes apache#18125 Closes apache#18918 Closes apache#19274 Closes apache#19456 Closes apache#19510 Closes apache#19420 Closes apache#20090 Closes apache#20177 Closes apache#20304 Closes apache#20319 Closes apache#20543 Closes apache#20437 Closes apache#21261 Closes apache#21726 Closes apache#14653 Closes apache#13143 Closes apache#17894 Closes apache#19758 Closes apache#12951 Closes apache#17092 Closes apache#21240 Closes apache#16910 Closes apache#12904 Closes apache#21731 Closes apache#21095 Added: Closes apache#19233 Closes apache#20100 Closes apache#21453 Closes apache#21455 Closes apache#18477 Added: Closes apache#21812 Closes apache#21787 Author: hyukjinkwon <gurwls223@apache.org> Closes apache#21781 from HyukjinKwon/closing-prs.
What changes were proposed in this pull request?
When mesosDriver.run() returns with the status DRIVER_ABORTED, throw an exception that SparkUncaughtExceptionHandler can handle to shut down the dispatcher.
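The mechanism this relies on can be sketched with the plain JVM API. This is an illustration under assumptions, not Spark's SparkUncaughtExceptionHandler: the object `UncaughtHandlerSketch` and its `runOnThread` helper are hypothetical names, and the handler here only records the exception instead of shutting anything down.

```scala
import java.util.concurrent.atomic.AtomicReference

object UncaughtHandlerSketch {
  // Runs `body` on a new thread and returns the exception, if any, that
  // escaped the thread and reached its uncaught exception handler.
  def runOnThread(body: () => Unit): Option[Throwable] = {
    val seen = new AtomicReference[Throwable](null)
    val t = new Thread(new Runnable { def run(): Unit = body() })
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(thread: Thread, e: Throwable): Unit = {
        // Assumption: a real handler would log and shut the process down;
        // this sketch only records what it saw.
        seen.set(e)
      }
    })
    t.start()
    t.join() // the handler runs on the dying thread before join() returns
    Option(seen.get())
  }
}
```

An exception thrown on the driver thread reaches the handler, whereas a status that is merely returned and ignored never does, which is why the change throws.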
How was this patch tested?
I verified it manually: the driver thread throws an exception when mesosDriver.run() returns with the DRIVER_ABORTED status.