
Conversation

@devaraj-kavali

What changes were proposed in this pull request?

When mesosDriver.run() returns with the status DRIVER_ABORTED, throw an exception that can be handled by SparkUncaughtExceptionHandler to shut down the dispatcher.
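For illustration, a minimal sketch of this behavior, with a stub standing in for the real MesosSchedulerDriver; the stub and thread names are hypothetical, not the exact patch:

  import org.apache.spark.SparkException

  object DriverAbortSketch {
    sealed trait Status
    case object DRIVER_STOPPED extends Status
    case object DRIVER_ABORTED extends Status

    // Stub for org.apache.mesos.MesosSchedulerDriver, whose run() blocks
    // until the driver terminates and returns a terminal status.
    class StubDriver { def run(): Status = DRIVER_ABORTED }

    def main(args: Array[String]): Unit = {
      val mesosDriver = new StubDriver
      val t = new Thread("MesosSchedulerDriver") {
        override def run(): Unit = {
          val status = mesosDriver.run()
          if (status == DRIVER_ABORTED) {
            // Left uncaught, this reaches the thread's uncaught-exception
            // handler (SparkUncaughtExceptionHandler in the dispatcher),
            // which can shut the process down instead of letting the
            // thread die silently.
            throw new SparkException(s"Error starting driver, $status")
          }
        }
      }
      t.start()
      t.join()
    }
  }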

How was this patch tested?

I verified it manually: the driver thread throws an exception when mesosDriver.run() returns with the DRIVER_ABORTED status.

@tnachen
Contributor

tnachen commented Jun 2, 2016

ok to test

@SparkQA

SparkQA commented Jun 2, 2016

Test build #59856 has finished for PR 13143 at commit c16fb5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  error = Some(new SparkException("Error starting driver, DRIVER_ABORTED"))
  markErr()
  val ex = new SparkException("Error starting driver, DRIVER_ABORTED")
  // if the driver gets aborted after the successful registration
Contributor

s/after the successful registration/after registration/g

Contributor

Also, to simplify the code, can we just throw a SparkException here? The catch will then handle all cases.
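A rough sketch of the simplification being suggested, with stubs for the Mesos status and surrounding scheduler state; error and markErr mirror the snippet above, and none of this is the exact Spark code:

  import org.apache.spark.SparkException

  object SingleCatchSketch {
    sealed trait Status
    case object DRIVER_ABORTED extends Status
    class StubDriver { def run(): Status = DRIVER_ABORTED }

    var error: Option[Exception] = None
    def markErr(): Unit = println("registration marked as failed")

    def main(args: Array[String]): Unit = {
      val mesosDriver = new StubDriver
      try {
        val ret = mesosDriver.run()
        if (ret == DRIVER_ABORTED) {
          // Throwing here funnels the aborted case into the same catch
          // as any other startup failure, so there is one error path.
          throw new SparkException("Error starting driver, DRIVER_ABORTED")
        }
      } catch {
        case e: Exception =>
          error = Some(e)
          markErr()
      }
    }
  }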

@tnachen
Contributor

tnachen commented Jun 22, 2016

Is it because MesosDriver actually threw an exception?

@devaraj-kavali
Author

devaraj-kavali commented Jun 22, 2016

MesosDriver doesn't throw any exception; it just returns the value Status.DRIVER_ABORTED.

  registerLatch.await()

  // propagate any error to the calling thread. This ensures that SparkContext creation fails
  // without leaving a broken context that won't be able to schedule any tasks
  error.foreach(throw _)

This code handles the error and throws if Status.DRIVER_ABORTED is returned during registration, but once registration completes there is no code to handle the status: it is silently skipped and the thread dies.
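For context, a minimal sketch of that latch handshake (hypothetical names): the calling thread checks error exactly once, so a status returned after the latch is released is never observed:

  import java.util.concurrent.CountDownLatch

  object RegisterLatchSketch {
    @volatile var error: Option[Exception] = None
    val registerLatch = new CountDownLatch(1)

    def main(args: Array[String]): Unit = {
      val driverThread = new Thread("mesos-driver") {
        override def run(): Unit = {
          // ... register with Mesos; on failure, set `error` ...
          registerLatch.countDown() // wake the caller, success or failure
          // ... the driver keeps running; if it aborts later, this thread
          // simply returns and nobody inspects the terminal status ...
        }
      }
      driverThread.start()

      registerLatch.await()
      // Propagate any registration-time error to the calling thread;
      // errors after this point are never checked again.
      error.foreach(throw _)
    }
  }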

@SparkQA

SparkQA commented Dec 28, 2016

Test build #70648 has finished for PR 13143 at commit 946ad7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@devaraj-kavali
Author

@tnachen Can you check this?

@devaraj-kavali
Author

@mgummelt / @tnachen, can you have a look at this?

@mgummelt

This whole function is designed poorly. We need to totally change it instead of tacking this on. We shouldn't be calling driver.run() in a separate thread; we should be calling driver.start().
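For reference, on Mesos's SchedulerDriver, run() is effectively start() followed by join(), so the non-blocking start() returns a status the caller can check inline. A sketch of the contrast, using a stub driver rather than the real API:

  object StartVsRunSketch {
    sealed trait Status
    case object DRIVER_RUNNING extends Status
    case object DRIVER_ABORTED extends Status

    // Stub mirroring the SchedulerDriver contract.
    class StubDriver {
      def start(): Status = DRIVER_RUNNING    // non-blocking
      def join(): Status = DRIVER_ABORTED     // blocks until termination
      def run(): Status = { start(); join() } // blocking: start + join
    }

    def main(args: Array[String]): Unit = {
      val driver = new StubDriver

      // Current shape: run() blocks, so it gets pushed onto a separate
      // thread whose return status nobody checks.
      // new Thread(new Runnable { def run(): Unit = driver.run() }).start()

      // Suggested shape: start() returns immediately with a status the
      // caller can act on directly, no extra thread required.
      val status = driver.start()
      if (status == DRIVER_ABORTED) {
        sys.error(s"Error starting driver, $status")
      }
    }
  }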

@devaraj-kavali
Author

Thanks @mgummelt for the feedback; I will update the PR with the function rewrite.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75907 has finished for PR 13143 at commit 131edc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75908 has finished for PR 13143 at commit b9893a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor

skonto commented Jul 27, 2017

@ArtRand @susanxhuynh could you please review this before we call for a merge?

@ArtRand

ArtRand commented Sep 21, 2017

@skonto @susanxhuynh @devaraj-kavali are people still interested in this? I was just playing around with this code to clean up ZK state. I'd be happy to try this when I have a few cycles.

@devaraj-kavali
Author

@ArtRand I think this is still an issue and the fix needs to be merged; do you have any observations on this PR?

@ArtRand

ArtRand commented Sep 23, 2017

Hello @devaraj-kavali. Yes, I've been playing around with this because it's inconvenient to clean up ZK whenever you uninstall/reinstall the Dispatcher. The problem is that the only signal of a re-install vs. a failover is Mesos giving you a DRIVER_ABORTED error (which signals a re-install). I think a solution will involve having different methods for starting SchedulerDrivers for Spark Drivers (applications) and for the Dispatcher, because they inherently have different requirements with respect to state.
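A purely hypothetical sketch of that separation; none of these names exist in Spark. It only illustrates giving the two callers different policies when DRIVER_ABORTED comes back:

  object AbortPolicySketch {
    sealed trait Status
    case object DRIVER_STOPPED extends Status
    case object DRIVER_ABORTED extends Status

    // Application drivers: an abort is fatal for that application only.
    def onAppDriverExit(status: Status): Unit =
      if (status == DRIVER_ABORTED) sys.error("application driver aborted")

    // Dispatcher: an abort signals a re-install, so persisted framework
    // state (e.g. in ZooKeeper) should be cleared before re-registering,
    // while a failover or normal stop keeps it for recovery.
    def onDispatcherExit(status: Status, clearZkState: () => Unit): Unit =
      status match {
        case DRIVER_ABORTED => clearZkState()
        case _              => () // preserve recovery state
      }

    def main(args: Array[String]): Unit = {
      onDispatcherExit(DRIVER_ABORTED, () => println("clearing ZK state"))
    }
  }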

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

ping @devaraj-kavali for @ArtRand's comment above.

asfgit closed this in 1a4fda8 on Jul 19, 2018. The closing commit's message:
Closes apache#17422
Closes apache#17619
Closes apache#18034
Closes apache#18229
Closes apache#18268
Closes apache#17973
Closes apache#18125
Closes apache#18918
Closes apache#19274
Closes apache#19456
Closes apache#19510
Closes apache#19420
Closes apache#20090
Closes apache#20177
Closes apache#20304
Closes apache#20319
Closes apache#20543
Closes apache#20437
Closes apache#21261
Closes apache#21726
Closes apache#14653
Closes apache#13143
Closes apache#17894
Closes apache#19758
Closes apache#12951
Closes apache#17092
Closes apache#21240
Closes apache#16910
Closes apache#12904
Closes apache#21731
Closes apache#21095

Added:
Closes apache#19233
Closes apache#20100
Closes apache#21453
Closes apache#21455
Closes apache#18477

Added:
Closes apache#21812
Closes apache#21787

Author: hyukjinkwon <gurwls223@apache.org>

Closes apache#21781 from HyukjinKwon/closing-prs.