[SPARK-24918][Core] Executor Plugin api #21923
Conversation
This provides a very simple API for users to specify arbitrary code to run within an executor, e.g. for debugging or added instrumentation. The initial API is very simple, but creates an abstract base class to allow future additions.
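As a rough illustration of the kind of plugin this enables (the names below are hypothetical stand-ins, not the actual Spark classes), a user-supplied plugin could be as small as:

```java
// Hypothetical mirror of the plugin hook shape discussed in this PR;
// the real base type lives in org.apache.spark. Illustrative only.
interface ExecutorPlugin {
    default void init() {}
    default void shutdown() {}
}

// Example user plugin: flips a flag and logs when the executor starts it.
class DebugPlugin implements ExecutorPlugin {
    static volatile boolean initialized = false;

    @Override
    public void init() {
        initialized = true;
        System.out.println("DebugPlugin initialized");
    }
}

public class PluginDemo {
    public static void main(String[] args) {
        ExecutorPlugin plugin = new DebugPlugin();
        plugin.init();  // the executor would invoke this once at startup
        System.out.println("initialized=" + DebugPlugin.initialized);
    }
}
```

The executor side only needs to instantiate the configured classes and call the hook; everything else is up to the plugin author.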
@squito, thanks! I am a bot who has found some folks who might be able to help with the review: @cloud-fan, @vanzin and @rxin
Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases.
Test build #93806 has finished for PR 21923 at commit
@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
do we want to do it in the same thread? It might be safer in a separate thread. Does that affect your memory monitor?
My memory monitor would be fine if the constructor were called in another thread. (It actually creates its own thread -- it has to, as it's going to continually poll.)
What would be the advantage to calling the constructor in a separate thread? If it's just to protect against exceptions, we could just do a try/catch. If it's to ensure that we don't tie up the main executor threads ... well, even in another thread, the plugin could do something arbitrary to tie up all the resources associated with this executor (e.g. launch 30 threads and do something intensive in each one).
Not opposed to having another thread, just want to understand why.
I was thinking another thread would at least prevent a plugin from blocking the executor from starting. If someone added a plugin that just blocked, or did something that took time, and you then started to see timeouts during startup, the cause might not be obvious. If we start it in a separate thread, yes, it still uses resources, but it doesn't completely block the executor from starting and trying to take tasks. It also just seems safer to me, since you could catch exceptions from there and possibly ignore them, so they don't affect the main running.
Makes sense. Probably something I should have at least a basic test for as well ... will need to think about how to do that.
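The separate-thread idea in this exchange can be sketched roughly as follows (illustrative code, not the actual Executor change): plugin initialization runs on a daemon thread, and each plugin's failure is caught so a slow or broken plugin cannot block executor startup.

```java
import java.util.List;

// Sketch only: isolate plugin init on a daemon thread and swallow
// per-plugin failures, so startup cannot be blocked or killed by a plugin.
interface ExecutorPlugin {
    void init();
}

public class PluginStartup {
    public static Thread initPluginsAsync(List<ExecutorPlugin> plugins) {
        Thread t = new Thread(() -> {
            for (ExecutorPlugin p : plugins) {
                try {
                    p.init();
                } catch (Throwable e) {
                    // log and continue; one bad plugin must not stop the rest
                    System.err.println("Plugin init failed: " + e);
                }
            }
        }, "executor-plugin-init");
        t.setDaemon(true);  // never keeps the executor JVM alive
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorPlugin bad = () -> { throw new RuntimeException("boom"); };
        ExecutorPlugin good = () -> System.out.println("good plugin up");
        Thread t = initPluginsAsync(List.of(bad, good));
        t.join();  // the real executor would keep starting up instead
    }
}
```

As noted in the thread, this only protects startup: a plugin can still consume arbitrary executor resources once running.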
The only other cases I have heard people ask for this are hopefully covered by the new barrier scheduling. Currently this isn't adding any user docs, which might make sense if our use case is debugging and we want to vet this as an alpha API. Thoughts?
With this basic API, you could just do things that tie into the JVM in general. For example, you can inspect memory or get thread dumps. We could add an event for executor shutdown, if you wanted to clean up any shared resources. I haven't had a need for this, but I think this is something I've heard requests for in the past. I have another variant of this where you also get task start and end events. This lets you control the monitoring a little more -- e.g., I had something which started polling thread dumps only if there was a task from stage 17 that had been taking longer than 5 seconds. But anything task related is a bit trickier to decide the right API for. Should the task end event also get the failure reason? Should those events get called in the same thread as the task runner, or in another thread? Again, DeveloperApi gives us flexibility to change those particulars down the road, but I didn't feel strongly about getting them in right now.
I feel like we should leave it undocumented at first, just because I worry about the average user not knowing what to do with it (or doing something they really shouldn't be). But I don't feel super strongly about it.
* could also interfere with task execution and make the executor fail in unexpected ways.
*/
@DeveloperApi
public class AbstractExecutorPlugin {
interface? A bit more flexibility in terms of the user's desired class hierarchy.
I chose an abstract base class so that it's easier to guarantee forward-compatibility when we add new methods (like SparkFirehoseListener). Could make an interface as well, of course, but thought this would steer users to the better choice.
If we're adding methods, it's the same amount of work for users if we have this be an interface - the user will still have to override the added methods. If we're adding default methods, we can use the default keyword.
Abstract class would make sense if we had fields / properties we want to make universal across all these plugins, but it's unclear that's a needed API.
I can imagine a use case, for example, being that a user wants to port an agent they've already written to be loadable via this executor plugin. If that agent has already been written with some class hierarchy, it's easier to tag on implements ExecutorPlugin than to rewrite the class hierarchy (given that Java doesn't support multiple inheritance of classes).
If we do want to keep this as an abstract class, I believe we're missing the abstract keyword right now.
Yes, all good points, I forgot about default methods in interfaces. (And also, yes, I even forgot abstract even in this version.)
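The default-method argument can be demonstrated with a tiny example (hypothetical interface name, not the shipped API): a method added later with a default body does not break an implementation written against the older interface shape.

```java
// Hypothetical plugin interface. Suppose "v1" only declared init().
interface ExecutorPlugin {
    default void init() {}

    // Added in a later release; the default body keeps old plugins working.
    default void onExecutorShutdown() {
        System.out.println("default shutdown hook");
    }
}

// Written against "v1": overrides init() only, never saw onExecutorShutdown().
class OldPlugin implements ExecutorPlugin {
    @Override
    public void init() {
        System.out.println("OldPlugin init");
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        ExecutorPlugin p = new OldPlugin();
        p.init();                // the old plugin's own behavior
        p.onExecutorShutdown();  // falls back to the default body, no recompile needed
    }
}
```

This is why an interface with default methods offers the same forward-compatibility as an abstract base class, while leaving the user's class hierarchy free for `extends`.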
retest this please
Test build #94032 has finished for PR 21923 at commit
Test build #94034 has finished for PR 21923 at commit
Test build #94023 has finished for PR 21923 at commit
Test build #94041 has finished for PR 21923 at commit
ConfigBuilder("spark.executor.plugins")
  .internal()
  .doc("Comma-separated list of class names for \"plugins\" implementing " +
    "org.apache.spark.AbstractExecutorPlugin. Plugins have the same privileges as any task " +
org.apache.spark.AbstractExecutorPlugin -> org.apache.spark.ExecutorPlugin.
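For illustration, the comma-separated class-name pattern this config uses can be sketched with plain reflection (a simplification of what `Utils.loadExtensions` does; the real helper also handles constructors that take a SparkConf, among other things):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of resolving a config value like
// "spark.executor.plugins" into instances via reflection.
public class ExtensionLoader {
    public static List<Object> load(String commaSeparated) throws Exception {
        List<Object> instances = new ArrayList<>();
        for (String name : commaSeparated.split(",")) {
            Class<?> cls = Class.forName(name.trim());
            instances.add(cls.getDeclaredConstructor().newInstance());
        }
        return instances;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in class names; a real value would list plugin classes.
        List<Object> loaded = load("java.util.ArrayList, java.util.HashMap");
        System.out.println("loaded " + loaded.size() + " extensions");
    }
}
```

Each named class must be on the executor's classpath and have a no-arg constructor, which is why the surrounding discussion cares about when `--jars` versus `spark.executor.extraClassPath` jars become visible.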
Test build #94296 has finished for PR 21923 at commit
@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
conf.get(EXECUTOR_PLUGINS).foreach { classes =>
  Utils.loadExtensions(classOf[ExecutorPlugin], classes, conf)
For all cluster managers, would this properly load plugins deployed via --jars in spark-submit or spark.jars in the SparkConf? I know that jar deployment, and when jars become available on the classpath, may sometimes vary. Although worst case this seems like the kind of thing one may prefer to put in spark.executor.extraClassPath, simply because those jars are guaranteed to be loaded at JVM boot time.
In fact - I wonder if we should even move this extension loading further up in the lifecycle, simply so that the plugin can be around for a larger percentage of the executor JVM's uptime.
this is being continued in #22192
…n task start and end events

### What changes were proposed in this pull request?
Proposing a new set of APIs for ExecutorPlugins, to provide callbacks invoked at the start and end of each task of a job. Not very opinionated on the shape of the API; tried to be as minimal as possible for now.

### Why are the changes needed?
Changes described in detail on [SPARK-33088](https://issues.apache.org/jira/browse/SPARK-33088), but mostly this boils down to:
1. This feature was considered when the ExecutorPlugin API was initially introduced in #21923, but never implemented.
2. The use-case which **requires** this feature is to propagate tracing information from the driver to the executor, such that calls from the same job can all be traced.
   a. Tracing frameworks are usually set up in thread locals, therefore it's important for the setup to happen in the same thread which runs the tasks.
   b. Executors can be shared by multiple jobs, therefore it's not sufficient to set tracing information at executor startup time -- it needs to happen every time a task starts or ends.

### Does this PR introduce _any_ user-facing change?
No. This PR introduces new features for future developers to use.

### How was this patch tested?
Unit tests on `PluginContainerSuite`.

Closes #29977 from fsamuel-bs/SPARK-33088.
Authored-by: Samuel Souza <ssouza@palantir.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
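The thread-local point in (2a) is the crux. The sketch below (hypothetical method names, not the actual Spark plugin callbacks) shows why the hooks must run on the task's own thread: a trace id stored in a ThreadLocal is invisible from any other thread, so setting it at executor startup would not help task code.

```java
// Sketch: task start/end hooks maintain a per-thread trace id.
// Hypothetical names; the real callbacks live in Spark's plugin framework.
public class TracingPlugin {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public void onTaskStart(String traceId) {
        TRACE_ID.set(traceId);  // only visible to this (task) thread
    }

    public void onTaskFinished() {
        TRACE_ID.remove();  // executor threads are reused; avoid leaking ids
    }

    public static String currentTraceId() {
        return TRACE_ID.get();
    }

    public static void main(String[] args) throws Exception {
        TracingPlugin plugin = new TracingPlugin();
        Thread task = new Thread(() -> {
            plugin.onTaskStart("trace-42");
            System.out.println("in task thread: " + currentTraceId());
            plugin.onTaskFinished();
        });
        task.start();
        task.join();
        // The main thread never sees the id -- hence per-task callbacks.
        System.out.println("in main thread: " + currentTraceId());
    }
}
```

Clearing the id in the end hook also matters because executors are shared across jobs, per point (2b) above.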
This provides a very simple API for users to specify arbitrary code to
run within an executor, e.g. for debugging or added instrumentation. The
initial API is very simple, but the interface can be extended later, with
default methods, to help forward-compatibility.