
[SPARK-24918][Core] Executor Plugin api #21923

Closed

Conversation

squito
Contributor

@squito commented Jul 30, 2018

This provides a very simple API for users to specify arbitrary code to
run within an executor, e.g. for debugging or added instrumentation. The
initial API is very simple, but the interface can be extended later, with
default methods, to help forward compatibility.

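As a rough illustration of the shape of such an API (all class and method names here are hypothetical sketches, not the actual Spark classes), a plugin is a class with overridable lifecycle hooks that the executor instantiates and calls at startup:

```java
// Hypothetical sketch of the extension point described above; names are
// illustrative, not the final Spark API.
public class PluginDemo {
    static abstract class ExecutorPluginSketch {
        // Empty default body, so existing subclasses keep compiling
        // when new methods are added later (forward compatibility).
        public void init() {}
    }

    static String runDemo() {
        final StringBuilder log = new StringBuilder();
        ExecutorPluginSketch plugin = new ExecutorPluginSketch() {
            @Override public void init() { log.append("initialized"); }
        };
        plugin.init();  // the executor would call this once during startup
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());  // prints "initialized"
    }
}
```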
@holdensmagicalunicorn

@squito, thanks! I am a bot who has found some folks who might be able to help with the review: @cloud-fan, @vanzin, and @rxin

@rxin
Contributor

rxin commented Jul 30, 2018

Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases.

@SparkQA

SparkQA commented Jul 31, 2018

Test build #93806 has finished for PR 21923 at commit ba6aa6c.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class AbstractExecutorPlugin
  • .doc("Comma-separated list of class names for \"plugins\" implementing " +

@felixcheung
Member

@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
Contributor


do we want to do it in the same thread? It might be safer in a separate thread. Does that affect your memory monitor?

Contributor Author


My memory monitor would be fine if the constructor were called in another thread. (It actually creates its own thread -- it has to, as it's going to poll continually.)

What would be the advantage of calling the constructor in a separate thread? If it's just to protect against exceptions, we could just do a try/catch. If it's to ensure that we don't tie up the main executor threads ... well, even in another thread, the plugin could do something arbitrary to tie up all the resources associated with this executor (e.g. launch 30 threads and do something intensive in each one).

Not opposed to having another thread, just want to understand why.

Contributor


I was thinking another thread would at least prevent a plugin from blocking the executor from starting. If someone added a plugin that blocked or did something slow, and you then started to see timeouts during startup, it wouldn't be obvious what was going on. If we start it in a separate thread, yes, it still uses resources, but it doesn't completely block the executor from starting and trying to take tasks. It also just seems safer to me, as you could catch exceptions from there and possibly ignore them, so they don't affect the main execution.
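The separate-thread-plus-exception-handling idea above can be sketched roughly like this (illustrative only, not Spark's actual code): plugin initialization runs on its own daemon thread with a bounded wait, so a throwing or slow plugin cannot stop the executor from starting:

```java
public class IsolatedInit {
    // Run a plugin's init on its own thread; swallow failures so executor
    // startup can continue. Returns what happened, for illustration.
    static String initIsolated(Runnable pluginInit, long timeoutMs) throws InterruptedException {
        final String[] result = {"ok"};
        Thread t = new Thread(() -> {
            try {
                pluginInit.run();
            } catch (RuntimeException e) {
                result[0] = "ignored: " + e.getMessage();
            }
        }, "plugin-init");
        t.setDaemon(true);   // never keep the JVM alive just for a plugin
        t.start();
        t.join(timeoutMs);   // bounded wait: a slow plugin can't block startup forever
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        String outcome = initIsolated(
            () -> { throw new RuntimeException("plugin failed"); }, 5000);
        System.out.println(outcome + "; executor startup continues");
    }
}
```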

Contributor Author


Makes sense. Probably something I should have at least a basic test for as well ... will need to think about how to do that.

@tgravescs
Contributor

The only other cases where I have heard people ask for this are hopefully covered by the new barrier scheduling.
While I agree that we need to be careful about creating APIs and should think them through properly, I don't think we should block a useful feature on having more than one use case. We just need to make sure it's marked properly until we are comfortable with it. This is marked as DeveloperApi, which should cover that.

Currently this isn't adding any user docs, which might make sense if our use case is debugging and we want to vet this as an alpha API. Thoughts?

@squito
Contributor Author

squito commented Aug 1, 2018

Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases.

With this basic API, you could just do things that tie into the JVM in general. For example, you can inspect memory or get thread dumps.

We could add an event for executor shutdown, if you wanted to clean up any shared resources. I haven't had a need for this, but I think it's something I've heard requests for in the past.

I have another variant of this where you also get task start and end events. This lets you control the monitoring a little more -- e.g., I had something which started polling thread dumps only if a task from stage 17 had been taking longer than 5 seconds. But anything task related is a bit trickier to decide the right API for. Should the task end event also get the failure reason? Should those events get called in the same thread as the task runner, or in another thread? Again, DeveloperApi gives us flexibility to change those particulars down the road, but I didn't feel strongly about getting them in right now.

Currently this isn't adding any user docs, which might make sense if our use case is debugging and we want to vet this as an alpha API. Thoughts?

I feel like we should leave it undocumented at first, just because I worry about the average user not knowing what to do with it (or doing something they really shouldn't be). But I don't feel super strongly about it.

* could also interfere with task execution and make the executor fail in unexpected ways.
*/
@DeveloperApi
public class AbstractExecutorPlugin {
Contributor


interface? A bit more flexibility in terms of the user's desired class hierarchy.

Contributor Author


I chose an abstract base class so that it's easier to guarantee forward compatibility when we add new methods (like SparkFirehoseListener). We could make it an interface as well, of course, but I thought this would steer users toward the better choice.

Contributor


If we're adding methods, it's the same amount of work for users if we have this be an interface - the user will still have to override the added methods. If we're adding default methods, we can use the default keyword.

Abstract class would make sense if we had fields / properties we want to make universal across all these plugins, but it's unclear that's a needed API.

I can imagine a use case, for example, where a user wants to port an agent they've already written to be loadable via this executor plugin. If that agent has already been written with some class hierarchy, it's easier to tack on `implements ExecutorPlugin` than to rewrite the class hierarchy (given that Java doesn't support multiple inheritance of classes).

If we do want to keep this as an abstract class, I believe we're missing the abstract keyword right now.
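A small sketch of the interface-with-default-methods alternative (names here are illustrative, not the final API): default methods keep existing implementations compiling when new callbacks are added, and an interface composes with a user's pre-existing class hierarchy:

```java
// Illustrative names only. Default methods give the same forward
// compatibility as an abstract base class, without claiming the one
// superclass slot Java allows.
public class InterfaceDemo {
    interface ExecutorPluginIface {
        default void init() {}
        default void shutdown() {}  // a later addition stays backward compatible
    }

    static class ExistingAgent { /* a user's pre-existing base class */ }

    // Single class inheritance is no obstacle: just tack on the interface.
    static class MyAgent extends ExistingAgent implements ExecutorPluginIface {
        boolean attached = false;
        @Override public void init() { attached = true; }
    }

    public static void main(String[] args) {
        MyAgent agent = new MyAgent();
        agent.init();
        agent.shutdown();  // inherited default no-op; MyAgent needed no change
        System.out.println("attached: " + agent.attached);
    }
}
```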

Contributor Author


Yes, all good points, I forgot about default methods in interfaces. (And also, yes, I even forgot `abstract` in this version.)

@squito
Contributor Author

squito commented Aug 2, 2018

retest this please

@SparkQA

SparkQA commented Aug 2, 2018

Test build #94032 has finished for PR 21923 at commit f8c99e3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 2, 2018

Test build #94034 has finished for PR 21923 at commit 7d43c77.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 2, 2018

Test build #94023 has finished for PR 21923 at commit ba6aa6c.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class AbstractExecutorPlugin
  • .doc("Comma-separated list of class names for \"plugins\" implementing " +

@SparkQA

SparkQA commented Aug 2, 2018

Test build #94041 has finished for PR 21923 at commit 3297195.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

ConfigBuilder("spark.executor.plugins")
  .internal()
  .doc("Comma-separated list of class names for \"plugins\" implementing " +
    "org.apache.spark.AbstractExecutorPlugin. Plugins have the same privileges as any task " +
Member


`org.apache.spark.AbstractExecutorPlugin` -> `org.apache.spark.ExecutorPlugin`.

@SparkQA

SparkQA commented Aug 6, 2018

Test build #94296 has finished for PR 21923 at commit 8fb739b.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
conf.get(EXECUTOR_PLUGINS).foreach { classes =>
Utils.loadExtensions(classOf[ExecutorPlugin], classes, conf)
Contributor


For all cluster managers, would this properly load plugins deployed via `--jars` in spark-submit or `spark.jars` in the SparkConf? I know that jar deployment, and when the jars become available on the classpath, can vary. Although, worst case, this seems like the kind of thing one may prefer to put in `spark.executor.extraClassPath`, simply because those jars are guaranteed to be loaded at JVM boot time.

In fact -- I wonder if we should even move this extension loading further up in the lifecycle, simply so that the plugin can be around for a larger percentage of the executor JVM's uptime.
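A simplified, illustrative stand-in for the kind of reflective loading `Utils.loadExtensions` performs (this is not the real Spark signature): each comma-separated class name is instantiated through the context class loader, which is exactly where the classpath/lifecycle timing concern above bites:

```java
import java.util.ArrayList;
import java.util.List;

public class ReflectiveLoad {
    // Simplified sketch, not Spark's actual Utils.loadExtensions: instantiate
    // each class name via the context class loader's no-arg constructor.
    static List<Object> loadExtensions(String commaSeparated) throws Exception {
        List<Object> out = new ArrayList<>();
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        for (String name : commaSeparated.split(",")) {
            // Throws ClassNotFoundException if the plugin's jar isn't on the
            // classpath yet -- this is why load order in the lifecycle matters.
            out.add(Class.forName(name.trim(), true, cl)
                         .getDeclaredConstructor().newInstance());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // JDK classes stand in here for user plugin classes on the classpath.
        System.out.println(loadExtensions("java.lang.Object, java.lang.StringBuilder").size());
    }
}
```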

@squito
Contributor Author

squito commented Aug 23, 2018

this is being continued in #22192

@squito closed this Aug 23, 2018
asfgit pushed a commit that referenced this pull request Oct 16, 2020
…n task start and end events

### What changes were proposed in this pull request?
Proposing a new set of APIs for ExecutorPlugins, to provide callbacks invoked at the start and end of each task of a job. Not very opinionated on the shape of the API; I tried to be as minimal as possible for now.

### Why are the changes needed?
Changes described in detail on [SPARK-33088](https://issues.apache.org/jira/browse/SPARK-33088), but mostly this boils down to:

1. This feature was considered when the ExecutorPlugin API was initially introduced in #21923, but never implemented.
2. The use-case which **requires** this feature is to propagate tracing information from the driver to the executor, such that calls from the same job can all be traced.
  a. Tracing frameworks are usually set up in thread locals, so it's important for the setup to happen in the same thread which runs the tasks.
  b. Executors can run tasks for multiple jobs, so it's not sufficient to set tracing information at executor startup time -- it needs to happen every time a task starts or ends.
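The thread-local point above can be illustrated with a small, self-contained sketch (class and method names are hypothetical, not the PR's actual API): the start/end callbacks must run on the task's own thread for the tracing context to be visible to the task body, and other threads never see it:

```java
// Illustrative only: a "plugin" that installs per-task tracing context in a
// ThreadLocal, relying on the start/end callbacks running on the task thread.
public class TracingPluginDemo {
    static final ThreadLocal<String> TRACE = new ThreadLocal<>();

    static void onTaskStart(String traceId) { TRACE.set(traceId); }
    static void onTaskEnd() { TRACE.remove(); }

    // Runs a "task" on its own thread and reports what the task body saw.
    static String runTask(String traceId) throws InterruptedException {
        final String[] seen = new String[1];
        Thread taskThread = new Thread(() -> {
            onTaskStart(traceId);   // plugin callback, on the task's thread
            seen[0] = TRACE.get();  // task body reads the thread-local context
            onTaskEnd();            // cleanup so pooled threads don't leak state
        });
        taskThread.start();
        taskThread.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("task sees: " + runTask("job-42-trace"));
        System.out.println("main sees: " + TRACE.get());  // null: locals don't cross threads
    }
}
```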

### Does this PR introduce _any_ user-facing change?
No. This PR introduces new features for future developers to use.

### How was this patch tested?
Unit tests on `PluginContainerSuite`.

Closes #29977 from fsamuel-bs/SPARK-33088.

Authored-by: Samuel Souza <ssouza@palantir.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>