[SPARK-24918][Core] Executor Plugin api #21923
Conversation
This provides a very simple API for users to specify arbitrary code to run within an executor, e.g. for debugging or added instrumentation. The initial API is very simple, but creates an abstract base class to allow future additions.
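As a rough illustration of the kind of plugin this enables (the names below are hypothetical stand-ins, not the actual Spark classes), a user-supplied plugin could be as small as:

```java
// Hypothetical mirror of the plugin hook shape discussed in this PR;
// the real base type lives in org.apache.spark. Illustrative only.
interface ExecutorPlugin {
    default void init() {}
    default void shutdown() {}
}

// Example user plugin: flips a flag and logs when the executor starts it.
class DebugPlugin implements ExecutorPlugin {
    static volatile boolean initialized = false;

    @Override
    public void init() {
        initialized = true;
        System.out.println("DebugPlugin initialized");
    }
}

public class PluginDemo {
    public static void main(String[] args) {
        ExecutorPlugin plugin = new DebugPlugin();
        plugin.init();  // the executor would invoke this once at startup
        System.out.println("initialized=" + DebugPlugin.initialized);
    }
}
```

The executor side only needs to instantiate the configured classes and call the hook; everything else is up to the plugin author.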
@squito, thanks! I am a bot who has found some folks who might be able to help with the review: @cloud-fan, @vanzin and @rxin
Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases.
Test build #93806 has finished for PR 21923 at commit
@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
do we want to do it in the same thread? It might be safer in a separate thread. Does that affect your memory monitor?
My memory monitor would be fine if the constructor were called in another thread. (It actually creates its own thread -- it has to, as it's going to continually poll.)
What would be the advantage to calling the constructor in a separate thread? If it's just to protect against exceptions, we could just do a try/catch. If it's to ensure that we don't tie up the main executor threads ... well, even in another thread, the plugin could do something arbitrary to tie up all the resources associated with this executor (e.g. launch 30 threads and do something intensive in each one).
Not opposed to having another thread, just want to understand why.
I was thinking another thread would at least prevent a plugin from blocking the executor from starting. If someone added a plugin that just blocked, or did something that took time, and you then started to see timeouts during startup, the cause might not be obvious. If we start it in a separate thread, yes, it still uses resources, but it doesn't completely block the executor from starting and trying to take tasks. It also just seems safer to me, since you could catch exceptions from there and possibly ignore them, so they don't affect the main running.
Makes sense. Probably something I should have at least a basic test for as well ... will need to think about how to do that.
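The separate-thread idea in this exchange can be sketched roughly as follows (illustrative code, not the actual Executor change): plugin initialization runs on a daemon thread, and each plugin's failure is caught so a slow or broken plugin cannot block executor startup.

```java
import java.util.List;

// Sketch only: isolate plugin init on a daemon thread and swallow
// per-plugin failures, so startup cannot be blocked or killed by a plugin.
interface ExecutorPlugin {
    void init();
}

public class PluginStartup {
    public static Thread initPluginsAsync(List<ExecutorPlugin> plugins) {
        Thread t = new Thread(() -> {
            for (ExecutorPlugin p : plugins) {
                try {
                    p.init();
                } catch (Throwable e) {
                    // log and continue; one bad plugin must not stop the rest
                    System.err.println("Plugin init failed: " + e);
                }
            }
        }, "executor-plugin-init");
        t.setDaemon(true);  // never keeps the executor JVM alive
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorPlugin bad = () -> { throw new RuntimeException("boom"); };
        ExecutorPlugin good = () -> System.out.println("good plugin up");
        Thread t = initPluginsAsync(List.of(bad, good));
        t.join();  // the real executor would keep starting up instead
    }
}
```

As noted in the thread, this only protects startup: a plugin can still consume arbitrary executor resources once running.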
The only other cases I have heard people ask for this are hopefully covered by the new barrier scheduling. Currently this isn't adding any user docs, which might make sense if our use case is debugging and we want to vet this as an alpha API. Thoughts?
With this basic API, you could just do things that tie into the JVM in general. For example, you can inspect memory or get thread dumps. We could add an event for executor shutdown, if you wanted to clean up any shared resources. I haven't had a need for this, but I think this is something I've heard requests for in the past. I have another variant of this where you also get task start and end events. This lets you control the monitoring a little more -- e.g., I had something which started polling thread dumps only if there was a task from stage 17 that had been taking longer than 5 seconds. But anything task related is a bit trickier to decide the right API for. Should the task end event also get the failure reason? Should those events get called in the same thread as the task runner, or in another thread? Again, DeveloperApi gives us flexibility to change those particulars down the road, but I didn't feel strongly about getting them in right now.
I feel like we should leave it undocumented at first, just because I worry about the average user not knowing what to do with it (or doing something they really shouldn't be). But I don't feel super strongly about it.
* could also interfere with task execution and make the executor fail in unexpected ways.
*/
@DeveloperApi
public class AbstractExecutorPlugin {
interface? A bit more flexibility in terms of the user's desired class hierarchy.
I chose an abstract base class so that it's easier to guarantee forward-compatibility when we add new methods (like SparkFirehoseListener). Could make an interface as well, of course, but thought this would steer users to the better choice.
If we're adding methods, it's the same amount of work for users if we have this be an interface - the user will still have to override the added methods. If we're adding default methods, we can use the default keyword.
Abstract class would make sense if we had fields / properties we want to make universal across all these plugins, but it's unclear that's a needed API.
I can imagine a use case, for example, being that a user wants to port an agent they've already written to be loadable via this executor plugin. If that agent has already been written with some class hierarchy, it's easier to tag on implements ExecutorPlugin than to rewrite the class hierarchy (given that Java doesn't support multiple inheritance of classes).
If we do want to keep this as an abstract class, I believe we're missing the abstract keyword right now.
Yes, all good points, I forgot about default methods in interfaces. (And also, yes, I even forgot abstract even in this version.)
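The default-method argument can be demonstrated with a tiny example (hypothetical interface name, not the shipped API): a method added later with a default body does not break an implementation written against the older interface shape.

```java
// Hypothetical plugin interface. Suppose "v1" only declared init().
interface ExecutorPlugin {
    default void init() {}

    // Added in a later release; the default body keeps old plugins working.
    default void onExecutorShutdown() {
        System.out.println("default shutdown hook");
    }
}

// Written against "v1": overrides init() only, never saw onExecutorShutdown().
class OldPlugin implements ExecutorPlugin {
    @Override
    public void init() {
        System.out.println("OldPlugin init");
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        ExecutorPlugin p = new OldPlugin();
        p.init();                // the old plugin's own behavior
        p.onExecutorShutdown();  // falls back to the default body, no recompile needed
    }
}
```

This is why an interface with default methods offers the same forward-compatibility as an abstract base class, while leaving the user's class hierarchy free for `extends`.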
retest this please
Test build #94032 has finished for PR 21923 at commit
Test build #94034 has finished for PR 21923 at commit
Test build #94023 has finished for PR 21923 at commit
Test build #94041 has finished for PR 21923 at commit
ConfigBuilder("spark.executor.plugins")
  .internal()
  .doc("Comma-separated list of class names for \"plugins\" implementing " +
    "org.apache.spark.AbstractExecutorPlugin. Plugins have the same privileges as any task " +
org.apache.spark.AbstractExecutorPlugin -> org.apache.spark.ExecutorPlugin.
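For illustration, the comma-separated class-name pattern this config uses can be sketched with plain reflection (a simplification of what `Utils.loadExtensions` does; the real helper also handles constructors that take a SparkConf, among other things):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of resolving a config value like
// "spark.executor.plugins" into instances via reflection.
public class ExtensionLoader {
    public static List<Object> load(String commaSeparated) throws Exception {
        List<Object> instances = new ArrayList<>();
        for (String name : commaSeparated.split(",")) {
            Class<?> cls = Class.forName(name.trim());
            instances.add(cls.getDeclaredConstructor().newInstance());
        }
        return instances;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in class names; a real value would list plugin classes.
        List<Object> loaded = load("java.util.ArrayList, java.util.HashMap");
        System.out.println("loaded " + loaded.size() + " extensions");
    }
}
```

Each named class must be on the executor's classpath and have a no-arg constructor, which is why the surrounding discussion cares about when `--jars` versus `spark.executor.extraClassPath` jars become visible.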
Test build #94296 has finished for PR 21923 at commit
@@ -130,6 +130,12 @@ private[spark] class Executor(
private val urlClassLoader = createClassLoader()
private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)

Thread.currentThread().setContextClassLoader(replClassLoader)
conf.get(EXECUTOR_PLUGINS).foreach { classes =>
  Utils.loadExtensions(classOf[ExecutorPlugin], classes, conf)
For all cluster managers, would this properly load plugins deployed via --jars in spark-submit or spark.jars in the SparkConf? I know that jar deployment, and when jars become available on the classpath, may sometimes vary. Although worst case this seems like the kind of thing one may prefer to put in spark.executor.extraClassPath, simply because those jars are guaranteed to be loaded at JVM boot time.
In fact - I wonder if we should even move this extension loading further up in the lifecycle, simply so that the plugin can be around for a larger percentage of the executor JVM's uptime.
this is being continued in #22192
…n task start and end events

### What changes were proposed in this pull request?
Proposing a new set of APIs for ExecutorPlugins, to provide callbacks invoked at the start and end of each task of a job. Not very opinionated on the shape of the API; tried to be as minimal as possible for now.

### Why are the changes needed?
Changes described in detail on [SPARK-33088](https://issues.apache.org/jira/browse/SPARK-33088), but mostly this boils down to:
1. This feature was considered when the ExecutorPlugin API was initially introduced in #21923, but never implemented.
2. The use-case which **requires** this feature is to propagate tracing information from the driver to the executor, such that calls from the same job can all be traced.
   a. Tracing frameworks are usually set up in thread locals, therefore it's important for the setup to happen in the same thread which runs the tasks.
   b. Executors can be shared by multiple jobs, therefore it's not sufficient to set tracing information at executor startup time -- it needs to happen every time a task starts or ends.

### Does this PR introduce _any_ user-facing change?
No. This PR introduces new features for future developers to use.

### How was this patch tested?
Unit tests on `PluginContainerSuite`.

Closes #29977 from fsamuel-bs/SPARK-33088.
Authored-by: Samuel Souza <ssouza@palantir.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
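The thread-local point in (2a) is the crux. The sketch below (hypothetical method names, not the actual Spark plugin callbacks) shows why the hooks must run on the task's own thread: a trace id stored in a ThreadLocal is invisible from any other thread, so setting it at executor startup would not help task code.

```java
// Sketch: task start/end hooks maintain a per-thread trace id.
// Hypothetical names; the real callbacks live in Spark's plugin framework.
public class TracingPlugin {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public void onTaskStart(String traceId) {
        TRACE_ID.set(traceId);  // only visible to this (task) thread
    }

    public void onTaskFinished() {
        TRACE_ID.remove();  // executor threads are reused; avoid leaking ids
    }

    public static String currentTraceId() {
        return TRACE_ID.get();
    }

    public static void main(String[] args) throws Exception {
        TracingPlugin plugin = new TracingPlugin();
        Thread task = new Thread(() -> {
            plugin.onTaskStart("trace-42");
            System.out.println("in task thread: " + currentTraceId());
            plugin.onTaskFinished();
        });
        task.start();
        task.join();
        // The main thread never sees the id -- hence per-task callbacks.
        System.out.println("in main thread: " + currentTraceId());
    }
}
```

Clearing the id in the end hook also matters because executors are shared across jobs, per point (2b) above.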
This provides a very simple API for users to specify arbitrary code to
run within an executor, e.g. for debugging or added instrumentation. The
initial API is very simple, but the interface can be extended later, with
default methods, to help forward-compatibility.