[SPARK-29397][core] Extend plugin interface to include the driver. #26170
Conversation
Spark 2.4 added the ability for executor plugins to be loaded into Spark (see SPARK-24918). That feature intentionally skipped the driver to keep changes small, and also because it is possible to load code into the Spark driver using listeners + configuration. But that is a bit awkward, because the listener interface does not provide hooks into a lot of Spark functionality. This change reworks the executor plugin interface to also extend to the driver.

- There's a "SparkPlugin" main interface that provides APIs to load driver and executor components.
- Custom metric support (added in SPARK-28091) can be used by plugins to register metrics both in the driver process and in executors.
- A communication channel now exists that allows the plugin's executor components to send messages to the plugin's driver component easily, using the existing Spark RPC system.

The latter was a feature intentionally left out of the original plugin design (also because it didn't include a driver component).

To avoid polluting the "org.apache.spark" namespace, I added the new interfaces to the "org.apache.spark.api" package, which seems like a better place in any case. The actual implementation is kept in an internal package.

The change includes unit tests for the new interface and features, but I've also been running a custom plugin that extends the new API in real applications.
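For orientation, here is a minimal sketch of what a plugin built on the new interface might look like. The class name is made up, the method shapes follow the API as proposed in this PR (so details could still change), and returning null to skip one side is an assumption based on the "optional components" discussion further down in the review.

```scala
import java.util.{Collections, Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical plugin with both a driver-side and an executor-side component.
class ExampleMonitorPlugin extends SparkPlugin {

  // A single instance of the returned component is created inside the Spark driver.
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // Driver-side setup would go here.
      Collections.emptyMap()
    }
  }

  // One instance of the returned component is created in each executor process.
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Executor-side setup would go here.
    }
  }
}
```

A plugin like this would then be enabled through the new configuration added by this change, e.g. `--conf spark.plugins=<fully qualified plugin class name>`, provided the jar containing it is visible to both the driver and the executors.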
Test build #112298 has finished for PR 26170 at commit
Test build #112302 has finished for PR 26170 at commit
Test build #112307 has finished for PR 26170 at commit
Test build #112322 has finished for PR 26170 at commit
I have run a quick test of this in local mode with a basic plugin. However, it does not seem to work for me when I try it on a YARN test cluster.
* Initialize the plugin.
* <p>
* This method is called early in the initialization of the Spark driver. Explicitly, it is
* called before the application is submitted to the cluster manager. This means that a lot
before it is submitted? I assume that is client mode and not yarn cluster mode for instance?
I'll change; "before the task scheduler is initialized".
* @param sc The SparkContext loading the plugin.
* @param pluginContext Additional plugin-specific information about the Spark application where
*                      the plugin is running.
* @return A map containing configuration data for the executor-side component of the plugin.
It's not clear to me what is returned. Do these need to be Spark confs? I'm guessing not, but it would be good to clarify.
Can you suggest something? It already says this exact map is provided to the executor plugin's init method. I don't know how I can be clearer.
Perhaps just say the configuration keys are user-defined. Are there any other formatting restrictions? Like whether they shouldn't use "spark." or special characters, etc.
I feel like trying to explain more just makes it more confusing.
You return a map. The map magically appears as an argument to the executor side's init method, with the exact contents you returned. Simple. Whatever you can put in that map will show up on the other side.
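To make that concrete, here is a hedged sketch of the handoff; the key names are invented for the example, and they are plain, plugin-defined map entries, not Spark configuration properties.

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext}

class HandoffDriverComponent extends DriverPlugin {
  override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
    // Arbitrary, plugin-defined keys; these are not Spark confs.
    Map("endpoint" -> "collector.example.com:9999", "sample.rate" -> "0.1").asJava
  }
}

class HandoffExecutorComponent extends ExecutorPlugin {
  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
    // The exact map returned by the driver component above is what arrives here.
    val endpoint = extraConf.get("endpoint")
    val sampleRate = extraConf.get("sample.rate").toDouble
  }
}
```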
* Plugins can be loaded by adding the plugin's class name to the appropriate Spark configuration.
* Check the Spark configuration documentation for details.
* <p>
* Plugins have two components: a driver-side component, of which a single instance is created
Perhaps say "two optional components", since I think you can run one without the other.
@@ -539,6 +541,9 @@ class SparkContext(config: SparkConf) extends Logging {
    _heartbeatReceiver = env.rpcEnv.setupEndpoint(
      HeartbeatReceiver.ENDPOINT_NAME, new HeartbeatReceiver(this))

    // Initialize any plugins before the app is initialized in the cluster manager.
Similar to the comment above, this comment is a bit confusing to me; perhaps we can clarify it.
private[spark] val PLUGINS =
  ConfigBuilder("spark.plugins")
    .withPrepended(STATIC_PLUGINS_LIST, separator = ",")
It's too bad we don't have documentation on withPrepended, as it was not clear what it did. My initial thought was that it had something to do with the actual config name. After investigating more and reading the code I figured it out, but I guess that is a separate issue.
It seems to me we have a couple of these: this one named .static., and the Java options with defaultJavaOptions. It might be nice to keep a consistent naming theme if we are going to start supporting cluster-level values alongside user-level ones, where they aren't overrides but prepends.
I'll change it to something, but I've always disliked the "default" name, since "default" implies it can be overridden, and that's not the goal here.
That's fine, I agree about the "default" name; I was just thinking it would be nice if we could come up with a standard naming scheme.
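Since withPrepended is undocumented, here is a tiny standalone sketch (not Spark code, names invented) of its semantics as discussed here: the value of the "prepended" key, when set, is joined in front of the user-supplied value with the given separator, instead of overriding it.

```scala
// Minimal model of the prepend behavior for a comma-separated list config.
object PrependedConfDemo {
  def effectiveValue(prepended: Option[String],
                     user: Option[String],
                     sep: String = ","): Option[String] =
    (prepended, user) match {
      case (Some(p), Some(u)) => Some(s"$p$sep$u")
      case (Some(p), None)    => Some(p)
      case (None, u)          => u
    }

  def main(args: Array[String]): Unit = {
    // A cluster admin sets a static plugin list, the user adds their own;
    // neither value overrides the other.
    println(effectiveValue(Some("com.example.ClusterPlugin"), Some("com.example.UserPlugin")))
    // => Some(com.example.ClusterPlugin,com.example.UserPlugin)
  }
}
```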
      }
    }
  }

remove extra newline
object PluginContainer {

  val EXTRA_CONF_PREFIX = "spark.plugins.__conf__."
Instead of "conf", should we use something like ".internal."? I assume these aren't meant for users to set directly.
I've always used the underscores to mean "internal" since I started programming in C... but sure.
* Plugins can use Spark's RPC system to send messages from executors to the driver (but not
* the other way around, currently). Messages sent by the executor component of the plugin will
* be delivered to this method, and the returned value will be sent back to the executor as
* the reply, if the executor has requested one.
How does this function know whether the executor requested a reply? I assume it's up to the plugin to infer that from the message type?
It's up to the plugin code. I'm trying to avoid exposing two methods to handle RPC messages.
ok
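For readers following along, a hedged sketch of how a plugin might use this channel. It assumes the PluginContext send/ask style methods introduced by this change (a fire-and-forget send and a blocking ask); the message values and class names are invented for the example.

```scala
import java.util.{Map => JMap}

import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext}

// Executor side: sends messages to the driver component through the plugin context.
class ReportingExecutorComponent extends ExecutorPlugin {
  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
    ctx.send("started")                       // fire-and-forget; no reply expected
    val answer = ctx.ask("current-settings")  // blocks until the driver's receive() returns
  }
}

// Driver side: a single receive() handles everything; whether a reply is meaningful is a
// protocol decision made by the plugin itself, e.g. based on the message type.
class ReportingDriverComponent extends DriverPlugin {
  override def receive(message: AnyRef): AnyRef = message match {
    case "started"          => null            // nothing useful to send back
    case "current-settings" => "sampling=off"  // returned value becomes the reply to ask()
    case other => throw new IllegalArgumentException(s"unknown message: $other")
  }
}
```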
override def shutdown(): Unit = {
  driverPlugins.foreach { case (name, plugin) =>
    try {
      plugin.shutdown()
add debug message like executors have
Weird, I have that message locally. Not sure how it's not here.
Err looked in the wrong spot.
Can you be a little more descriptive about what that means? I've been testing on k8s and it was working fine.
I added a non-local unit test which passed without any modifications to the code; I also tried YARN client mode with a dummy plugin, and all seems fine. I don't see why anything would change in YARN cluster mode.
Test build #112551 has finished for PR 26170 at commit
object PluginContainer {

  val EXTRA_CONF_PREFIX = "spark.plugins.__internal_conf__."
Sorry, I was thinking of just spark.plugins.internal.conf. We have the internal() option on the config builder, so I figured it kind of matched. I don't have a super strong opinion on this as long as we try to keep it consistent. I know we use __xxx__ for various internal things (files, directories), but I didn't think we had any for configs. Thoughts?
The internal() thing does not modify the config names. Nor does it have any other effect aside from being informational (except for SQL configs, which are hidden from the "set -v" output).
So does it really matter what this name is? It's internal, it's not supposed to be set or read by anything other than internal Spark code, and people who end up seeing them should just ignore them.
I get that internal() doesn't mean anything for the config name; maybe a bad comparison. It's more about keeping naming consistent. To me it's a lot more obvious that a config is internal to Spark if it has .internal. in its name, and users should ignore those; that is why I suggested it. If it's .internal. I could also programmatically grep for all internal configs fairly easily. Not sure why I would want to do this, other than maybe to hide them from users.
All the other Spark configs follow the x.y.z format, with the last component optionally camel case, so why not keep that consistent instead of breaking the convention with the __something__ format? I know our internal configs currently have no special marker in their names, which personally I don't like either, as it's not obvious they're meant to be internal. The only benefit of that is you can easily make one non-internal without changing its name.
Sure, if you want that, I don't care about the name. I'm just pointing out that there isn't a pattern to follow here.
Test build #112555 has finished for PR 26170 at commit
I have further investigated the issues I see when running on YARN. What I am trying to do is add plugins packaged in a jar, using --jars to ship them to the executors. When running on a YARN cluster, the executors are not able to find the plugin class in my custom jar.
Ah, I see that. I'll do the same for the new code.
Test build #112613 has finished for PR 26170 at commit
I have just tested the latest updates and it works OK now, thanks.
That's an issue with every cluster manager backend in Spark, except for YARN. Distribution of the plugins is up to the user / admin / etc. in those cases. In the case of k8s you could use custom docker images. Not sure if the recent code added to stage dependencies in a shared file system could help.
ctx.registerMetrics()

logInfo(s"Initialized executor component for plugin $name.")
Some(p.getClass().getName() -> executorPlugin)
With the current implementation of executor.plugins we use plugin.getClass().getSimpleName() instead of getName(). The advantage of getSimpleName is that it is more compact and does not contain "." characters, so it is easier to process when handling metrics data. Also, when using getName, we will have long names that are repeated for every emitted value.
This was intentional. The full name has a much higher chance of being unique. I don't really see the advantages you mention; the dots don't make it any more complicated to process, and it's easy to get just the "simple name" if you want to, while the other way around is impossible.
Good point.
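A tiny, self-contained illustration of the trade-off being discussed, using an arbitrary JDK class in place of a plugin class:

```scala
object NameDemo extends App {
  val cls = classOf[java.util.concurrent.ConcurrentHashMap[_, _]]

  println(cls.getName())        // java.util.concurrent.ConcurrentHashMap -- globally unique
  println(cls.getSimpleName())  // ConcurrentHashMap -- compact, but may clash across packages

  // Deriving the simple name from the full name is easy; the reverse is impossible.
  println(cls.getName().split('.').last)
}
```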
}
ctx.registerMetrics()
logInfo(s"Initialized driver component for plugin $name.")
Some(p.getClass().getName() -> driverPlugin)
See comment about getName vs. getSimpleName below.
Just to clarify, I was thinking about registering the plugin source in the driver somewhere "near" to what was done in #23838
Test build #112626 has finished for PR 26170 at commit
def registerMetrics(): Unit = {
  if (!registry.getMetrics().isEmpty()) {
    val src = new PluginMetricsSource(pluginName, registry)
I have been experimenting with adding a prefix to pluginName, something like "Plugin." + pluginName.
In our current setup this would make it easier to use plugin metrics with a Graphite endpoint stored in InfluxDB. We currently do this with InfluxDB templates, which take the first entry in the measurement field list (separated by dots) as the sourceName/namespace value (DAGScheduler, BlockManager, JVMCPU, executor, etc.); for example: https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Dashboard/influxdb.conf_GRAPHITE
Another possible (mild?) advantage of adding a prefix to the source name, ahead of the class name, is that it would prevent plugin names from clashing with existing metrics namespaces.
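For context, a hedged sketch of how a plugin would end up in these namespaces in the first place; it assumes the plugin context exposes the Dropwizard MetricRegistry via a metricRegistry() accessor, and the class and metric names are invented.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Counter
import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}

// Executor component that registers its own metric with the plugin's registry.
class CountingExecutorComponent extends ExecutorPlugin {
  private val tasksSeen = new Counter()

  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
    // Metrics registered during init() are picked up by the plugin's metrics source.
    ctx.metricRegistry().register("tasksSeen", tasksSeen)
  }
}
```

With the prefix proposed above, such a counter would surface under something like plugin.<plugin class name>.tasksSeen in the configured metrics sinks.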
docs/monitoring.md
- Optional namespace(s). Metrics in this namespace are defined by user-supplied code, and
  configured using the Spark executor plugin infrastructure.
  See also the configuration parameter `spark.executor.plugins`
- namespace=<Plugin Class Name>
I think it could make sense to add a similar entry about plugin metrics to the driver component instance metrics list, like what was done for namespace=JVMCPU, for example.
Please add `&lt;` to escape the "<" character (this issue actually comes from SPARK-20891).
Test build #112784 has started for PR 26170 at commit
docs/monitoring.md
- Optional namespace(s). Metrics in this namespace are defined by user-supplied code, and
  configured using the Spark executor plugin infrastructure.
  See also the configuration parameter `spark.executor.plugins`
- namespace=plugin.<Plugin Class Name>
I believe escape characters are needed for this, something like: namespace=plugin.\<Plugin Class Name>
Test build #112798 has finished for PR 26170 at commit
LGTM. Thanks @vanzin for all the work.
Changes look good.
Just curious whether you tested with just an executor plugin? I guess you have another JIRA to deal with the old API.
assert(err.getMessage().contains("unknown message"))

// It should be possible for the driver plugin to send a message to itself, even if that doesn't
// make a whole lot of sense. It at least allows the same context class to be used on both
This answers my question about this. I wasn't sure it made much sense, but it seems OK for reuse. I was wondering if you had a specific use case in mind, but it sounds like not.
I haven't explicitly, no, but also don't see what would not work. Trying to send messages to the driver would not work (and in the case of
Test build #112857 has finished for PR 26170 at commit
Test build #112858 has finished for PR 26170 at commit
@@ -165,6 +166,11 @@ private[spark] class Executor(
    }
  }

  // Plugins need to load using a class loader that includes the executor's user classpath
  private val plugins: Option[PluginContainer] = Utils.withContextClassLoader(replClassLoader) {
I am not sure that plugins should be loaded at this stage when running in local mode; maybe only the driver side of the plugin is sufficient in local mode?
Metrics source registration at this stage, when executed in local mode, will not get the application id. Other metrics handled in Executor.scala are not registered when running in local mode.
The current implementation of executor plugins sends an "isLocal" boolean via the pluginContext to handle this case in the plugin logic.
You can check if you're running in local mode by looking at the spark.master value. I intentionally did not add that to the API since it would be redundant.
I'm also not especially worried about local mode. It's mostly for debugging. If something doesn't work 100% as intended for plugins, I'm totally fine with it.
Indeed, that should be fine.
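To make the suggestion concrete, a plugin could detect local mode itself from the configuration; this sketch assumes PluginContext gives access to the application's SparkConf, and simply inspects the standard spark.master value. The class name is invented.

```scala
import java.util.{Map => JMap}

import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}

// Skip executor-side setup when running in local mode, by inspecting spark.master.
class LocalAwareExecutorComponent extends ExecutorPlugin {
  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
    val isLocal = ctx.conf().get("spark.master", "").startsWith("local")
    if (!isLocal) {
      // ... set up executor-only instrumentation here ...
    }
  }
}
```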
retest this please
Test build #113221 has finished for PR 26170 at commit
No more comments, merging to master.
@vanzin, BTW why do we have the same API in two places: https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java ? Is it for compatibility? If so, I think we should remove (https://github.com/apache/spark/pull/22192/files#r343418878) or deprecate the other.
See the JIRA - it has a separate JIRA to remove the old one:
See #26390
Ah, thanks guys.