[SPARK-11421][CORE][PYTHON][R] Added ability for addJar to augment the current classloader #19643
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -319,6 +319,32 @@ spark.addFile <- function(path, recursive = FALSE) { | |
| invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) | ||
| } | ||
| #' Adds a JAR dependency for Spark tasks to be executed in the future. | ||
| #' | ||
| #' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported | ||
| #' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| #' If \code{addToCurrentClassLoader} is TRUE, adds the jar to the current thread's class loader | ||
| #' in the backing JVM. In general, adding to the current thread's class loader will impact all | ||
| #' other application threads unless they have explicitly changed their class loader. | ||
| #' | ||
| #' Note: the \code{addToCurrentClassLoader} parameter is a developer API, which may change or be removed | ||
| #' in minor versions of Spark. | ||
| #' | ||
| #' @rdname spark.addJar | ||
| #' @param path The path of the jar to be added | ||
| #' @param addToCurrentClassLoader Whether to add the jar to the current driver class loader. | ||
| #' @export | ||
| #' @examples | ||
| #'\dontrun{ | ||
| #' spark.addJar("/path/to/something.jar", TRUE) | ||
| #'} | ||
| #' @note spark.addJar since 2.3.0 | ||
| spark.addJar <- function(path, addToCurrentClassLoader = FALSE) { | ||
| normalizedPath <- suppressWarnings(normalizePath(path)) | ||
| sc <- callJMethod(getSparkContext(), "sc") | ||
| invisible(callJMethod(sc, "addJar", normalizedPath, addToCurrentClassLoader)) | ||
| } | ||
| #' Get the root directory that contains files added through spark.addFile. | ||
| #' | ||
| #' @rdname spark.getSparkFilesRootDirectory | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -1802,7 +1802,21 @@ class SparkContext(config: SparkConf) extends Logging { | |
| * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), | ||
| * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| */ | ||
| def addJar(path: String) { | ||
| def addJar(path: String): Unit = { | ||
| addJar(path, addToCurrentClassLoader = false) | ||
| } | ||
| /** | ||
| * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. | ||
| * | ||
| * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), | ||
| * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| * @param addToCurrentClassLoader if true, adds the jar to the current thread's class loader. | ||
| * In general, adding to the current thread's class loader will impact all other application | ||
| * threads unless they have explicitly changed their class loader. | ||
| */ | ||
| @DeveloperApi | ||
| def addJar(path: String, addToCurrentClassLoader: Boolean): Unit = { | ||
| def addJarFile(file: File): String = { | ||
| try { | ||
| if (!file.exists()) { | ||
@@ -1838,12 +1852,21 @@ class SparkContext(config: SparkConf) extends Logging { | |
| case _ => path | ||
| } | ||
| } | ||
| if (key != null) { | ||
| val timestamp = System.currentTimeMillis | ||
| if (addedJars.putIfAbsent(key, timestamp).isEmpty) { | ||
| logInfo(s"Added JAR $path at $key with timestamp $timestamp") | ||
| postEnvironmentUpdate() | ||
| } | ||
| if (addToCurrentClassLoader) { | ||
| Utils.getContextOrSparkClassLoader match { | ||
| case cl: MutableURLClassLoader => cl.addURL(Utils.resolveURI(path).toURL) | ||
Contributor
I'm not sure whether this supports remote jars on HTTPS or Hadoop FileSystems. On the executor side, we handle this explicitly by downloading the jars locally and adding them to the classpath, but there doesn't seem to be such logic here. I'm not sure how this…
Member (Author)
Thanks @jerryshao. I'll check this concern over the weekend and get back to you.
| case cl => logWarning( | ||
| s"Unsupported class loader $cl will not update jars in the thread class loader.") | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
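The registration step in the hunk above relies on `putIfAbsent`: only the first `addJar` call for a given key records a timestamp, logs, and posts an environment update, while repeated calls with the same key do nothing. The following is a minimal Python model of that bookkeeping, not Spark code. `JarRegistry`, `add_jar`, and the `environment_updates` counter are hypothetical names, the key is treated as an opaque string, and the class-loader step is left out.

```python
import threading
import time


class JarRegistry:
    """Hypothetical model of the addedJars bookkeeping in SparkContext.addJar (not a Spark API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.added_jars = {}          # key -> timestamp in ms, playing the role of addedJars
        self.environment_updates = 0  # counts simulated postEnvironmentUpdate() calls

    def _put_if_absent(self, key, value):
        # Mirrors ConcurrentMap.putIfAbsent: return the existing value,
        # or None if the key was absent and has now been inserted.
        with self._lock:
            if key in self.added_jars:
                return self.added_jars[key]
            self.added_jars[key] = value
            return None

    def add_jar(self, key):
        """Return True if the key was newly registered, False otherwise."""
        if key is None:
            # The Scala code skips registration entirely when no key was produced.
            return False
        timestamp = int(time.time() * 1000)  # analogous to System.currentTimeMillis
        if self._put_if_absent(key, timestamp) is None:
            print(f"Added JAR at {key} with timestamp {timestamp}")
            self.environment_updates += 1
            return True
        return False


if __name__ == "__main__":
    registry = JarRegistry()
    print(registry.add_jar("jars/example.jar"))  # True: first registration
    print(registry.add_jar("jars/example.jar"))  # False: already registered
```

Because the check and the insert happen atomically, two threads adding the same jar concurrently still produce a single log line and a single environment update in this model.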
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -860,6 +860,23 @@ def addPyFile(self, path): | |
| import importlib | ||
| importlib.invalidate_caches() | ||
| def addJar(self, path, addToCurrentClassLoader=False): | ||
Contributor
We should mention that adding a jar to the current class loader is a developer API and may change.
| """ | ||
| Adds a JAR dependency for Spark tasks to be executed in the future. | ||
| The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported | ||
| filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| If `addToCurrentClassLoader` is True, adds the jar to the current thread's class loader | ||
| in the backing JVM. In general, adding to the current thread's class loader will impact all | ||
| other application threads unless they have explicitly changed their class loader. | ||
Member (Author)
@holdenk and @felixcheung, I've added the comments back here. Since it's a developer API, I thought it would be fine to include some JVM-specific wording, but please let me know if you feel we should take it out.
Contributor
So we currently use…
| .. note:: The `addToCurrentClassLoader` parameter is a developer API, which may change or be removed | ||
| in minor versions of Spark. | ||
| :param path: The path of the jar to be added | ||
| :param addToCurrentClassLoader: Whether to add the jar to the current driver class loader. | ||
| """ | ||
| self._jsc.sc().addJar(path, addToCurrentClassLoader) | ||
| def setCheckpointDir(self, dirName): | ||
| """ | ||
| Set the directory under which RDDs are going to be checkpointed. The | ||
Does `local:/path` refer to a Windows drive path, or should the literal text `local:/` be there?

I think it refers to the actual `local:/` scheme: spark/core/src/main/scala/org/apache/spark/SparkContext.scala, line 1838 in b2463fa
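Both review threads turn on how the scheme of the `path` argument is interpreted: `local:/` is a literal URI scheme meaning the jar already exists at that path on every worker node, while remote schemes (HDFS and other Hadoop-supported filesystems, HTTP, HTTPS, FTP) are the case the reviewer was unsure the driver-side class-loader step handles without first downloading the jar. The sketch below classifies paths with Python's `urllib.parse.urlparse`, assuming that a path with no scheme is a file on the driver and that any unrecognized scheme is a Hadoop-supported remote filesystem. `classify_jar_path` and `needs_local_copy_for_classloader` are hypothetical helpers; Spark itself resolves paths with `Utils.resolveURI`, and this sketch does not reproduce its Windows handling (for example, `urlparse` reads `C:\jars\a.jar` as having the scheme `c`).

```python
from urllib.parse import urlparse


def classify_jar_path(path):
    """Hypothetical helper: classify a jar path by its URI scheme.

    Returns one of:
      "driver-local" - plain path or file: URI, read from the driver's filesystem
      "worker-local" - local: URI, expected at that path on every node
      "remote"       - any other scheme (hdfs, http, https, ftp, ...)
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in ("", "file"):
        return "driver-local"
    if scheme == "local":
        return "worker-local"
    # Assumption: every other scheme is a remote, Hadoop-supported or web location.
    return "remote"


def needs_local_copy_for_classloader(path):
    """Sketch of the reviewer's concern: flag jars that, like on the executor
    side, would be downloaded locally before being added to a class loader."""
    return classify_jar_path(path) == "remote"


if __name__ == "__main__":
    for p in ["/tmp/a.jar", "local:/opt/jars/a.jar", "hdfs://nn:8020/jars/a.jar"]:
        print(p, classify_jar_path(p), needs_local_copy_for_classloader(p))
```

Treating unknown schemes as remote, rather than rejecting them, keeps the sketch consistent with the docstrings' "other Hadoop-supported filesystems" wording.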