[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader #15666
@@ -319,6 +319,32 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
#'
#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
#' If \code{addToCurrentClassLoader} is true, add the jar to the current thread's classloader. In
#' general, adding to the current thread's class loader will impact all other application threads
#' unless they have explicitly changed their class loader.
Member: should this be reworded for R or Python? It should be the JVM.
#'
#' @rdname spark.addJar
#' @param path The path of the jar to be added
#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
#' @export
#' @examples
#'\dontrun{
#' spark.addJar("/path/to/something.jar", TRUE)
#'}
#' @note spark.addJar since 2.2.0
spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
Contributor: why don't we want to add it to the driver classpath by default?

Member (Author): Mostly for backwards compatibility.

Member (Author): Done
  normalizedPath <- suppressWarnings(normalizePath(path))
  sc <- callJMethod(getSparkContext(), "sc")
  invisible(callJMethod(sc, "addJar", normalizedPath, addToCurrentClassLoader))
}
Member: little nit: I guess we just need a single newline.
#' Get the root directory that contains files added through spark.addFile.
#'
#' @rdname spark.getSparkFilesRootDirectory
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

test_that("add jar should work and allow usage of the jar on the driver node", {
  sparkR.sparkContext()

  destDir <- file.path(tempdir(), "testjar")
Member: I'd remove this
  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
                         destDir, "sparkrTests", "DummyClassForAddJarTest")
Member: Ah, it looks like the problem is here:

> normalizePath("file:/C:/a/b/c")
[1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax
  is incorrect

This ends up producing a weird path.

Member: little nit:

jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
                       destDir, "sparkrTests", "DummyClassForAddJarTest")
  spark.addJar(jarName, addToCurrentClassLoader = TRUE)
  testClass <- newJObject("sparkrTests.DummyClassForAddJarTest")
  expect_true(class(testClass) == "jobj")
})

test_that("add and get file to be downloaded with Spark job on every node", {
  sparkR.sparkContext(master = sparkRTestMaster)
  # Test add file.
@@ -1801,9 +1801,23 @@ class SparkContext(config: SparkConf) extends Logging {
  /**
   * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
   * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
   *             an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
   */
  def addJar(path: String): Unit = {
    addJar(path, false)
  }
|
|
  /**
   * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
   * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
   *             an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
   * @param addToCurrentClassLoader if true, adds the jar to the current thread's classloader.
   *                                In general, adding to the current thread's class loader will
   *                                impact all other application threads unless they have explicitly
   *                                changed their class loader.
   */
  @DeveloperApi
  def addJar(path: String, addToCurrentClassLoader: Boolean) {
    def addJarFile(file: File): String = {
      try {
        if (!file.exists()) {
|
|
@@ -1845,6 +1859,21 @@ class SparkContext(config: SparkConf) extends Logging { | |
          logInfo(s"Added JAR $path at $key with timestamp $timestamp")
          postEnvironmentUpdate()
        }

        if (addToCurrentClassLoader) {
          val currentCL = Utils.getContextOrSparkClassLoader
          currentCL match {
            case cl: MutableURLClassLoader =>
              val uri = if (path.contains("\\")) {
                // For local paths with backslashes on Windows, URI throws an exception
                new File(path).toURI
              } else {
                new URI(path)
Member: Could we maybe just use
              }
              cl.addURL(uri.toURL)
            case _ => logWarning(s"Unsupported cl $currentCL will not update jars thread cl")
Member: I'd say
          }
        }
      }
    }
  }
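The backslash check in the hunk above exists because `java.net.URI` rejects Windows-style local paths, so those must go through the filesystem API first. A rough Python sketch of the same decision (the function name `to_jar_url` is illustrative, not part of Spark):

```python
import pathlib

def to_jar_url(path):
    # Hypothetical mirror of the Scala branch above: a Windows-style local
    # path contains backslashes, which a strict URI parser rejects, so it is
    # converted through the filesystem API; anything else is assumed to
    # already be a URI (http:, hdfs:, local:, ...).
    if "\\" in path:
        return pathlib.PureWindowsPath(path).as_uri()
    return path

print(to_jar_url("C:\\jars\\dep.jar"))     # file:///C:/jars/dep.jar
print(to_jar_url("hdfs:///jars/dep.jar"))  # unchanged
```

The same trade-off motivates the review comment below: letting the filesystem API build the URI avoids hand-rolling escaping rules for each platform.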
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
    createCompiledClass(className, destDir, sourceFile, classpathUrls)
  }

  /** Creates a dummy compiled jar for a given package and class name. The jar is placed in destDir. */
  def createDummyJar(destDir: String, packageName: String, className: String): String = {
    val srcDir = new File(destDir, packageName)
    srcDir.mkdirs()
    val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath,
      s"""package $packageName;
         |
         |public class $className implements java.io.Serializable {
         |  public static String helloWorld(String arg) { return "Hello " + arg; }
         |  public static int addStuff(int arg1, int arg2) { return arg1 + arg2; }
         |}
      """.
        stripMargin)
|
Member: We could make this inlined.
    val excFile = createCompiledClass(className, srcDir, excSource, Seq.empty)
    val jarFile = new File(destDir,
      s"$packageName-$className-%s.jar".format(System.currentTimeMillis()))
    val jarURL = createJar(Seq(excFile), jarFile, directoryPrefix = Some(packageName))
    jarURL.toString
  }

  /**
   * Run some code involving jobs submitted to the given context and assert that the jobs spilled.
   */
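`createDummyJar` compiles the generated source with `createCompiledClass` and then packages it with `createJar`. Since a jar is just a zip archive whose entries are prefixed with the package path, the packaging step can be sketched in a few lines of Python (the class-file payload here is a stand-in, not real bytecode):

```python
import os
import tempfile
import zipfile

destDir = tempfile.mkdtemp()
packageName, className = "sparkrTests", "DummyClassForAddJarTest"

# Stand-in for the compiled class file; in TestUtils, createCompiledClass
# would produce real JVM bytecode here.
classFile = os.path.join(destDir, className + ".class")
with open(classFile, "wb") as f:
    f.write(b"\xca\xfe\xba\xbe")  # class-file magic number, illustration only

# Package the entry under the package directory, analogous to the
# directoryPrefix argument passed to createJar above.
jarPath = os.path.join(destDir, "%s-%s.jar" % (packageName, className))
with zipfile.ZipFile(jarPath, "w") as jar:
    jar.write(classFile, arcname="%s/%s.class" % (packageName, className))

print(zipfile.ZipFile(jarPath).namelist())
```

The resulting archive has a single entry, `sparkrTests/DummyClassForAddJarTest.class`, which is the layout a JVM classloader expects when resolving `sparkrTests.DummyClassForAddJarTest`.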
@@ -863,6 +863,21 @@ def addPyFile(self, path):
        import importlib
        importlib.invalidate_caches()

    def addJar(self, path, addToCurrentClassLoader=False):
        """
        Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
        The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
        If `addToCurrentClassLoader` is true, add the jar to the current thread's classloader.
|
Member: little nit:
        In general, adding to the current thread's class loader will impact all other application
        threads unless they have explicitly changed their class loader.

        :param path: The path of the jar to be added
        :param addToCurrentClassLoader: Whether to add the jar to the current driver classloader.
            This defaults to False.
        """
        self._jsc.sc().addJar(path, addToCurrentClassLoader)

    def setCheckpointDir(self, dirName):
        """
        Set the directory under which RDDs are going to be checkpointed. The
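As a loose analogy for readers more familiar with Python than the JVM: adding a jar URL to the current thread's classloader is comparable to appending a directory to `sys.path`, after which previously unknown modules become importable. A sketch (the module name and contents are made up for illustration):

```python
import os
import sys
import tempfile

# Write a throwaway module, mirroring the dummy class these tests compile.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "dummy_addjar_demo.py"), "w") as f:
    f.write("def hello_world(arg):\n    return 'Hello ' + arg\n")

# Before this append, the import would fail, just as the Python test below
# asserts the class is not loadable before addJar is called.
sys.path.append(workdir)  # roughly analogous to cl.addURL(uri.toURL)

import dummy_addjar_demo
print(dummy_addjar_demo.hello_world("Spark"))  # Hello Spark
```

Like the classloader case described in the docstring, this mutation is process-wide, not scoped to the caller, which is why the docs warn that other threads are affected.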
@@ -35,6 +35,7 @@
import hashlib

from py4j.protocol import Py4JJavaError
from py4j.java_gateway import JavaClass
try:
    import xmlrunner
except ImportError:
|
@@ -435,6 +436,19 @@ def test_add_file_locally(self): | |
        with open(download_path) as test_file:
            self.assertEqual("Hello World!\n", test_file.readline())

    def test_add_jar(self):
        jvm = self.sc._jvm
        # We shouldn't be able to load anything from the package before it is added
        self.assertFalse(isinstance(jvm.pysparktests.DummyClass, JavaClass))
        # Generate and compile the test jar
        destDir = os.path.join(SPARK_HOME, "python/test_support/jar")
Member: I'd remove this directory too.

Contributor: Instead you want to use a temp directory?
        jarName = jvm.org.apache.spark.TestUtils.createDummyJar(
            destDir, "pysparktests", "DummyClass")
        # Load the new jar
        self.sc.addJar(jarName, True)
        # Try to load the class
        self.assertTrue(isinstance(jvm.pysparktests.DummyClass, JavaClass))

    def test_add_file_recursively_locally(self):
        path = os.path.join(SPARK_HOME, "python/test_support/hello")
        self.sc.addFile(path, True)
We don't really expose SparkContext in R, actually.

In that case, do we want to bother having this method for R?