[SPARK-11421][CORE][PYTHON][R] Added ability for addJar to augment the current classloader #19643

HyukjinKwon · 2017-11-02T12:36:49Z

What changes were proposed in this pull request?

This PR takes over #15666

Adds a flag to sc.addJar to add the jar to the current classloader

How was this patch tested?

Manually tested.

Unit tests, manual tests

This is a continuation of the pull request in #9313 and is mostly a rebase of that onto master, with SparkR additions.

Closes #15666

HyukjinKwon · 2017-11-02T12:38:06Z

Most of the code is by @mariusvniekerk. I did some cleanup and addressed the review comments that had not yet been addressed.

HyukjinKwon · 2017-11-02T12:43:14Z

R/pkg/R/context.R

Looks like we have a problem here with handling a URI that contains a Windows path, although most other cases should be fine:

> normalizePath("file:/C:/a/b/c")
[1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax is incorrect

This ends up with a weird path like "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c".

I am not sure how we should handle this, as the pattern normalizedPath <- suppressWarnings(normalizePath(path)) looks quite common.

If that is fine, I would like to address this issue separately, along with other APIs, for example spark.addFile right above.

BTW, for now I avoided passing a URI in the test by passing the absolute path instead.

yea, normalizePath wouldn't handle url...
https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html

I think we should require absolute paths in their canonical form here and just pass through..
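As a rough illustration of the "just pass through" idea, here is a minimal Python sketch (Python is used only for illustration; the code under discussion is R). The helper name normalize_jar_path is hypothetical and is not part of this PR: it returns anything that carries a URI scheme untouched and only makes scheme-less local paths absolute.

```python
import os
from urllib.parse import urlparse


def normalize_jar_path(path):
    """Hypothetical helper: pass URIs through, make plain local paths absolute."""
    scheme = urlparse(path).scheme
    # A one-letter "scheme" such as "c" is a Windows drive letter (C:\...),
    # not a real URI scheme, so it is treated as a local path.
    if len(scheme) > 1:
        return path
    return os.path.abspath(path)


print(normalize_jar_path("file:/C:/a/b/c"))
print(normalize_jar_path("hdfs://namenode/libs/a.jar"))
print(normalize_jar_path("libs/a.jar"))
```

With this shape, "file:/C:/a/b/c" is never handed to a path normalizer, which is the step that produced the mangled path above.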

HyukjinKwon · 2017-11-02T12:44:29Z

cc @shivaram, @felixcheung, @mariusvniekerk, @holdenk and @brkyvz who were in the PR. Would you guys mind taking a look please?

SparkQA · 2017-11-02T13:34:22Z

Test build #83337 has finished for PR 19643 at commit 49b9d48.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-11-03T00:02:44Z

retest this please

SparkQA · 2017-11-03T03:43:48Z

Test build #83365 has finished for PR 19643 at commit b928ab8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-11-03T04:59:26Z

R/pkg/R/context.R

+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.


hmm, is this right: "add the jar to the current driver"?

I think it is roughly right .. I wanted to avoid words like "classloader" or "thread" .. Not sure what the best wording is to describe this within R / Python contexts.

maybe something like underlying/backing java process ?

Oh, you are back!

Yup, that's probably better wording. Let me update it after waiting a bit more for other review comments. @mariusvniekerk, I am okay with closing this if you happen to have time to proceed with yours now, or I can proceed here. Either way works. Up to you :)

@mariusvniekerk are you okay with proceeding with this here?

felixcheung · 2017-11-03T05:00:12Z

R/pkg/R/context.R

+#' Adds a JAR dependency for Spark tasks to be executed in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.


Is local:/path referring to a Windows drive/path, or should the actual text local:/ be there?

I think it refers to the actual local:/ scheme:

spark/core/src/main/scala/org/apache/spark/SparkContext.scala

Line 1838 in b2463fa

case "local" => "file:" + uri.getPath
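To make the quoted case concrete, here is a small Python sketch of that single rewrite rule. The function name resolve_local_scheme is hypothetical, and passing every other scheme through unchanged is an assumption made for this illustration; only the "local" case comes from the line quoted above.

```python
from urllib.parse import urlparse


def resolve_local_scheme(path):
    """Hypothetical sketch: rewrite local:/path to file:/path."""
    uri = urlparse(path)
    if uri.scheme == "local":
        # Mirrors: case "local" => "file:" + uri.getPath
        return "file:" + uri.path
    # Assumption for this sketch: anything else is returned as-is.
    return path


print(resolve_local_scheme("local:/opt/libs/a.jar"))
```

So the literal text local:/ is a URI scheme meaning "this file already exists on every worker node", not a Windows drive.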

felixcheung · 2017-11-03T05:02:41Z

R/pkg/R/context.R

yea, normalizePath wouldn't handle url...
https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html

I think we should require absolute paths in their canonical form here and just pass through..

holdenk

Thanks for working on this, one small comment on the Python side on top of Felix's existing comments.

holdenk · 2017-11-07T19:27:31Z

python/pyspark/context.py

            import importlib
            importlib.invalidate_caches()

+    def addJar(self, path, addToCurrentClassLoader=False):


We should mention that adding a jar to the current class loader is a developer API and may change.

HyukjinKwon · 2017-11-08T11:23:07Z

python/pyspark/context.py

+        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+        If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader
+        in the backing JVM. In general adding to the current threads' class loader will impact all
+        other application threads unless they have explicitly changed their class loader.


@holdenk and @felixcheung, here I just added the comments back. I thought that since it's a developer API it might be fine to use some JVM-related wording, but please let me know if you feel we need to take it out.

So we currently use .. note:: DeveloperApi to indicate it's a developer API (see ml/pipeline and friends for an example).
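For illustration only, a docstring carrying that marker could look like the sketch below. ContextSketch is a hypothetical stand-in that merely records calls; it is not the PySpark implementation, and the placement of the note is an assumption.

```python
class ContextSketch:
    """Hypothetical stand-in for a Spark context; it only records calls."""

    def __init__(self):
        self.added_jars = []

    def addJar(self, path, addToCurrentClassLoader=False):
        """
        Adds a JAR dependency for Spark tasks to be executed in the future.

        If `addToCurrentClassLoader` is true, the jar is also added to the
        class loader of the backing JVM.

        .. note:: DeveloperApi
        """
        # The sketch stores the request instead of talking to a JVM.
        self.added_jars.append((path, addToCurrentClassLoader))


sc = ContextSketch()
sc.addJar("local:/opt/libs/a.jar", addToCurrentClassLoader=True)
print(sc.added_jars)
```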

SparkQA · 2017-11-08T14:53:10Z

Test build #83596 has finished for PR 19643 at commit ab52809.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-11-16T00:54:35Z

Hi @jerryshao. Would you maybe have some time to take a look at this one please?

jerryshao · 2017-11-16T02:29:49Z

core/src/main/scala/org/apache/spark/SparkContext.scala

+
+        if (addToCurrentClassLoader) {
+          Utils.getContextOrSparkClassLoader match {
+            case cl: MutableURLClassLoader => cl.addURL(Utils.resolveURI(path).toURL)


I'm not sure whether this supports remote jars on HTTPS or Hadoop FileSystems. On the executor side, we handle this explicitly by downloading jars locally and adding them to the classpath, but here it looks like we don't have such logic. I'm not sure how this URLClassLoader would communicate with Hadoop or HTTPS without certificates.

addJar just adds jars to the file server so that executors can fetch them from the driver and add them to their classpath. It does not affect the driver's classpath. If we support adding jars to the current driver's class loader, then how do we make use of these newly added jars?

Thanks @jerryshao. I will look into this concern over the weekend and get back to you.
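The executor-side approach described above (download first, then add the local copy) can be sketched in Python as follows. This only illustrates the control flow: localize_jar, the fetch callback and target_dir are all hypothetical names, and nothing here reflects how Spark actually downloads files.

```python
import os
from urllib.parse import urlparse


def localize_jar(path, fetch, target_dir):
    """Return a local file path for `path`, fetching it first when it is remote.

    `fetch(uri, dest)` is a caller-supplied, hypothetical download function.
    """
    uri = urlparse(path)
    # No scheme, or a one-letter Windows drive letter: already a local path.
    if len(uri.scheme) <= 1:
        return path
    # file:/ and local:/ already point at the local filesystem.
    if uri.scheme in ("file", "local"):
        return uri.path
    # Anything else (https, hdfs, ...) is downloaded before it is used.
    dest = os.path.join(target_dir, os.path.basename(uri.path))
    fetch(path, dest)
    return dest
```

Only the path returned by such a step would then be handed to the class loader, so the class loader itself never has to speak HTTPS or HDFS.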

HyukjinKwon · 2017-12-25T04:03:07Z

Let me leave this closed for now; I will reopen it when I am ready to proceed.

mariusvniekerk added 3 commits November 2, 2017 17:10

SPARK-11421 Squashed addjar pr

ab4b6b1

Addressed some review comments

3c32174

Addressed some review comments

a62367f

HyukjinKwon commented Nov 2, 2017

Address the rest of comments

b928ab8

HyukjinKwon force-pushed the SPARK-11421 branch from 6e23dc9 to b928ab8 Compare November 2, 2017 22:07

felixcheung reviewed Nov 3, 2017

holdenk reviewed Nov 7, 2017

HyukjinKwon added 2 commits November 8, 2017 20:17

Fix comments

f839240

classloader -> class loader

ab52809

HyukjinKwon commented Nov 8, 2017

jerryshao reviewed Nov 16, 2017

HyukjinKwon closed this Dec 25, 2017

HyukjinKwon deleted the SPARK-11421 branch January 2, 2018 03:41

mariusvniekerk mentioned this pull request Apr 6, 2018

[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader #15666

Closed

HyukjinKwon restored the SPARK-11421 branch April 6, 2018 08:17

HyukjinKwon deleted the SPARK-11421 branch October 16, 2018 12:45

HyukjinKwon mentioned this pull request Jan 7, 2023

[SPARK-41933][CONNECT] Provide local mode that automatically starts the server #39441

Closed

6 participants

HyukjinKwon commented Nov 2, 2017 •

edited

Loading

HyukjinKwon Nov 3, 2017 •

edited

Loading