-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11421][CORE][PYTHON][R] Added ability for addJar to augment the current classloader #19643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Most of codes are by @mariusvniekerk. I did some cleanup and addressed the review comments not being addressed. |
R/pkg/R/context.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks we have a problem here with handling URI, Windows path, although most of other cases should be fine though:
> normalizePath("file:/C:/a/b/c")
[1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax
is incorrectThis looks ending up with an weird path like "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c".
I am not sure how we should handle this as this pattern normalizedPath <- suppressWarnings(normalizePath(path)) looks quite common.
If it is fine, I would like to address this issue separately for other APIs, for example, spark.addFile right above ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided to pass URI here by passing the abs path for now in the test BTW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, normalizePath wouldn't handle url...
https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html
I think we should require absolute paths in their canonical form here and just pass through..
|
cc @shivaram, @felixcheung, @mariusvniekerk, @holdenk and @brkyvz who were in the PR. Would you guys mind taking a look please? |
|
Test build #83337 has finished for PR 19643 at commit
|
6e23dc9 to
b928ab8
Compare
|
retest this please |
|
Test build #83365 has finished for PR 19643 at commit
|
R/pkg/R/context.R
Outdated
| #' | ||
| #' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported | ||
| #' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| #' If \code{addToCurrentClassLoader} is true, add the jar to the current driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is this right add the jar to the current driver.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is roughly right .. I wanted to avoid the words like "classloader" or "thread" .. Not sure what's the best wording to describe this within R / Python contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something like underlying/backing java process ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you are back!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, probably that's better wording. Let me update it after a bit more waiting other review comments. @mariusvniekerk, I am okay with closing it if you happen to have time to proceed yours now, or I can proceed here. Either way works. Up to you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mariusvniekerk are you okay with proceeding this here?
| #' Adds a JAR dependency for Spark tasks to be executed in the future. | ||
| #' | ||
| #' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported | ||
| #' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is local:/path referring to windows drive/path, or the actual text local:/ should be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it refers the actual local:/:
| case "local" => "file:" + uri.getPath |
R/pkg/R/context.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, normalizePath wouldn't handle url...
https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html
I think we should require absolute paths in their canonical form here and just pass through..
holdenk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, one small comment on the Python side on top of Felix's existing comments.
| import importlib | ||
| importlib.invalidate_caches() | ||
|
|
||
| def addJar(self, path, addToCurrentClassLoader=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention that adding a jar to the current class loader is a developer API and may change.
| filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader | ||
| in the backing JVM. In general adding to the current threads' class loader will impact all | ||
| other application threads unless they have explicitly changed their class loader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holdenk and @felixcheung, here I just added the comments back. I thought it's a developer API and might be fine to describe some words related with JVM but .. please let me know if you guys feel we need to take out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we currently use .. note:: DeveloperApi to indicate it's a developer API (see ml/pipeline and friends for an example).
|
Test build #83596 has finished for PR 19643 at commit
|
|
Hi @jerryshao. Would you maybe have some time to take a look for this one please? |
|
|
||
| if (addToCurrentClassLoader) { | ||
| Utils.getContextOrSparkClassLoader match { | ||
| case cl: MutableURLClassLoader => cl.addURL(Utils.resolveURI(path).toURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure does it support remote jars on HTTPS or Hadoop FileSystems?In the executor side, we handle this explicitly by downloading jars to local and add to classpath, but here looks like we don't have such logic. I'm not sure how this URLClassLoader communicate with Hadoop or Https without certificates.
The addJar is just adding jars to fileserver, so that executor could fetch them from driver and add to classpath. It will not affect driver's classpath. If we support adding jars to current driver's classloader, then how do we leverage this newly added jars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jerryshao. Will check through this concern within this weekend and be back.
|
Let me leave this closed now and will reopen when I am ready to proceed. |
What changes were proposed in this pull request?
This PR takes over #15666
How was this patch tested?
Manually tested.
Closes #15666