[SPARK-10643] [Core] Make spark-submit download remote files to local in client mode #18078
…l/standalone client mode
|
Please reference the existing bug (SPARK-10643) instead. |
|
Test build #77268 has finished for PR 18078 at commit
|
|
Test build #77271 has finished for PR 18078 at commit
|
jiangxb1987
left a comment
Thank you for working on this! The PR looks good overall. It would be great if more negative test cases could be added, such as invalid file paths.
| RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose) | ||
| } | ||
|
|
||
| // In client mode, download remotes files. |
nit: "remotes" -> "remote"
| test("resolves command line argument paths correctly") { | ||
| val jars = "/jar1,/jar2" // --jars | ||
| - val files = "hdfs:/file1,file2" // --files | ||
| + val files = "local:/file1,file2" // --files |
Could you expand on why we are changing this?
To keep the test from trying to download the file from HDFS.
It is kind of difficult to test downloading a file from HDFS right now, but we should cover this scenario in the future.
|
Could you also add "[Core]" tag in the title? @loneknightpy |
|
Test build #77388 has started for PR 18078 at commit |
|
LGTM |
|
Test build #77389 has started for PR 18078 at commit |
| } | ||
|
|
||
| // In client mode, download remote files. | ||
| if (deployMode == CLIENT) { |
Sorry, I may not have enough background knowledge: why do we only do this for client mode?
It seems it can handle remote files in Yarn/Mesos cluster mode. I haven't tested it, because we are using client mode.
| /** | ||
| * Download a list of remote files to temp local files. If the file is local, the original file | ||
| * will be returned. | ||
| * @param fileList A comma separated file list, it cannot be null. |
Nit: no need to add the comment "it cannot be null". Just add an assert at the beginning of the function to enforce it.
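A minimal sketch of that suggestion, for illustration only. The object name `DownloadFileListSketch` and the `downloadFile` parameter are hypothetical stand-ins (the PR's real helper is only partly visible in this thread), and `require` is used here in place of `assert` as one possible way to fail fast on a null argument:

```scala
object DownloadFileListSketch {
  /**
   * Maps every entry of a comma-separated file list through `downloadFile`.
   * `downloadFile` is a hypothetical stand-in for the single-file download helper.
   */
  def downloadFileList(fileList: String, downloadFile: String => String): String = {
    // Fail fast on null instead of documenting the constraint in the Scaladoc.
    require(fileList != null, "fileList cannot be null.")
    fileList
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty) // skip empty entries such as a trailing comma
      .map(downloadFile)
      .mkString(",")
  }
}
```

With this shape, a null list raises an `IllegalArgumentException` at the call site rather than a `NullPointerException` somewhere inside the split.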
| } | ||
|
|
||
| /** | ||
| * Download remote file to a temporary local file. If the file is local, the original file |
How about:
"Downloads a file from the remote to a local temporary directory. If the input path points to a local path, returns it with no operation."
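The "local path, no operation" rule can be sketched as a scheme check. This is an assumption about how such a check might look, not the PR's actual code: the name `RemotePathSketch.isRemote` is hypothetical, and treating a missing scheme, `file`, and `local` as "already local" is inferred from the test change in this thread (`hdfs:` replaced by `local:` to avoid a download). The input is assumed to be a well-formed URI string; paths with spaces or Windows drive letters would need extra handling.

```scala
import java.net.URI

object RemotePathSketch {
  /** Returns true when `path` would need to be downloaded under the assumptions above. */
  def isRemote(path: String): Boolean = {
    // A bare path such as "file2" or "/jar1" has no scheme, so getScheme returns null.
    new URI(path).getScheme match {
      case null | "file" | "local" => false
      case _ => true
    }
  }
}
```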
| .fromPath(tmpFile.getAbsolutePath) | ||
| .scheme("file") | ||
| .build() | ||
| .toString |
val localPath = new Path(tmpFile.getAbsolutePath)
fs.copyToLocalFile(new Path(uri), localPath)
val localFS: FileSystem = localPath.getFileSystem(hadoopConf)
localFS.makeQualified(localPath).toString
Does this work?
It sounds like our code base never calls UriBuilder directly.
If UriBuilder is a concern, we can just use "file:${tmpFile.getAbsolutePath}"
|
Test build #77412 has started for PR 18078 at commit |
|
Test build #77413 has started for PR 18078 at commit |
| printStream.println(s"Downloading ${uri.toString} to ${tmpFile.getAbsolutePath}.") | ||
| // scalastyle:on println | ||
| fs.copyToLocalFile(new Path(uri), new Path(tmpFile.getAbsolutePath)) | ||
| s"file:${tmpFile.getAbsolutePath}" |
Or how about calling Utils.resolveURI(tmpFile.getAbsolutePath).toString?
It sounds like Utils.resolveURI is commonly used for this purpose.
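To illustrate why the choice of URI construction matters, here is a small JDK-only sketch comparing the string-interpolation form from this thread with letting `java.io.File` build the URI. It does not reproduce `Utils.resolveURI`, whose implementation is not shown here; `LocalUriSketch` and its method names are hypothetical.

```scala
import java.io.File
import java.net.URI

object LocalUriSketch {
  /** Form from the thread: prefix the absolute path with "file:" (no escaping). */
  def byInterpolation(file: File): String = s"file:${file.getAbsolutePath}"

  /** Lets the JDK build the URI, so characters such as spaces are percent-encoded. */
  def byFileToUri(file: File): URI = file.getAbsoluteFile.toURI
}
```

For a temp file whose path contains a space, the interpolated string keeps the raw space, while the `toURI` form escapes it, which is one reason to prefer a URI-aware helper over plain concatenation.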
|
retest this please |
|
Test build #77418 has finished for PR 18078 at commit
|
|
LGTM pending Jenkins |
|
Test build #77427 has finished for PR 18078 at commit
|
|
retest this please |
|
Test build #77434 has finished for PR 18078 at commit
|
|
Test build #77435 has finished for PR 18078 at commit
|
…in client mode
## What changes were proposed in this pull request?
This PR makes spark-submit script download remote files to local file system for local/standalone client mode.
## How was this patch tested?
- Unit tests
- Manual tests by adding s3a jar and testing against file on s3.
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Yu Peng <loneknightpy@gmail.com>
Closes #18078 from loneknightpy/download-jar-in-spark-submit.
(cherry picked from commit 4af3781)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
|
Thanks! Merging to master/2.2 |