9 changes: 7 additions & 2 deletions R/pkg/R/context.R
@@ -231,17 +231,22 @@ setCheckpointDir <- function(sc, dirName) {
#' filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
#' use spark.getSparkFiles(fileName) to find its download location.
#'
+#' A directory can be given if the recursive option is set to TRUE.
Member: I'd merge this into @param path below?

Member: Or omit this since it's described in @param recursive?

+#' Currently directories are only supported for Hadoop-supported filesystems.
Member: This might be a bit confusing - do we have links to what this means?

@yanboliang (Contributor Author), Sep 25, 2016: The annotation here is consistent with Scala/Python, and a Hadoop-supported filesystem is a file system that Hadoop supports. I think it's easy for users to understand. Or should we add a link to Hadoop-supported filesystems?

Member: It depends. Recently someone on the user list was asking why SparkR uses Hadoop file system classes to read NFS, local files, etc. - it might not be obvious to users.

@yanboliang (Contributor Author): Makes sense; added a link to Hadoop-supported filesystems. Thanks!

+#' Refer to Hadoop-supported filesystems at \url{https://wiki.apache.org/hadoop/HCFS}.
#'
#' @rdname spark.addFile
#' @param path The path of the file to be added
+#' @param recursive Whether to add files recursively from the path. Default is FALSE.
#' @export
#' @examples
#'\dontrun{
#' spark.addFile("~/myfile")
#'}
#' @note spark.addFile since 2.1.0
-spark.addFile <- function(path) {
+spark.addFile <- function(path, recursive = FALSE) {
sc <- getSparkContext()
-invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path))))
+invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

#' Get the root directory that contains files added through spark.addFile.
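For reference, a minimal usage sketch of the API this diff adds (assuming a local SparkR session; the directory and file names below are illustrative, not taken from the PR):

library(SparkR)
sparkR.session()

# Build a small directory tree to ship with the job (illustrative names).
dir <- paste0(tempdir(), "/", "config_dir")
dir.create(dir)
writeLines("alpha = 1", paste0(dir, "/", "settings.conf"))

# Directories require recursive = TRUE; plain files do not.
spark.addFile(dir, recursive = TRUE)

# On any node, resolve the downloaded copy by its path relative to the added directory.
local_copy <- spark.getSparkFiles(paste0(basename(dir), "/", "settings.conf"))
readLines(local_copy)  # "alpha = 1"

sparkR.session.stop()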
22 changes: 22 additions & 0 deletions R/pkg/inst/tests/testthat/test_context.R
@@ -169,6 +169,7 @@ test_that("spark.lapply should perform simple transforms", {

test_that("add and get file to be downloaded with Spark job on every node", {
sparkR.sparkContext()
+# Test add file.
path <- tempfile(pattern = "hello", fileext = ".txt")
filename <- basename(path)
words <- "Hello World!"
@@ -177,5 +178,26 @@ test_that("add and get file to be downloaded with Spark job on every node", {
download_path <- spark.getSparkFiles(filename)
expect_equal(readLines(download_path), words)
unlink(path)

+# Test add directory recursively.
+path <- paste0(tempdir(), "/", "recursive_dir")
+dir.create(path)
+dir_name <- basename(path)
+path1 <- paste0(path, "/", "hello.txt")
+file.create(path1)
+sub_path <- paste0(path, "/", "sub_hello")
+dir.create(sub_path)
+path2 <- paste0(sub_path, "/", "sub_hello.txt")
+file.create(path2)
+words <- "Hello World!"
+sub_words <- "Sub Hello World!"
+writeLines(words, path1)
+writeLines(sub_words, path2)
+spark.addFile(path, recursive = TRUE)
+download_path1 <- spark.getSparkFiles(paste0(dir_name, "/", "hello.txt"))
+expect_equal(readLines(download_path1), words)
+download_path2 <- spark.getSparkFiles(paste0(dir_name, "/", "sub_hello/sub_hello.txt"))
+expect_equal(readLines(download_path2), sub_words)
+unlink(path, recursive = TRUE)
sparkR.session.stop()
})
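The first file's trailing context line ("Get the root directory that contains files added through spark.addFile.") points at a companion accessor: every added file lands under one per-job root, and spark.getSparkFiles resolves names relative to it. A small sketch of that relationship (hedged: spark.getSparkFilesRootDirectory is assumed to be the accessor that doc comment introduces; the temp file is illustrative):

library(SparkR)
sparkR.session()

f <- tempfile(fileext = ".txt")
writeLines("hi", f)
spark.addFile(f)

# spark.getSparkFiles(name) should equal the per-job root joined with name
# (the strings may differ by path normalization on some platforms).
root <- spark.getSparkFilesRootDirectory()
spark.getSparkFiles(basename(f)) == paste0(root, "/", basename(f))  # expected TRUE

sparkR.session.stop()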