Closed
Changes from all commits
107 commits
ecf5605
[SPARK-20354][CORE][REST-API] When I request access to the 'http: //i…
Apr 18, 2017
7dbc0a9
[SPARK-20360][PYTHON] reprs for interpreters
rgbkrk Apr 18, 2017
6a25d39
[SPARK-20377][SS] Fix JavaStructuredSessionization example
tdas Apr 18, 2017
a33d448
[SPARK-20254][SQL] Remove unnecessary data conversion for Dataset wit…
kiszk Apr 19, 2017
ef6923f
[SPARK-20208][R][DOCS] Document R fpGrowth support
zero323 Apr 19, 2017
274a3e2
[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin …
koertkuipers Apr 19, 2017
a87e21d
[SPARK-20356][SQL] Pruned InMemoryTableScanExec should have correct o…
viirya Apr 19, 2017
8baa970
[SPARK-20343][BUILD] Avoid Unidoc build only if Hadoop 2.6 is explici…
HyukjinKwon Apr 19, 2017
80a60da
[SPARK-20036][DOC] Note incompatible dependencies on org.apache.kafka…
koeninger Apr 19, 2017
d649787
[SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated…
zsxwing Apr 19, 2017
371af96
[SPARK-20350] Add optimization rules to apply Complementation Laws.
ptkool Apr 20, 2017
af9f18c
[MINOR][SS] Fix a missing space in UnsupportedOperationChecker error …
zsxwing Apr 20, 2017
e6bbdb0
[SPARK-20398][SQL] range() operator should include cancellation reaso…
ericl Apr 20, 2017
8d658b9
Fixed typos in docs
Apr 20, 2017
d01122d
[SPARK-20156][SQL][FOLLOW-UP] Java String toLowerCase "Turkish locale…
gatorsmile Apr 20, 2017
9fd25fb
[SPARK-20405][SQL] Dataset.withNewExecutionId should be private
rxin Apr 20, 2017
9904526
[SPARK-20409][SQL] fail early if aggregate function in GROUP BY
cloud-fan Apr 20, 2017
32c5a10
[SPARK-20407][TESTS] ParquetQuerySuite 'Enabling/disabling ignoreCorr…
bogdanrdc Apr 20, 2017
e929cd7
[SPARK-20358][CORE] Executors failing stage on interrupted exception …
ericl Apr 20, 2017
01f6262
[SPARK-20410][SQL] Make sparkConf a def in SharedSQLContext
hvanhovell Apr 20, 2017
7e9eba0
[SPARK-20172][CORE] Add file permission check when listing files in F…
jerryshao Apr 20, 2017
d17dea8
[SPARK-20367] Properly unescape column names of partitioning columns …
juliuszsompolski Apr 21, 2017
5ce7680
[SPARK-20329][SQL] Make timezone aware expression without timezone un…
hvanhovell Apr 21, 2017
6cd2f16
[SPARK-20281][SQL] Print the identical Range parameters of SparkConte…
maropu Apr 21, 2017
cddb4b7
[SPARK-20420][SQL] Add events to the external catalog
hvanhovell Apr 21, 2017
eb4d097
Small rewording about history server use case
dud225 Apr 21, 2017
aaeca8b
[SPARK-20412] Throw ParseException from visitNonOptionalPartitionSpec…
juliuszsompolski Apr 21, 2017
adaa3f7
[SPARK-20341][SQL] Support BigInt's value that does not fit in long v…
kiszk Apr 21, 2017
ff1f989
[SPARK-20423][ML] fix MLOR coeffs centering when reg == 0
WeichenXu123 Apr 21, 2017
6c2489c
[SPARK-20401][DOC] In the spark official configuration document, the …
Apr 21, 2017
d68e0a3
[SPARK-20386][SPARK CORE] modify the log info if the block exists on …
eatoncys Apr 22, 2017
807c718
[SPARK-20430][SQL] Initialise RangeExec parameters in a driver side
maropu Apr 22, 2017
cad33a7
[SPARK-20385][WEB-UI] Submitted Time' field, the date format needs to…
Apr 23, 2017
2bef01f
[SPARK-20439][SQL] Fix Catalog API listTables and getTable when faile…
gatorsmile Apr 24, 2017
cf16c32
[SPARK-18901][ML] Require in LR LogisticAggregator is redundant
wangmiao1981 Apr 24, 2017
30149d5
[SPARK-20239][CORE] Improve HistoryServer's ACL mechanism
jerryshao Apr 25, 2017
fb59a19
[SPARK-20451] Filter out nested mapType datatypes from sort order in …
sameeragarwal Apr 25, 2017
c18de9c
[SPARK-20455][DOCS] Fix Broken Docker IT Docs
original-brownbear Apr 25, 2017
b62ebd9
[SPARK-20404][CORE] Using Option(name) instead of Some(name)
szhem Apr 25, 2017
e2591c6
[SPARK-18901][FOLLOWUP][ML] Require in LR LogisticAggregator is redun…
wangmiao1981 Apr 25, 2017
55834a8
[SPARK-20449][ML] Upgrade breeze version to 0.13.1
yanboliang Apr 25, 2017
f971ce5
[SPARK-5484][GRAPHX] Periodically do checkpoint in Pregel
Apr 25, 2017
f0de600
[SPARK-18127] Add hooks and extension points to Spark
sameeragarwal Apr 26, 2017
c8803c0
[SPARK-16548][SQL] Inconsistent error handling in JSON parsing SQL fu…
Apr 26, 2017
a2f5ced
[SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools
Apr 26, 2017
6129522
[SPARK-19812] YARN shuffle service fails to relocate recovery DB acro…
tgravescs Apr 26, 2017
34dec68
[MINOR][ML] Fix some PySpark & SparkR flaky tests
yanboliang Apr 26, 2017
b65858b
[SPARK-20391][CORE] Rename memory related fields in ExecutorSummay
jerryshao Apr 26, 2017
6709bcf
[SPARK-20473] Enabling missing types in ColumnVector.Array
michal-databricks Apr 26, 2017
e278876
[SPARK-20474] Fixing OnHeapColumnVector reallocation
michal-databricks Apr 26, 2017
b48bb3a
[SPARK-12868][SQL] Allow adding jars from hdfs
weiqingy Apr 26, 2017
d6efda5
[SPARK-20435][CORE] More thorough redaction of sensitive information
markgrover Apr 27, 2017
8ccb4a5
Preparing Spark release v2.2.0-rc1
pwendell Apr 27, 2017
75544c0
Preparing development version 2.2.0-SNAPSHOT
pwendell Apr 27, 2017
c86c078
[SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks
dgshep Apr 27, 2017
87d27e5
[SPARK-20421][CORE] Mark internal listeners as deprecated.
Apr 27, 2017
090b337
[SPARK-20482][SQL] Resolving Casts is too strict on having time zone set
rednaxelafx Apr 27, 2017
92b61f0
[SPARK-20487][SQL] `HiveTableScan` node is quite verbose in explained…
tejasapatil Apr 27, 2017
c69d862
[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shu…
Apr 27, 2017
c29c6de
[SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Meso…
dgshep Apr 27, 2017
4512e2a
[SPARK-20047][ML] Constrained Logistic Regression
yanboliang Apr 27, 2017
753e129
[SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fi…
zsxwing Apr 27, 2017
3d53d82
[SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationExcept…
zsxwing Apr 27, 2017
e02b6eb
[SPARK-12837][CORE] Do not send the name of internal accumulator to e…
cloud-fan Apr 28, 2017
f60ed0c
[SPARKR][DOC] Document LinearSVC in R programming guide
wangmiao1981 Apr 28, 2017
26a9e29
[SPARK-20476][SQL] Block users to create a table that use commas in t…
gatorsmile Apr 28, 2017
af3a141
[SPARK-14471][SQL] Aliases in SELECT could be used in GROUP BY
maropu Apr 28, 2017
ea5b114
[SPARK-20465][CORE] Throws a proper exception when any temp directory…
HyukjinKwon Apr 28, 2017
ec712d7
[SPARK-20496][SS] Bug in KafkaWriter Looks at Unanalyzed Plans
Apr 28, 2017
f66aabd
[SPARK-20514][CORE] Upgrade Jetty to 9.3.11.v20160721
markgrover Apr 28, 2017
5547002
[SPARK-20471] Remove AggregateBenchmark testsuite warning: Two level …
heary-cao Apr 28, 2017
1405862
[SPARK-19525][CORE] Add RDD checkpoint compression support
Apr 28, 2017
ca6c59e
[SPARK-20487][SQL] Display `serde` for `HiveTableScan` node in explai…
tejasapatil Apr 29, 2017
4a86d8d
[SPARK-20477][SPARKR][DOC] Document R bisecting k-means in R programm…
wangmiao1981 Apr 29, 2017
9789d5c
[SPARK-19791][ML] Add doc and example for fpgrowth
YY-OnCall Apr 29, 2017
c5f5593
[SPARK-20521][DOC][CORE] The default of 'spark.worker.cleanup.appData…
Apr 30, 2017
c5beabc
[SPARK-20492][SQL] Do not print empty parentheses for invalid primiti…
HyukjinKwon Apr 30, 2017
994d9da
[MINOR][DOCS][PYTHON] Adding missing boolean type for replacement val…
May 1, 2017
c890e93
[SPARK-20541][SPARKR][SS] support awaitTermination without timeout
felixcheung May 1, 2017
813abd2
[SPARK-20534][SQL] Make outer generate exec return empty rows
hvanhovell May 1, 2017
38edb92
[SPARK-20517][UI] Fix broken history UI download link
jerryshao May 1, 2017
6f0d296
[SPARK-20464][SS] Add a job group and description for streaming queri…
kunalkhamar May 1, 2017
cfa6bcb
[SPARK-20540][CORE] Fix unstable executor requests.
rdblue May 1, 2017
5a0a8b0
[SPARK-20459][SQL] JdbcUtils throws IllegalStateException: Cause alre…
srowen May 2, 2017
b7c1c2f
[SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0
felixcheung May 2, 2017
b146481
[SPARK-20537][CORE] Fixing OffHeapColumnVector reallocation
kiszk May 2, 2017
ef5e2a0
[SPARK-20549] java.io.CharConversionException: Invalid UTF-32' in Jso…
brkyvz May 2, 2017
01f3be7
[SPARK-20300][ML][PYSPARK] Python API for ALSModel.recommendForAllUse…
May 2, 2017
4f4083b
[SPARK-19235][SQL][TEST][FOLLOW-UP] Enable Test Cases in DDLSuite wit…
gatorsmile May 2, 2017
871b073
[SPARK-20421][CORE] Add a missing deprecation tag.
May 2, 2017
c199764
[SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkCo…
cloud-fan May 3, 2017
c80242a
[SPARK-20567] Lazily bind in GenerateExec
marmbrus May 3, 2017
4f647ab
[SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers for SVD and P…
MechCoder May 3, 2017
b5947f5
[SPARK-20523][BUILD] Clean up build warnings for 2.2.0 release
srowen May 3, 2017
b1a732f
[SPARK-20441][SPARK-20432][SS] Within the same streaming query, one S…
lw-lin May 3, 2017
f0e80aa
[SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame
rxin May 3, 2017
36d8079
[SPARK-19965][SS] DataFrame batch reader may fail to infer partitions…
lw-lin May 3, 2017
2629e7c
[MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated …
HyukjinKwon May 3, 2017
1d4017b
Preparing Spark release v2.2.0-rc2
pwendell May 3, 2017
a3a5fcf
Preparing development version 2.2.1-SNAPSHOT
pwendell May 3, 2017
d8bd213
[SPARK-20584][PYSPARK][SQL] Python generic hint support
zero323 May 4, 2017
5fe9313
[SPARK-20544][SPARKR] skip tests when running on CRAN
felixcheung May 4, 2017
6c5c594
[SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streami…
felixcheung May 4, 2017
3f5c548
[SPARK-20585][SPARKR] R generic hint support
zero323 May 4, 2017
b672779
[SPARK-20571][SPARKR][SS] Flaky Structured Streaming tests
felixcheung May 4, 2017
425ed26
[SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up
yanboliang May 4, 2017
c875628
[SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column
May 5, 2017
1 change: 1 addition & 0 deletions LICENSE
@@ -297,3 +297,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(MIT License) RowsGroup (http://datatables.net/license/mit)
(MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
(MIT License) machinist (https://github.com/typelevel/machinist)
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 2.2.0
Version: 2.2.1
Title: R Frontend for Apache Spark
Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
1 change: 1 addition & 0 deletions R/pkg/NAMESPACE
@@ -122,6 +122,7 @@ exportMethods("arrange",
"group_by",
"groupBy",
"head",
"hint",
"insertInto",
"intersect",
"isLocal",
30 changes: 30 additions & 0 deletions R/pkg/R/DataFrame.R
@@ -3642,3 +3642,33 @@ df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
dataFrame(df)
})

#' hint
#'
#' Specifies an execution plan hint and returns a new SparkDataFrame.
#'
#' @param x a SparkDataFrame.
#' @param name a name of the hint.
#' @param ... optional parameters for the hint.
#' @return A SparkDataFrame.
#' @family SparkDataFrame functions
#' @aliases hint,SparkDataFrame,character-method
#' @rdname hint
#' @name hint
#' @export
#' @examples
#' \dontrun{
#' df <- createDataFrame(mtcars)
#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
#'
#' head(join(df, hint(avg_mpg, "broadcast"), df$cyl == avg_mpg$cyl))
#' }
#' @note hint since 2.2.0
setMethod("hint",
signature(x = "SparkDataFrame", name = "character"),
function(x, name, ...) {
parameters <- list(...)
stopifnot(all(sapply(parameters, is.character)))
jdf <- callJMethod(x@sdf, "hint", name, parameters)
dataFrame(jdf)
})
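
A minimal usage sketch of the new hint method, reusing the broadcast-join example from the roxygen block above (assumes SparkR is attached and a SparkSession has already been started):

# Sketch only: assumes a running SparkR session.
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)
avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")

# "broadcast" asks the optimizer to broadcast the small aggregated side of the join.
head(join(df, hint(avg_mpg, "broadcast"), df$cyl == avg_mpg$cyl))

Any extra hint parameters passed through ... must be character values; the stopifnot() check in the method above enforces this.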
6 changes: 5 additions & 1 deletion R/pkg/R/generics.R
@@ -572,6 +572,10 @@ setGeneric("group_by", function(x, ...) { standardGeneric("group_by") })
#' @export
setGeneric("groupBy", function(x, ...) { standardGeneric("groupBy") })

#' @rdname hint
#' @export
setGeneric("hint", function(x, name, ...) { standardGeneric("hint") })

#' @rdname insertInto
#' @export
setGeneric("insertInto", function(x, tableName, ...) { standardGeneric("insertInto") })
@@ -1469,7 +1473,7 @@ setGeneric("write.ml", function(object, path, ...) { standardGeneric("write.ml")

#' @rdname awaitTermination
#' @export
setGeneric("awaitTermination", function(x, timeout) { standardGeneric("awaitTermination") })
setGeneric("awaitTermination", function(x, timeout = NULL) { standardGeneric("awaitTermination") })

#' @rdname isActive
#' @export
14 changes: 10 additions & 4 deletions R/pkg/R/streaming.R
@@ -169,8 +169,10 @@ setMethod("isActive",
#' immediately.
#'
#' @param x a StreamingQuery.
#' @param timeout time to wait in milliseconds
#' @return TRUE if query has terminated within the timeout period.
#' @param timeout time to wait in milliseconds, if omitted, wait indefinitely until \code{stopQuery}
#' is called or an error has occurred.
#' @return TRUE if query has terminated within the timeout period; nothing if timeout is not
#' specified.
#' @rdname awaitTermination
#' @name awaitTermination
#' @aliases awaitTermination,StreamingQuery-method
@@ -182,8 +184,12 @@ setMethod("isActive",
#' @note experimental
setMethod("awaitTermination",
signature(x = "StreamingQuery"),
function(x, timeout) {
handledCallJMethod(x@ssq, "awaitTermination", as.integer(timeout))
function(x, timeout = NULL) {
if (is.null(timeout)) {
invisible(handledCallJMethod(x@ssq, "awaitTermination"))
} else {
handledCallJMethod(x@ssq, "awaitTermination", as.integer(timeout))
}
})

#' stopQuery
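A rough sketch of the two call forms this hunk supports (the query object is assumed to be a StreamingQuery returned by write.stream(); its setup is not shown):

# Sketch only: `q` is assumed to be an active StreamingQuery.
# With a timeout, the call returns TRUE or FALSE depending on whether the
# query terminated within the given number of milliseconds.
terminated <- awaitTermination(q, 10000)

# Without a timeout, the call now blocks (returning invisibly) until
# stopQuery(q) is called or the query fails with an error.
awaitTermination(q)
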
6 changes: 6 additions & 0 deletions R/pkg/inst/tests/testthat/test_Serde.R
@@ -20,6 +20,8 @@ context("SerDe functionality")
sparkSession <- sparkR.session(enableHiveSupport = FALSE)

test_that("SerDe of primitive types", {
skip_on_cran()

x <- callJStatic("SparkRHandler", "echo", 1L)
expect_equal(x, 1L)
expect_equal(class(x), "integer")
@@ -38,6 +40,8 @@ test_that("SerDe of primitive types", {
})

test_that("SerDe of list of primitive types", {
skip_on_cran()

x <- list(1L, 2L, 3L)
y <- callJStatic("SparkRHandler", "echo", x)
expect_equal(x, y)
@@ -65,6 +69,8 @@ test_that("SerDe of list of primitive types", {
})

test_that("SerDe of list of lists", {
skip_on_cran()

x <- list(list(1L, 2L, 3L), list(1, 2, 3),
list(TRUE, FALSE), list("a", "b", "c"))
y <- callJStatic("SparkRHandler", "echo", x)
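The same guard recurs throughout the test files below: testthat's skip_on_cran() turns a test into a skip when the suite runs as part of a CRAN check and leaves it untouched elsewhere. A stripped-down sketch of the pattern, with a hypothetical test name and body:

# Pattern sketch only; the test name and expectation are hypothetical.
test_that("some JVM-backed behaviour", {
  skip_on_cran()  # skipped on CRAN check machines, executed in regular test runs

  expect_equal(callJStatic("SparkRHandler", "echo", 1L), 1L)
})
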
2 changes: 2 additions & 0 deletions R/pkg/inst/tests/testthat/test_Windows.R
@@ -17,6 +17,8 @@
context("Windows-specific tests")

test_that("sparkJars tag in SparkContext", {
skip_on_cran()

if (.Platform$OS.type != "windows") {
skip("This test is only for Windows, skipped")
}
8 changes: 8 additions & 0 deletions R/pkg/inst/tests/testthat/test_binaryFile.R
@@ -24,6 +24,8 @@ sc <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "getJavaSparkContext",
mockFile <- c("Spark is pretty.", "Spark is awesome.")

test_that("saveAsObjectFile()/objectFile() following textFile() works", {
skip_on_cran()

fileName1 <- tempfile(pattern = "spark-test", fileext = ".tmp")
fileName2 <- tempfile(pattern = "spark-test", fileext = ".tmp")
writeLines(mockFile, fileName1)
@@ -38,6 +40,8 @@ test_that("saveAsObjectFile()/objectFile() following textFile() works", {
})

test_that("saveAsObjectFile()/objectFile() works on a parallelized list", {
skip_on_cran()

fileName <- tempfile(pattern = "spark-test", fileext = ".tmp")

l <- list(1, 2, 3)
@@ -50,6 +54,8 @@ test_that("saveAsObjectFile()/objectFile() works on a parallelized list", {
})

test_that("saveAsObjectFile()/objectFile() following RDD transformations works", {
skip_on_cran()

fileName1 <- tempfile(pattern = "spark-test", fileext = ".tmp")
fileName2 <- tempfile(pattern = "spark-test", fileext = ".tmp")
writeLines(mockFile, fileName1)
@@ -74,6 +80,8 @@ test_that("saveAsObjectFile()/objectFile() following RDD transformations works",
})

test_that("saveAsObjectFile()/objectFile() works with multiple paths", {
skip_on_cran()

fileName1 <- tempfile(pattern = "spark-test", fileext = ".tmp")
fileName2 <- tempfile(pattern = "spark-test", fileext = ".tmp")

6 changes: 6 additions & 0 deletions R/pkg/inst/tests/testthat/test_binary_function.R
@@ -29,6 +29,8 @@ rdd <- parallelize(sc, nums, 2L)
mockFile <- c("Spark is pretty.", "Spark is awesome.")

test_that("union on two RDDs", {
skip_on_cran()

actual <- collectRDD(unionRDD(rdd, rdd))
expect_equal(actual, as.list(rep(nums, 2)))

@@ -51,6 +53,8 @@ test_that("union on two RDDs", {
})

test_that("cogroup on two RDDs", {
skip_on_cran()

rdd1 <- parallelize(sc, list(list(1, 1), list(2, 4)))
rdd2 <- parallelize(sc, list(list(1, 2), list(1, 3)))
cogroup.rdd <- cogroup(rdd1, rdd2, numPartitions = 2L)
@@ -69,6 +73,8 @@ test_that("cogroup on two RDDs", {
})

test_that("zipPartitions() on RDDs", {
skip_on_cran()

rdd1 <- parallelize(sc, 1:2, 2L) # 1, 2
rdd2 <- parallelize(sc, 1:4, 2L) # 1:2, 3:4
rdd3 <- parallelize(sc, 1:6, 2L) # 1:3, 4:6
4 changes: 4 additions & 0 deletions R/pkg/inst/tests/testthat/test_broadcast.R
@@ -26,6 +26,8 @@ nums <- 1:2
rrdd <- parallelize(sc, nums, 2L)

test_that("using broadcast variable", {
skip_on_cran()

randomMat <- matrix(nrow = 10, ncol = 10, data = rnorm(100))
randomMatBr <- broadcast(sc, randomMat)

@@ -38,6 +40,8 @@ test_that("using broadcast variable", {
})

test_that("without using broadcast variable", {
skip_on_cran()

randomMat <- matrix(nrow = 10, ncol = 10, data = rnorm(100))

useBroadcast <- function(x) {
8 changes: 8 additions & 0 deletions R/pkg/inst/tests/testthat/test_client.R
@@ -18,6 +18,8 @@
context("functions in client.R")

test_that("adding spark-testing-base as a package works", {
skip_on_cran()

args <- generateSparkSubmitArgs("", "", "", "",
"holdenk:spark-testing-base:1.3.0_0.0.5")
expect_equal(gsub("[[:space:]]", "", args),
@@ -26,16 +28,22 @@ test_that("adding spark-testing-base as a package works", {
})

test_that("no package specified doesn't add packages flag", {
skip_on_cran()

args <- generateSparkSubmitArgs("", "", "", "", "")
expect_equal(gsub("[[:space:]]", "", args),
"")
})

test_that("multiple packages don't produce a warning", {
skip_on_cran()

expect_warning(generateSparkSubmitArgs("", "", "", "", c("A", "B")), NA)
})

test_that("sparkJars sparkPackages as character vectors", {
skip_on_cran()

args <- generateSparkSubmitArgs("", "", c("one.jar", "two.jar", "three.jar"), "",
c("com.databricks:spark-avro_2.10:2.0.1"))
expect_match(args, "--jars one.jar,two.jar,three.jar")
16 changes: 16 additions & 0 deletions R/pkg/inst/tests/testthat/test_context.R
@@ -18,6 +18,8 @@
context("test functions in sparkR.R")

test_that("Check masked functions", {
skip_on_cran()

# Check that we are not masking any new function from base, stats, testthat unexpectedly
# NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it
# hard for users to use base R functions. Please check when in doubt.
@@ -55,6 +57,8 @@ test_that("Check masked functions", {
})

test_that("repeatedly starting and stopping SparkR", {
skip_on_cran()

for (i in 1:4) {
sc <- suppressWarnings(sparkR.init())
rdd <- parallelize(sc, 1:20, 2L)
@@ -73,6 +77,8 @@ test_that("repeatedly starting and stopping SparkSession", {
})

test_that("rdd GC across sparkR.stop", {
skip_on_cran()

sc <- sparkR.sparkContext() # sc should get id 0
rdd1 <- parallelize(sc, 1:20, 2L) # rdd1 should get id 1
rdd2 <- parallelize(sc, 1:10, 2L) # rdd2 should get id 2
Expand All @@ -96,6 +102,8 @@ test_that("rdd GC across sparkR.stop", {
})

test_that("job group functions can be called", {
skip_on_cran()

sc <- sparkR.sparkContext()
setJobGroup("groupId", "job description", TRUE)
cancelJobGroup("groupId")
Expand All @@ -108,12 +116,16 @@ test_that("job group functions can be called", {
})

test_that("utility function can be called", {
skip_on_cran()

sparkR.sparkContext()
setLogLevel("ERROR")
sparkR.session.stop()
})

test_that("getClientModeSparkSubmitOpts() returns spark-submit args from whitelist", {
skip_on_cran()

e <- new.env()
e[["spark.driver.memory"]] <- "512m"
ops <- getClientModeSparkSubmitOpts("sparkrmain", e)
@@ -141,6 +153,8 @@ test_that("getClientModeSparkSubmitOpts() returns spark-submit args from whiteli
})

test_that("sparkJars sparkPackages as comma-separated strings", {
skip_on_cran()

expect_warning(processSparkJars(" a, b "))
jars <- suppressWarnings(processSparkJars(" a, b "))
expect_equal(lapply(jars, basename), list("a", "b"))
@@ -168,6 +182,8 @@ test_that("spark.lapply should perform simple transforms", {
})

test_that("add and get file to be downloaded with Spark job on every node", {
skip_on_cran()

sparkR.sparkContext()
# Test add file.
path <- tempfile(pattern = "hello", fileext = ".txt")
4 changes: 4 additions & 0 deletions R/pkg/inst/tests/testthat/test_includePackage.R
@@ -26,6 +26,8 @@ nums <- 1:2
rdd <- parallelize(sc, nums, 2L)

test_that("include inside function", {
skip_on_cran()

# Only run the test if plyr is installed.
if ("plyr" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(plyr))
@@ -42,6 +44,8 @@ test_that("include inside function", {
})

test_that("use include package", {
skip_on_cran()

# Only run the test if plyr is installed.
if ("plyr" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(plyr))
17 changes: 2 additions & 15 deletions R/pkg/inst/tests/testthat/test_mllib_classification.R
@@ -284,22 +284,11 @@
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))

# test initialWeights
model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights =
c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9))
mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
expect_equal(head(mlpPredictions$prediction, 10),
c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))

model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0))
mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
expect_equal(head(mlpPredictions$prediction, 10),
c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))

model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2)
mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
expect_equal(head(mlpPredictions$prediction, 10),
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0"))
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))

# Test formula works well
df <- suppressWarnings(createDataFrame(iris))
@@ -310,8 +299,6 @@ test_that("spark.mlp", {
expect_equal(summary$numOfOutputs, 3)
expect_equal(summary$layers, c(4, 3))
expect_equal(length(summary$weights), 15)
expect_equal(head(summary$weights, 5), list(-1.1957257, -5.2693685, 7.4489734, -6.3751413,
-10.2376130), tolerance = 1e-6)
})

test_that("spark.naiveBayes", {
4 changes: 4 additions & 0 deletions R/pkg/inst/tests/testthat/test_mllib_clustering.R
@@ -255,6 +255,8 @@ test_that("spark.lda with libsvm", {
})

test_that("spark.lda with text input", {
skip_on_cran()

text <- read.text(absoluteSparkPath("data/mllib/sample_lda_data.txt"))
model <- spark.lda(text, optimizer = "online", features = "value")

@@ -297,6 +299,8 @@ test_that("spark.lda with text input", {
})

test_that("spark.posterior and spark.perplexity", {
skip_on_cran()

text <- read.text(absoluteSparkPath("data/mllib/sample_lda_data.txt"))
model <- spark.lda(text, features = "value", k = 3)
