Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion R/pkg/R/DataFrame.R
Original file line number Diff line number Diff line change
Expand Up @@ -397,7 +397,11 @@ setMethod("coltypes",
}

if (is.null(type)) {
stop(paste("Unsupported data type: ", x))
specialtype <- specialtypeshandle(x)
Copy link
Member

@felixcheung felixcheung Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it looks like we are expecting type to be R native type, but seems like specialtypeshandle is returning double which is not a R native type? (per se - for historical reasons)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or rather, numeric is better since it matches the mapping in PRIMITIVE_TYPES?

Copy link
Contributor Author

@wangmiao1981 wangmiao1981 Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numeric is not the key of PRIMITIVE_TYPE map. is.null(PRIMITIVE_TYPES[[colType]]) is NULL if mapped to numeric type.

Copy link
Member

@felixcheung felixcheung Aug 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as line 383
type <- PRIMITIVE_TYPES[[x]]
here in line 399
specialtype <- specialtypeshandle(x)
type <- specialtype

numeric is one of the value from PRIMITIVE_TYPE, double, returned by specialtypeshandle, is not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then from my comment below, line 1059
colType <- dtypes[[colIndex]][[2]]
colType is from dtypes which is the JVM type.
specialtype <- specialtypeshandle(colType)
so seem like we are mixing in both cases.

if (is.null(specialtype)) {
stop(paste("Unsupported data type: ", x))
}
type <- PRIMITIVE_TYPES[[specialtype]]
}
}
type
Expand Down Expand Up @@ -1063,6 +1067,13 @@ setMethod("collect",
df[[colIndex]] <- col
} else {
colType <- dtypes[[colIndex]][[2]]
if (is.null(PRIMITIVE_TYPES[[colType]])) {
specialtype <- specialtypeshandle(colType)
if (!is.null(specialtype)) {
colType <- specialtype
Copy link
Member

@felixcheung felixcheung Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case colType is supposed to be JVM type, so we are mixing in what we are using the function for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixcheung You are right. Here is a mistake. The goal of this function is to change the Key for the dictionary PRIMITIVE_TYPES. The issue is because backend returns decimal(10,0) as the Key. Let me review the whole PR again and make changes as you suggested. Thanks!

}
}

# Note that "binary" columns behave like complex types.
if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
vec <- do.call(c, col)
Expand Down
16 changes: 16 additions & 0 deletions R/pkg/R/types.R
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,19 @@ rToSQLTypes <- as.environment(list(
"double" = "double",
"character" = "string",
"logical" = "boolean"))

# Helper function of coverting decimal type. When backend returns column type in the
# format of decimal(,) (e.g., decimal(10, 0)), this function coverts the column type
# as double type. This function converts backend returned types that are not the key
# of PRIMITIVE_TYPES, but should be treated as PRIMITIVE_TYPES.
# @param A type returned from the JVM backend.
# @return A type is the key of the PRIMITIVE_TYPES.
specialtypeshandle <- function(type) {
returntype <- NULL
m <- regexec("^decimal(.+)$", type)
matchedStrings <- regmatches(type, m)
if (length(matchedStrings[[1]]) >= 2) {
returntype <- "double"
}
returntype
}
22 changes: 22 additions & 0 deletions R/pkg/inst/tests/testthat/test_sparkSQL.R
Original file line number Diff line number Diff line change
Expand Up @@ -526,6 +526,17 @@ test_that(
expect_is(newdf, "SparkDataFrame")
expect_equal(count(newdf), 1)
dropTempView("table1")

createOrReplaceTempView(df, "dfView")
sqlCast <- collect(sql("select cast('2' as decimal) as x from dfView limit 1"))
out <- capture.output(sqlCast)
expect_true(is.data.frame(sqlCast))
expect_equal(names(sqlCast)[1], "x")
expect_equal(nrow(sqlCast), 1)
expect_equal(ncol(sqlCast), 1)
expect_equal(out[1], " x")
expect_equal(out[2], "1 2")
dropTempView("dfView")
})

test_that("test cache, uncache and clearCache", {
Expand Down Expand Up @@ -2089,6 +2100,9 @@ test_that("Method coltypes() to get and set R's data types of a DataFrame", {
# Test primitive types
DF <- createDataFrame(data, schema)
expect_equal(coltypes(DF), c("integer", "logical", "POSIXct"))
createOrReplaceTempView(DF, "DFView")
sqlCast <- sql("select cast('2' as decimal) as x from DFView limit 1")
expect_equal(coltypes(sqlCast), "numeric")

# Test complex types
x <- createDataFrame(list(list(as.environment(
Expand Down Expand Up @@ -2132,6 +2146,14 @@ test_that("Method str()", {
"setosa\" \"setosa\" \"setosa\" \"setosa\""))
expect_equal(out[7], " $ col : logi TRUE TRUE TRUE TRUE TRUE TRUE")

createOrReplaceTempView(irisDF2, "irisView")

sqlCast <- sql("select cast('2' as decimal) as x from irisView limit 1")
castStr <- capture.output(str(sqlCast))
expect_equal(length(castStr), 2)
expect_equal(castStr[1], "'SparkDataFrame': 1 variables:")
expect_equal(castStr[2], " $ x: num 2")

# A random dataset with many columns. This test is to check str limits
# the number of columns. Therefore, it will suffice to check for the
# number of returned rows
Expand Down