Closed
@@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.expressions.xml._
import org.apache.spark.sql.catalyst.util.StringKeyHashMap
import org.apache.spark.sql.types._


/**
@@ -408,8 +409,21 @@ object FunctionRegistry {
expression[BitwiseAnd]("&"),
expression[BitwiseNot]("~"),
expression[BitwiseOr]("|"),
-expression[BitwiseXor]("^")
+expression[BitwiseXor]("^"),

// Cast aliases (SPARK-16730)
castAlias("boolean", BooleanType),
Contributor

In Hive, if users create a UDF called boolean, will Hive throw an exception or override the type-casting one?

Contributor Author

boolean is just a normal function in Hive (like acos, for example), so it behaves however any other function would.

castAlias("tinyint", ByteType),
castAlias("smallint", ShortType),
castAlias("int", IntegerType),
castAlias("bigint", LongType),
Contributor

Using LongType.simpleString instead of "bigint" would look better. The same goes for the others.

Contributor Author

I think that's actually worse, because it makes it less clear what the function name is by looking at this source file. Also if for some reason we change LongType.simpleString in the future, these functions will subtly break.

Contributor

ok agree

castAlias("float", FloatType),
castAlias("double", DoubleType),
castAlias("decimal", DecimalType.USER_DEFAULT),
Contributor

Can you double-check this against Hive? What is the default decimal type in Hive?

Contributor Author

This is not Hive's default; it is the default that Spark SQL's cast uses.

I think it is a bug, but I'm not sure if it is intentional. I suggest we change this in a separate pull request, since there is more than one place to check.

Contributor

What does Hive do?

castAlias("date", DateType),
castAlias("timestamp", TimestampType),
castAlias("binary", BinaryType),
castAlias("string", StringType)
)

val builtin: SimpleFunctionRegistry = {
@@ -452,14 +466,37 @@ object FunctionRegistry {
}
}

-val clazz = tag.runtimeClass
+(name, (expressionInfo[T](name), builder))
}

/**
* Creates a function registry lookup entry for cast aliases (SPARK-16730).
* For example, if name is "int", and dataType is IntegerType, this means int(x) would become
* an alias for cast(x as IntegerType).
* See usage above.
*/
private def castAlias(
name: String,
dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
val builder = (args: Seq[Expression]) => {
if (args.size != 1) {
throw new AnalysisException(s"Function $name accepts only one argument")
}
Cast(args.head, dataType)
}
(name, (expressionInfo[Cast](name), builder))
Contributor

So whichever cast function we describe, we will always show Cast's description, right? Is that the same in Hive?

Contributor Author

Yes - this is a limitation. That's not what Hive does because Hive actually does not have a single cast expression. It has a cast expression for each target type. I think it's a pretty small detail and fixing it would require a lot of work.

}

/**
* Creates an [[ExpressionInfo]] for the function as defined by expression T using the given name.
*/
private def expressionInfo[T <: Expression : ClassTag](name: String): ExpressionInfo = {
val clazz = scala.reflect.classTag[T].runtimeClass
val df = clazz.getAnnotation(classOf[ExpressionDescription])
if (df != null) {
-(name,
-(new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended()),
-builder))
+new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended())
} else {
-(name, (new ExpressionInfo(clazz.getCanonicalName, name), builder))
+new ExpressionInfo(clazz.getCanonicalName, name)
}
}
}
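To make the registry change concrete, here is a minimal, self-contained Scala sketch of what castAlias does: each alias name maps to a builder that rejects anything other than exactly one argument and otherwise wraps that argument in a Cast node. Everything in it is a hypothetical stand-in rather than Spark's API: Expr, Literal, Cast, Info, and lookup are invented for illustration, data types are plain strings, and IllegalArgumentException takes the place of Spark's AnalysisException.

```scala
// Hypothetical, simplified model of the cast-alias registry; none of these are Spark's classes.
object CastAliasSketch {
  // A tiny expression tree standing in for Catalyst expressions.
  sealed trait Expr
  final case class Literal(value: Any) extends Expr
  final case class Cast(child: Expr, dataType: String) extends Expr

  // Stand-in for ExpressionInfo: the class backing a function and its registered name.
  final case class Info(className: String, name: String)

  type Builder = Seq[Expr] => Expr

  // Mirrors castAlias from the diff: an arity check, then a Cast to the target type.
  def castAlias(name: String, dataType: String): (String, (Info, Builder)) = {
    val builder: Builder = args => {
      if (args.size != 1) {
        throw new IllegalArgumentException(s"Function $name accepts only one argument")
      }
      Cast(args.head, dataType)
    }
    // Every alias records Cast as its backing class, like expressionInfo[Cast](name).
    (name, (Info(classOf[Cast].getName, name), builder))
  }

  // A few entries in the same (name, (info, builder)) shape as the registry list above.
  val registry: Map[String, (Info, Builder)] = Map(
    castAlias("int", "int"),
    castAlias("bigint", "bigint"),
    castAlias("string", "string")
  )

  // Resolve a call by name and build its expression.
  def lookup(name: String, args: Seq[Expr]): Expr = {
    val (_, builder) = registry.getOrElse(
      name, throw new NoSuchElementException(s"Undefined function: $name"))
    builder(args)
  }
}
```

For example, CastAliasSketch.lookup("int", Seq(CastAliasSketch.Literal("10"))) returns a value equal to Cast(Literal("10"), "int"), while passing two arguments to "string" fails with the same message the test suite checks for. Because every entry records Cast as its backing class, all aliases share one description, which is the limitation discussed in the review thread.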
@@ -112,6 +112,9 @@ object Cast {
}

/** Cast the child expression to the target data type. */
@ExpressionDescription(
usage = " - Cast value v to the target data type.",
extended = "> SELECT _FUNC_('10' as int);\n 10")
case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {

override def toString: String = s"cast($child as ${dataType.simpleString})"
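The _FUNC_ token in the extended text above appears to be a placeholder for the function's registered name. Assuming it is filled in by plain string replacement when a description is displayed, the small sketch below shows why every cast alias ends up with Cast's example. FuncPlaceholderSketch and render are hypothetical names; only the two template strings are copied from the annotation.

```scala
// Hypothetical helper illustrating _FUNC_ substitution; not Spark's implementation.
object FuncPlaceholderSketch {
  // Strings copied from the @ExpressionDescription on Cast in this diff.
  val usage: String = " - Cast value v to the target data type."
  val extended: String = "> SELECT _FUNC_('10' as int);\n 10"

  // Assumption: the displayed text replaces every _FUNC_ with the registered name.
  def render(template: String, registeredName: String): String =
    template.replace("_FUNC_", registeredName)
}
```

Under that assumption, rendering the example for "cast" gives "> SELECT cast('10' as int);", but rendering it for the alias "int" gives "> SELECT int('10' as int);", which shows Cast's syntax rather than the alias's one-argument form.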
@@ -17,10 +17,14 @@

package org.apache.spark.sql

import java.math.BigDecimal
import java.sql.Timestamp

import org.apache.spark.sql.test.SharedSQLContext

/**
* A test suite for functions added for compatibility with other databases such as Oracle, MSSQL.
*
* These functions are typically implemented using the trait
* [[org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable]].
*/
@@ -69,4 +73,26 @@ class SQLCompatibilityFunctionSuite extends QueryTest with SharedSQLContext {
sql("SELECT nvl2(null, 1, 2.1d), nvl2('n', 1, 2.1d)"),
Row(2.1, 1.0))
}

test("SPARK-16730 cast alias functions for Hive compatibility") {
checkAnswer(
sql("SELECT boolean(1), tinyint(1), smallint(1), int(1), bigint(1)"),
Row(true, 1.toByte, 1.toShort, 1, 1L))

checkAnswer(
sql("SELECT float(1), double(1), decimal(1)"),
Row(1.toFloat, 1.0, new BigDecimal(1)))

checkAnswer(
sql("SELECT date(\"2014-04-04\"), timestamp(date(\"2014-04-04\"))"),
Row(new java.util.Date(114, 3, 4), new Timestamp(114, 3, 4, 0, 0, 0, 0)))

checkAnswer(
sql("SELECT string(1)"),
Row("1"))

// Error handling: only one argument
val errorMsg = intercept[AnalysisException](sql("SELECT string(1, 2)")).getMessage
assert(errorMsg.contains("Function string accepts only one argument"))
}
}
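The expected values in the date check use the deprecated java.util.Date and java.sql.Timestamp constructors, whose year argument is an offset from 1900 and whose month is zero-based, so (114, 3, 4) means 2014-04-04. The sketch below (LegacyDateSketch and its fields are hypothetical names) checks that arithmetic against the non-deprecated Date.valueOf and Timestamp.valueOf parsers; both sides are interpreted in the JVM's default time zone, so they agree regardless of which zone that is.

```scala
import java.sql.{Date, Timestamp}

object LegacyDateSketch {
  // Deprecated constructors: the year is an offset from 1900 and the month is zero-based,
  // so (2014 - 1900, 4 - 1, 4) is the same as the test's (114, 3, 4), i.e. 2014-04-04.
  val legacyDate = new java.util.Date(2014 - 1900, 4 - 1, 4)
  val legacyTimestamp = new Timestamp(2014 - 1900, 4 - 1, 4, 0, 0, 0, 0)

  // Non-deprecated equivalents parsed from JDBC escape-format strings.
  val parsedDate: Date = Date.valueOf("2014-04-04")
  val parsedTimestamp: Timestamp = Timestamp.valueOf("2014-04-04 00:00:00")
}
```

The parsed forms state the intended date directly, which may be easier to read than the offset arithmetic, although the deprecated constructors used in the test produce the same instants.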