[SPARK-16730][SQL] Implement function aliases for type casts #14364
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.expressions._ | |
| import org.apache.spark.sql.catalyst.expressions.aggregate._ | ||
| import org.apache.spark.sql.catalyst.expressions.xml._ | ||
| import org.apache.spark.sql.catalyst.util.StringKeyHashMap | ||
| import org.apache.spark.sql.types._ | ||
|
|
||
|
|
||
| /** | ||
|
|
@@ -408,8 +409,21 @@ object FunctionRegistry { | |
| expression[BitwiseAnd]("&"), | ||
| expression[BitwiseNot]("~"), | ||
| expression[BitwiseOr]("|"), | ||
| expression[BitwiseXor]("^") | ||
|
|
||
| expression[BitwiseXor]("^"), | ||
|
|
||
| // Cast aliases (SPARK-16730) | ||
| castAlias("boolean", BooleanType), | ||
| castAlias("tinyint", ByteType), | ||
| castAlias("smallint", ShortType), | ||
| castAlias("int", IntegerType), | ||
| castAlias("bigint", LongType), | ||
|
Contributor
use
Contributor
Author
I think that's actually worse, because it makes it less clear what the function name is when looking at this source file. Also, if for some reason we change LongType.simpleString in the future, these functions will subtly break.
Contributor
OK, agreed. |
||
| castAlias("float", FloatType), | ||
| castAlias("double", DoubleType), | ||
| castAlias("decimal", DecimalType.USER_DEFAULT), | ||
|
Contributor
Can you double-check this against Hive? What's the default decimal type in Hive?
Contributor
Author
This is not Hive's default, but it is what Spark SQL's cast defaults to. I think it is a bug, but I'm not sure whether it is intentional. I suggest we change this in a separate pull request, since there is more than one place to check.
Contributor
What does Hive do? |
||
| castAlias("date", DateType), | ||
| castAlias("timestamp", TimestampType), | ||
| castAlias("binary", BinaryType), | ||
| castAlias("string", StringType) | ||
| ) | ||
|
|
||
| val builtin: SimpleFunctionRegistry = { | ||
|
|
@@ -452,14 +466,37 @@ object FunctionRegistry { | |
| } | ||
| } | ||
|
|
||
| val clazz = tag.runtimeClass | ||
| (name, (expressionInfo[T](name), builder)) | ||
| } | ||
|
|
||
| /** | ||
| * Creates a function registry lookup entry for cast aliases (SPARK-16730). | ||
| * For example, if name is "int", and dataType is IntegerType, this means int(x) would become | ||
| * an alias for cast(x as IntegerType). | ||
| * See usage above. | ||
| */ | ||
| private def castAlias( | ||
| name: String, | ||
| dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = { | ||
| val builder = (args: Seq[Expression]) => { | ||
| if (args.size != 1) { | ||
| throw new AnalysisException(s"Function $name accepts only one argument") | ||
| } | ||
| Cast(args.head, dataType) | ||
| } | ||
| (name, (expressionInfo[Cast](name), builder)) | ||
|
Contributor
so whatever cast function we describe, we will always show
Contributor
Author
Yes - this is a limitation. That's not what Hive does because Hive actually does not have a single cast expression. It has a cast expression for each target type. I think it's a pretty small detail and fixing it would require a lot of work. |
||
| } | ||
|
|
||
| /** | ||
| * Creates an [[ExpressionInfo]] for the function as defined by expression T using the given name. | ||
| */ | ||
| private def expressionInfo[T <: Expression : ClassTag](name: String): ExpressionInfo = { | ||
| val clazz = scala.reflect.classTag[T].runtimeClass | ||
| val df = clazz.getAnnotation(classOf[ExpressionDescription]) | ||
| if (df != null) { | ||
| (name, | ||
| (new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended()), | ||
| builder)) | ||
| new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended()) | ||
| } else { | ||
| (name, (new ExpressionInfo(clazz.getCanonicalName, name), builder)) | ||
| new ExpressionInfo(clazz.getCanonicalName, name) | ||
| } | ||
| } | ||
| } | ||
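The castAlias helper in the diff above can be sketched outside Spark roughly as follows. This is a minimal illustration, not the code from the pull request: the DataType, Expression, Literal, Cast, and AnalysisException definitions here are simplified, hypothetical stand-ins for the Catalyst classes, the ExpressionInfo part of the registry entry is omitted, and CastAliasSketch is a made-up name.

```scala
// Simplified, hypothetical stand-ins for Catalyst types (not Spark's real classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

sealed trait Expression
final case class Literal(value: Any) extends Expression
final case class Cast(child: Expression, dataType: DataType) extends Expression

final class AnalysisException(message: String) extends Exception(message)

object CastAliasSketch {
  type FunctionBuilder = Seq[Expression] => Expression

  // Same shape as castAlias in the diff, minus ExpressionInfo: pair a function
  // name with a builder that requires exactly one argument and wraps it in Cast.
  def castAlias(name: String, dataType: DataType): (String, FunctionBuilder) = {
    val builder: FunctionBuilder = (args: Seq[Expression]) => {
      if (args.size != 1) {
        throw new AnalysisException(s"Function $name accepts only one argument")
      }
      Cast(args.head, dataType)
    }
    (name, builder)
  }

  // A toy lookup table keyed by function name, assumed here for illustration.
  val registry: Map[String, FunctionBuilder] = Map(
    castAlias("int", IntegerType),
    castAlias("string", StringType)
  )

  def main(args: Array[String]): Unit = {
    // int(x) is built as Cast(x, IntegerType).
    println(registry("int")(Seq(Literal("42"))))

    // Passing two arguments is rejected by the arity check.
    val tooMany = scala.util.Try(registry("int")(Seq(Literal(1), Literal(2))))
    println(tooMany.isFailure)
  }
}
```

Writing the name as a literal at each call site, rather than deriving it from the type, matches the author's argument in the review thread: the registered function name stays visible in the source and does not change if a type's string form changes.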
In Hive, if users create a UDF called boolean, will Hive throw an exception or override the type-casting one?
boolean is just a normal function in Hive (the same as, for example, acos), so it would behave like any other function.