@@ -280,6 +280,12 @@ object HiveTypeCoercion {
case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))

// Checking NullType
case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
p.makeCopy(Array(left, Literal.create(null, StringType)))
case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
p.makeCopy(Array(Literal.create(null, StringType), right))
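A rough illustration of what the two new cases accomplish (a sketch for this discussion, not part of the diff): a bare null literal arrives as NullType, so a comparison such as `$"a" <=> null` on a string column would otherwise fall through to the rule below and cast the string side to DoubleType. The new cases keep the comparison at StringType by swapping the untyped null for a typed null literal.

```scala
// Illustrative sketch only, assuming the Catalyst DSL and expression classes
// available in this codebase (EqualNullSafe, Literal, and 'a.string for an attribute).
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.{EqualNullSafe, Literal}
import org.apache.spark.sql.types.{NullType, StringType}

// Before coercion: the null literal is untyped (NullType).
val before = EqualNullSafe('a.string, Literal.create(null, NullType))

// After the new rule fires, the null side is a typed StringType null,
// so the later StringType-vs-other rule no longer casts 'a to DoubleType.
val after = EqualNullSafe('a.string, Literal.create(null, StringType))
```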

case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
p.makeCopy(Array(Cast(left, DoubleType), right))
Contributor

This rule looks weird to me; how about casting to the tightest common type of left and right? cc @marmbrus @yhuai @nongli

Contributor

Seems @liancheng added this change.

Contributor

Hm, actually I only simplified the original rule with more concise pattern matching in PR #6537 (here). I tracked down the history, and it turns out this rule has been there ever since the very first commit of Spark SQL by @marmbrus :) (here).

Contributor

The goal here was to mimic Hive's type coercion rules. I think that if you create a compatibility test like SELECT "0001" = 1, this rule is required (if it's not, then we could consider changing this).
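For example (illustrative only, assuming a SQLContext named sqlContext as in this PR's tests), the double cast is what makes the leading zeros irrelevant:

```scala
// Under the Cast(left, DoubleType) rule above, both sides become doubles,
// so the comparison is expected to be true, matching Hive's behavior.
sqlContext.sql("""SELECT "0001" = 1""").collect()
// expected: a single row containing true

// Compared as raw strings instead, "0001" == "1" would be false,
// which is why simply dropping the rule would break this kind of query.
```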

Contributor

Hive does support SELECT "0001" = 1; however, I think this rule is too simple. How about using findTightestCommonTypeToString?
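A rough sketch of what that alternative could look like (for discussion only, not part of this PR; it assumes findTightestCommonTypeToString(left, right): Option[DataType] as defined in HiveTypeCoercion):

```scala
// Hypothetical variant of the rule above: cast both sides to the tightest
// common type instead of unconditionally casting the string side to DoubleType.
case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
  findTightestCommonTypeToString(left.dataType, right.dataType)
    .map(common => p.makeCopy(Array(Cast(left, common), Cast(right, common))))
    .getOrElse(p)
```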

Contributor

I think so; this rule will fire first and change the type to DoubleType.
BTW, I think it's a bad smell to have conflicting rules; we should improve this and make sure it only handles cases that are missed by ImplicitTypeCasts.

Contributor Author

@cloud-fan: do you want me to open a new JIRA to look into this? The new JIRA/PR would focus on the rules in PromoteStrings and ImplicitTypeCasts, as you suggested, to reduce the redundant rules in PromoteStrings.

Contributor

@kevinyu98 I do not think that is really a problem for now; we do not need a JIRA for it right now.

Contributor

@kevinyu98 please hold off until you find something that is actually broken by this and we have to fix it.

Contributor Author

@yhuai @cloud-fan: sure, I will hold off on that. I will run more tests to see whether anything is broken.

case p @ BinaryComparison(left, right @ StringType()) if left.dataType != StringType =>
@@ -368,6 +368,17 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
checkAnswer(
nullData.filter($"a" <=> $"b"),
Row(1, 1) :: Row(null, null) :: Nil)

val nullData2 = sqlContext.createDataFrame(sparkContext.parallelize(
Row("abc") ::
Row(null) ::
Row("xyz") :: Nil),
StructType(Seq(StructField("a", StringType, true))))

checkAnswer(
nullData2.filter($"a" <=> null),
Row(null) :: Nil)

}
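For reference, the behavior the new assertion pins down can be checked directly in a shell (illustrative only, reusing the nullData2 DataFrame and the $ implicits from the test suite above): the null-safe operator <=> treats NULL as a comparable value, while plain equality against a NULL literal never matches.

```scala
// Illustrative sketch, reusing nullData2 from the test above.
nullData2.filter($"a" <=> null).collect()  // only the row where a is null matches
nullData2.filter($"a" === null).collect()  // empty: plain = with NULL is never true
```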

test(">") {