-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType #9720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems @liancheng added this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here was to mimic hive's type coercion rules. I think if you create a compatibility test like SELECT "0001" = 1 this rule is required (if its not then we could consider changing this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hive do support SELECT "0001" = 1, however, I think this rule is too simple, how about using findTightestCommonTypeToString?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, this rule will fire first and change the type to DoubleType.
btw I think it's a bad smell to have conflict rules, we should improve it and make sure it only handles cases that missed by ImplicitTypeCasts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan : do you want me to open a new jira to look into this? The new jira/pr will focus on the rules in PromoteStrings and ImplicitTypeCasts, as you suggested to reduce the redundant rules in PromoteStrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinyu98 I do not think that is really a problem for now. I think we do not need a jira for that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinyu98 please hold off until you find something is broken by this and we have to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhuai @cloud-fan : sure, I will not do that. I will try to run more testing to see if anything is broken.
|
Test build #45967 has finished for PR 9720 at commit
|
|
Test build #45974 has finished for PR 9720 at commit
|
|
LGTM, thanks for working on this @kevinyu98 |
|
@cloud-fan and @marmbrus @yhuai @nongli @liancheng : thanks for reviewing the fix. |
|
Merging to master and branch 1.6. |
…son between NullType and StringType
During executing PromoteStrings rule, if one side of binaryComparison is StringType and the other side is not StringType, the current code will promote(cast) the StringType to DoubleType, and if the StringType doesn't contain the numbers, it will get null value. So if it is doing <=> (NULL-safe equal) with Null, it will not filter anything, caused the problem reported by this jira.
I proposal to the changes through this PR, can you review my code changes ?
This problem only happen for <=>, other operators works fine.
scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+------+
|column|
+------+
+------+
Author: Kevin Yu <qyu@us.ibm.com>
Closes #9720 from kevinyu98/working_on_spark-11447.
(cherry picked from commit e01865a)
Signed-off-by: Yin Huai <yhuai@databricks.com>
…son between NullType and StringType
During executing PromoteStrings rule, if one side of binaryComparison is StringType and the other side is not StringType, the current code will promote(cast) the StringType to DoubleType, and if the StringType doesn't contain the numbers, it will get null value. So if it is doing <=> (NULL-safe equal) with Null, it will not filter anything, caused the problem reported by this jira.
I proposal to the changes through this PR, can you review my code changes ?
This problem only happen for <=>, other operators works fine.
scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+------+
|column|
+------+
+------+
Author: Kevin Yu <qyu@us.ibm.com>
Closes apache#9720 from kevinyu98/working_on_spark-11447.
During executing PromoteStrings rule, if one side of binaryComparison is StringType and the other side is not StringType, the current code will promote(cast) the StringType to DoubleType, and if the StringType doesn't contain the numbers, it will get null value. So if it is doing <=> (NULL-safe equal) with Null, it will not filter anything, caused the problem reported by this jira.
I proposal to the changes through this PR, can you review my code changes ?
This problem only happen for <=>, other operators works fine.
scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+------+
|column|
+------+
+------+
scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+------+
|column|
+------+
+------+