[SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType #9720

kevinyu98 · 2015-11-15T08:21:38Z

When the PromoteStrings rule runs, if one side of a binaryComparison is StringType and the other side is not StringType, the current code promotes (casts) the StringType side to DoubleType. If the string does not contain a number, the cast yields null. So a <=> (NULL-safe equal) comparison with null does not filter anything, which causes the problem reported in this JIRA.

I propose the changes in this PR; can you review my code changes?

This problem only happens for <=>; other operators work fine.

scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> df.registerTempTable("DF")

scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]

scala> res27.show
+------+
|column|
+------+
+------+
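A minimal sketch of the reasoning above, in plain Scala with no Spark dependency. `Option` stands in for a nullable SQL value, and every name here (`castToDouble`, `nullSafeEq`, `sqlEq`, `beforeFix`, `afterFix`, `plainEq`) is hypothetical and only illustrates the semantics; it is not the code in HiveTypeCoercion.

```scala
import scala.util.Try

// Hypothetical model of the SQL semantics involved; none of these names are Spark APIs.
object NullSafeEqualSketch {
  // Casting a string to DOUBLE: non-numeric input becomes NULL (None).
  def castToDouble(s: String): Option[Double] = Try(s.trim.toDouble).toOption

  // <=> : NULL <=> NULL is true, and the result itself is never NULL.
  def nullSafeEq[A](l: Option[A], r: Option[A]): Boolean = l == r

  // = : the result is NULL (None) as soon as either side is NULL.
  def sqlEq[A](l: Option[A], r: Option[A]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Old behavior: the string side is cast to double before comparing with NULL.
  def beforeFix(rows: Seq[String]): Seq[String] =
    rows.filter(s => nullSafeEq(castToDouble(s), None))

  // Behavior proposed here: NULL is treated as a string, so the column is compared as-is.
  def afterFix(rows: Seq[String]): Seq[String] =
    rows.filter(s => nullSafeEq(Option(s), None))

  // Plain = against NULL: a NULL predicate is not true, so no row is kept.
  def plainEq(rows: Seq[String]): Seq[String] =
    rows.filter(s => sqlEq(castToDouble(s), None).getOrElse(false))

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "1")
    println(beforeFix(rows)) // List(a, b)
    println(afterFix(rows))  // List()
    println(plainEq(rows))   // List()
  }
}
```

In this model the non-numeric rows become null after the cast, so `<=>` against null is true for them, while `=` and `>` evaluate to null and keep no rows. That would be consistent with only `<=>` showing the problem.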

cloud-fan · 2015-11-16T01:43:55Z

ok to test

cloud-fan · 2015-11-16T01:51:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala

this rule looks weird to me, how about casting to tightest common type of left and right? cc @marmbrus @yhuai @nongli

Seems @liancheng added this change.

Hm, actually I only simplified the original rule with more concise pattern matching in PR #6537 (here). I tracked down the history, and it turned out that this rule has been there ever since the very first commit of Spark SQL by @marmbrus :) (here).

The goal here was to mimic Hive's type coercion rules. I think if you create a compatibility test like SELECT "0001" = 1, this rule is required (if it's not, then we could consider changing this).

Hive does support SELECT "0001" = 1; however, I think this rule is too simple. How about using findTightestCommonTypeToString?

I think so, this rule will fire first and change the type to DoubleType.
btw I think it's a bad smell to have conflicting rules; we should improve it and make sure it only handles cases that are missed by ImplicitTypeCasts.

@cloud-fan : do you want me to open a new jira to look into this? The new jira/pr will focus on the rules in PromoteStrings and ImplicitTypeCasts, as you suggested to reduce the redundant rules in PromoteStrings.

@kevinyu98 I do not think that is really a problem for now. I think we do not need a jira for that right now.

@kevinyu98 please hold off until you find something is broken by this and we have to fix it.

@yhuai @cloud-fan : sure, I will not do that. I will try to run more testing to see if anything is broken.
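For readers following the thread, a small sketch of why the string-to-double promotion matters for the SELECT "0001" = 1 case mentioned above. This is plain Scala, not Spark or Hive code, and the helper names (`compareAsStrings`, `compareAsDoubles`) are hypothetical.

```scala
import scala.util.Try

// Hypothetical helpers illustrating the two possible coercions for "0001" = 1.
object PromoteStringsSketch {
  // Coerce the number to a string and compare text: "0001" vs "1".
  def compareAsStrings(s: String, n: Int): Boolean = s == n.toString

  // Coerce the string to a double and compare numbers; None models a NULL result
  // when the string is not numeric.
  def compareAsDoubles(s: String, n: Int): Option[Boolean] =
    Try(s.toDouble).toOption.map(_ == n.toDouble)

  def main(args: Array[String]): Unit = {
    println(compareAsStrings("0001", 1)) // false
    println(compareAsDoubles("0001", 1)) // Some(true)
    println(compareAsDoubles("abc", 1))  // None
  }
}
```

Only the numeric comparison treats "0001" and 1 as equal, which is the Hive-compatible result described in the thread; the cost is that non-numeric strings turn into null.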

SparkQA · 2015-11-16T01:58:25Z

Test build #45967 has finished for PR 9720 at commit bb705ca.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-11-16T05:13:57Z

Test build #45974 has finished for PR 9720 at commit 5a5be06.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2015-11-17T05:47:22Z

LGTM, thanks for working on this @kevinyu98

kevinyu98 · 2015-11-17T06:38:25Z

@cloud-fan and @marmbrus @yhuai @nongli @liancheng : thanks for reviewing the fix.

yhuai · 2015-11-17T06:52:34Z

Merging to master and branch 1.6.

…son between NullType and StringType

Author: Kevin Yu <qyu@us.ibm.com>
Closes #9720 from kevinyu98/working_on_spark-11447.
(cherry picked from commit e01865a)
Signed-off-by: Yin Huai <yhuai@databricks.com>


kevinyu98 added 2 commits November 13, 2015 10:11

[SPARK-11447]Check NullType before Promote StringType

b53b85c

add testcase in ColumnExpressionSuite

bb705ca

cloud-fan reviewed Nov 16, 2015

fix Scala style

5a5be06

asfgit closed this in e01865a Nov 17, 2015
