[SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference #19559

DonnyZone · 2017-10-23T11:50:54Z

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-22333

In the current version, users can call CURRENT_DATE() and CURRENT_TIMESTAMP() without the trailing parentheses.
However, when a table has a column named "current_date" or "current_timestamp", the identifier is still parsed as a function call.

There are many such cases in our production cluster, and we get wrong answers because of this behavior. In general, ColumnReference should take priority over timeFunctionCall.

How was this patch tested?

unit test
manual test

DonnyZone · 2017-10-23T12:10:52Z

ping @gatorsmile @hvanhovell @cloud-fan

hvanhovell · 2017-10-23T15:44:35Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

    | identifier                                                                               #columnReference
    | base=primaryExpression '.' fieldName=identifier                                          #dereference
    | '(' expression ')'                                                                       #parenthesizedExpression
+    | name=(CURRENT_DATE | CURRENT_TIMESTAMP)                                                  #timeFunctionCall


Won't this break every use of CURRENT_DATE/CURRENT_TIMESTAMP? They will now be interpreted as an identifier.

hvanhovell · 2017-10-23T15:44:43Z

ok to test

SparkQA · 2017-10-23T17:39:19Z

Test build #82988 has finished for PR 19559 at commit 60a5a56.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

DonnyZone · 2017-10-24T02:10:29Z

@hvanhovell Yes, I made a mistake. timeFunctionCall inherently conflicts with columnReference, so this fix would break every use of CURRENT_DATE/CURRENT_TIMESTAMP.

For SPARK-16836,
I think this feature should be implemented in the analysis phase rather than in the parser. When no such column exists, the identifier can be transformed into a function call. Another approach is to provide a configuration option for users. However, both implementations seem hacky and complicated.

What do you think?
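A minimal sketch of the analysis-phase idea above, written as a toy model rather than Spark's Analyzer. `ColumnRef`, `FunctionCall`, `resolve` and the case-insensitive matching are all assumptions made for illustration: a bare identifier is first matched against the available columns and only falls back to a parenthesis-free function when no column has that name.

```scala
// Toy model only: none of these names exist in Spark.
object ColumnFirstResolution {
  sealed trait Resolved
  final case class ColumnRef(name: String) extends Resolved
  final case class FunctionCall(name: String) extends Resolved

  // Functions that SQL allows without parentheses (assumed list).
  private val parenlessFunctions = Set("current_date", "current_timestamp")

  /** Resolve a bare identifier: a matching column wins, a function is the fallback. */
  def resolve(identifier: String, columns: Set[String]): Option[Resolved] = {
    val lowered = identifier.toLowerCase
    columns.find(_.equalsIgnoreCase(identifier)) match {
      case Some(col) => Some(ColumnRef(col))
      case None if parenlessFunctions.contains(lowered) => Some(FunctionCall(lowered))
      case None => None // stays unresolved; a real analyzer would report an error later
    }
  }
}
```

With this ordering, a table that has a `current_date` column keeps reading the column, while tables without one still get the function.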

gatorsmile · 2017-10-25T07:25:16Z

@DonnyZone Analyzer is the best place to fix the issue.

DonnyZone · 2017-10-25T09:39:42Z

@gatorsmile Thanks for your advice, I will work on it.

SparkQA · 2017-10-25T12:44:12Z

Test build #83044 has finished for PR 19559 at commit c38ab56.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-26T02:55:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

      ExtractGenerator ::
      ResolveGenerate ::
      ResolveFunctions ::
+      ResolveLiteralFunctions ::


The order matters. This assumes ResolveReferences runs before this rule. However, ResolveReferences might need multiple passes to resolve all the references. So how about moving the logic into ResolveReferences? If an attribute is not resolvable, we then check whether it is a literal function.

Agree! I will refactor it.

SparkQA · 2017-10-26T03:52:36Z

Test build #83071 has finished for PR 19559 at commit d485e25.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-26T06:14:57Z

Test build #83076 has finished for PR 19559 at commit 2d42abf.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-26T07:05:02Z

Test build #83077 has finished for PR 19559 at commit 5323fbb.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-26T07:38:19Z

retest this please

hvanhovell · 2017-10-26T07:54:41Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+      val literalFunctions = Seq(CurrentDate(), CurrentTimestamp())
+      val name = nameParts.head
+      val func = literalFunctions.find(e => resolver(e.prettyName, name))
+      if (func.isDefined) {


Just map over the func option.
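As an illustration of this suggestion, here is a small self-contained sketch using plain strings instead of Catalyst expressions. The object and method names are hypothetical; the point is only that `Option.map` replaces the `isDefined` check followed by a manual unwrap.

```scala
// Hypothetical stand-in: strings play the role of CurrentDate()/CurrentTimestamp().
object MapOverOption {
  private val literalFunctionNames = Seq("current_date", "current_timestamp")

  // Verbose form: test the Option, then unwrap it by hand.
  def lookupVerbose(name: String): Option[String] = {
    val func = literalFunctionNames.find(_.equalsIgnoreCase(name))
    if (func.isDefined) Some(func.get + "()") else None
  }

  // Same behavior, mapping over the Option directly.
  def lookup(name: String): Option[String] =
    literalFunctionNames.find(_.equalsIgnoreCase(name)).map(_ + "()")
}
```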

DonnyZone · 2017-10-26T07:56:14Z

@gatorsmile
Two issues still need to be figured out.
(1) It is complicated to determine whether a literal function should be resolved as an Expression or as a NamedExpression.
The current fix resolves them all as NamedExpressions (i.e., Alias).
However, this changes the schema in some cases, for example in the end-to-end SQL test:
select current_date = current_date()
The output schema becomes
struct<(current_date() AS `current_date()` = current_date()):boolean>
(2) Should we also support this feature in the ResolveMissingReferences rule?
e.g., select id from table order by current_date
Having the same logic in different rules leads to redundant code.

hvanhovell · 2017-10-26T07:56:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

-            val result = withPosition(u) { q.resolveChildren(nameParts, resolver).getOrElse(u) }
+            val result =
+              withPosition(u) {
+                q.resolveChildren(nameParts, resolver).getOrElse {


You can also use orElse:
q.resolveChildren(nameParts, resolver).orElse(resolveAsLiteralFunctions(nameParts)).getOrElse(u)
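The chaining pattern in that one-liner can be sketched without Spark. In the toy below, `resolveAsColumn` and `resolveAsLiteralFunction` are hypothetical helpers over strings, standing in for `resolveChildren` and `resolveAsLiteralFunctions`: each returns an `Option`, `orElse` tries the second only when the first is empty, and `getOrElse` supplies the unresolved fallback.

```scala
// Hypothetical helpers; the real resolvers return Catalyst expressions.
object OrElseFallback {
  def resolveAsColumn(name: String, columns: Seq[String]): Option[String] =
    columns.find(_.equalsIgnoreCase(name)).map(c => s"column:$c")

  def resolveAsLiteralFunction(name: String): Option[String] =
    Seq("current_date", "current_timestamp")
      .find(_.equalsIgnoreCase(name))
      .map(f => s"function:$f()")

  // Column first, then literal function, otherwise leave the name unresolved.
  def resolve(name: String, columns: Seq[String]): String =
    resolveAsColumn(name, columns)
      .orElse(resolveAsLiteralFunction(name))
      .getOrElse(s"unresolved:$name")
}
```

Because the argument of `orElse` is passed by name, the function lookup is evaluated only when the column lookup fails.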

SparkQA · 2017-10-26T10:02:59Z

Test build #83078 has finished for PR 19559 at commit 5323fbb.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-26T12:16:06Z

Test build #83080 has finished for PR 19559 at commit 6c932b7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-26T13:40:19Z

Test build #83081 has finished for PR 19559 at commit b8075e1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-27T03:55:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+        f => Alias(f, toPrettySQL(f))()
+      } else {
+        f => f
+      }


    if (nameParts.length != 1) return None
    val isNamedExpression = plan match {
      case Aggregate(_, aggs, _) => aggs.contains(attribute)
      case Project(projList, _) => projList.contains(attribute)
      case Window(windowExpressions, _, _, _) => windowExpressions.contains(attribute)
      case _ => false
    }
    val wrapper: Expression => Expression =
      if (isNamedExpression) f => Alias(f, toPrettySQL(f))() else identity
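To make the effect of the wrapper concrete, here is a toy model with its own tiny expression types (all hypothetical, not Catalyst classes). It assumes the same rule as the suggestion: the resolved function gets an alias only when the unresolved attribute is itself an entry of the output list, not when it is nested inside another expression.

```scala
// Toy expression tree; these case classes only mimic the Catalyst ones by name.
object AliasOnlyAtTopLevel {
  sealed trait Expr
  final case class Unresolved(name: String) extends Expr
  final case class FunctionCall(name: String) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr
  final case class Equals(left: Expr, right: Expr) extends Expr

  /** Resolve `target`, aliasing it only if it appears directly in the project list. */
  def resolve(target: Unresolved, projectList: Seq[Expr]): Expr = {
    val isTopLevel = projectList.contains(target)
    val wrapper: Expr => Expr =
      if (isTopLevel) f => Alias(f, s"${target.name}()") else identity
    wrapper(FunctionCall(target.name))
  }
}
```

Under this assumption, `select current_date` yields an aliased column, while `select current_date = current_date()` keeps the bare function inside the comparison, so the nested alias reported earlier does not appear in the schema.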

gatorsmile · 2017-10-27T03:55:40Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+          true
+        case p @ Project(projList, _) if (projList.contains(attribute)) =>
+          true
+        case _ =>


The Window case is missing here.

gatorsmile · 2017-10-27T03:57:17Z

sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala

+          "a, b FROM ttf1"),
+        Seq(Row(true, true, 1, 2), Row(true, true, 2, 3)))
+    }
+  }


Move these to datetime.sql?

DonnyZone · 2017-10-27T06:58:54Z

Yes, ordering in Sort(ordering, global, child) is resolved in resolveExpression

SparkQA · 2017-10-27T07:04:18Z

Test build #83112 has finished for PR 19559 at commit 36b4bbb.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-27T07:05:01Z

Test build #83110 has finished for PR 19559 at commit 4b13343.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-27T07:05:02Z

Test build #83111 has finished for PR 19559 at commit a96d945.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-27T07:46:21Z

Test build #83114 has finished for PR 19559 at commit 87c2073.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-27T10:08:20Z

Test build #83117 has finished for PR 19559 at commit 5baf98e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-27T12:33:20Z

Test build #83121 has finished for PR 19559 at commit 846cee4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

This PR cleans up the related code, mainly based on today's code review of apache#19559.

## How was this patch tested?

N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#19585 from gatorsmile/trivialFixes.

SparkQA · 2017-10-27T15:44:18Z

Test build #83122 has finished for PR 19559 at commit 2efd4f7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-28T02:34:12Z

Test build #83138 has finished for PR 19559 at commit a57fb81.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
throw new IllegalStateException("The main method in the given main class must be static")
class UnrecognizedBlockId(name: String)
assert(currentClass != null, "The outer class can't be null.")
assert(currentClass != null, "The outer class can't be null.")
assert(currentClass != null, "The outer class can't be null.")
class CrossValidator(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):
class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):
class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter])

gatorsmile · 2017-10-28T06:41:20Z

Thanks! Merged to master.

gatorsmile · 2017-10-28T06:43:01Z

Could you submit a backport PR to 2.2?

DonnyZone · 2017-10-30T02:22:00Z

Sure, I will submit it later.

…NT_TIMESTAMP) has conflicts with columnReference

## What changes were proposed in this pull request?

This is a backport PR of #19559 for branch-2.2.

## How was this patch tested?

unit tests

Author: donnyzone <wellfengzhu@gmail.com>

Closes #19606 from DonnyZone/branch-2.2.

spark-22333

60a5a56

hvanhovell reviewed Oct 23, 2017

DonnyZone changed the title ~~[SPARK-22333][SQL]ColumnReference should get higher priority than timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP)~~ [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference Oct 24, 2017

fix in analysis phase

c38ab56

fix typo

d485e25

gatorsmile reviewed Oct 26, 2017

Refactor in ResolveReferece

2d42abf

end-to-end test schema

5323fbb

hvanhovell reviewed Oct 26, 2017

Distinguish NamedExpression and Expression

6c932b7

remove test for current date/timestamp braceless expressions

b8075e1

gatorsmile reviewed Oct 27, 2017

gatorsmile mentioned this pull request Oct 27, 2017

[TRIVIAL] [SQL] Code cleaning in ResolveReferences #19585

Closed

DonnyZone added 2 commits October 27, 2017 14:59

support literal function in resolveExpression

36b4bbb

typo

87c2073

fix error

5baf98e

DonnyZone added 2 commits October 27, 2017 18:41

add new test

7e7a25c

test

846cee4

remove semicolon

2efd4f7

resolve conflicts

a57fb81

asfgit closed this in c42d208 Oct 28, 2017

DonnyZone mentioned this pull request Oct 30, 2017

[SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference #19606

Closed

cloud-fan mentioned this pull request Mar 9, 2019

[SPARK-27117][SQL] current_date/current_timestamp should not refer to columns with ansi parser mode #24039

Closed
