Skip to content

Conversation

@DonnyZone
Copy link
Contributor

@DonnyZone DonnyZone commented Oct 23, 2017

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-22333

In current version, users can use CURRENT_DATE() and CURRENT_TIMESTAMP() without specifying braces.
However, when a table has columns named as "current_date" or "current_timestamp", it will still be parsed as function call.

There are many such cases in our production cluster. We get the wrong answer due to this inappropriate behevior. In general, ColumnReference should get higher priority than timeFunctionCall.

How was this patch tested?

unit test
manul test

@DonnyZone
Copy link
Contributor Author

ping @gatorsmile @hvanhovell @cloud-fan

| identifier #columnReference
| base=primaryExpression '.' fieldName=identifier #dereference
| '(' expression ')' #parenthesizedExpression
| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this break every use of CURRENT_DATE/CURRENT_TIMESTAMP? They will now be interpreted as an identifier.

@hvanhovell
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82988 has finished for PR 19559 at commit 60a5a56.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DonnyZone
Copy link
Contributor Author

DonnyZone commented Oct 24, 2017

@hvanhovell Yes! I made something wrong. The timeFunctionCall naturally has conflicts with columnReference. This fix will break every use of CURRENT_DATE/CURRENT_TIMESTAMP.

For SPARK-16836,
I think this feature should be implemented in analysis phase rather than in parser phase. When there is no such columns, they can be transformed as functions. Another approach is to provide a configuration for users. However, both of the implementations seem to be hacky and complicated.

How about your idea?

@DonnyZone DonnyZone changed the title [SPARK-22333][SQL]ColumnReference should get higher priority than timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference Oct 24, 2017
@gatorsmile
Copy link
Member

@DonnyZone Analyzer is the best place to fix the issue.

@DonnyZone
Copy link
Contributor Author

@gatorsmile Thank for your advice, I will work on it.

@SparkQA
Copy link

SparkQA commented Oct 25, 2017

Test build #83044 has finished for PR 19559 at commit c38ab56.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ExtractGenerator ::
ResolveGenerate ::
ResolveFunctions ::
ResolveLiteralFunctions ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order matters. It assumes ResolveReferences should be run before this rule. However, ResolveReferences might need multiple passes to resolve all the references. Thus, how about moving the logics into ResolveReferences ? If the attributes are not resolvable, we try to see whether it is a function literal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! I will refactor it.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83071 has finished for PR 19559 at commit d485e25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83076 has finished for PR 19559 at commit 2d42abf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83077 has finished for PR 19559 at commit 5323fbb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

val literalFunctions = Seq(CurrentDate(), CurrentTimestamp())
val name = nameParts.head
val func = literalFunctions.find(e => resolver(e.prettyName, name))
if (func.isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just map over the func option.

@DonnyZone
Copy link
Contributor Author

@gatorsmile @gatorsmile
There are still two issues need to be figured out.
(1)It will be complicated to determine whether a literal function should be resolved as Expression or NamedExpression.
Current fix just resolves them as NamedExpressions (i.e., Alias).
However, this leads to different schema in some cases, for example, the end-to-end test sql.
select current_date = current_date()
The output schema will be
struct<(current_date() AS ’current_date()‘ = current_date()):boolean>
(2)Shall we also support the feature in ResolveMissingReference rule?
e.g., select id from table order by current_date
The same logic in different rules brings redundant code.

val result = withPosition(u) { q.resolveChildren(nameParts, resolver).getOrElse(u) }
val result =
withPosition(u) {
q.resolveChildren(nameParts, resolver).getOrElse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use orElse:
q.resolveChildren(nameParts, resolver).orElse(resolveAsLiteralFunctions(nameParts)).getOrElse(u)

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83078 has finished for PR 19559 at commit 5323fbb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83080 has finished for PR 19559 at commit 6c932b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2017

Test build #83081 has finished for PR 19559 at commit b8075e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

f => Alias(f, toPrettySQL(f))()
} else {
f => f
}
Copy link
Member

@gatorsmile gatorsmile Oct 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      if (nameParts.length != 1) return None
      val isNamedExpression = plan match {
        case Aggregate(_, aggs, _) => aggs.contains(attribute)
        case Project(projList, _) => projList.contains(attribute)
        case Window(windowExpressions, _, _, _) => windowExpressions.contains(attribute)
        case _ => false
      }
      val wrapper: Expression => Expression =
        if (isNamedExpression) f => Alias(f, toPrettySQL(f))() else identity

true
case p @ Project(projList, _) if (projList.contains(attribute)) =>
true
case _ =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miss Windows

"a, b FROM ttf1"),
Seq(Row(true, true, 1, 2), Row(true, true, 2, 3)))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these to datetime.sql?

@DonnyZone
Copy link
Contributor Author

Yes, ordering in Sort(ordering, global, child) is resolved in resolveExpression

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83112 has finished for PR 19559 at commit 36b4bbb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83110 has finished for PR 19559 at commit 4b13343.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83111 has finished for PR 19559 at commit a96d945.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83114 has finished for PR 19559 at commit 87c2073.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83117 has finished for PR 19559 at commit 5baf98e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83121 has finished for PR 19559 at commit 846cee4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ghost pushed a commit to dbtsai/spark that referenced this pull request Oct 27, 2017
## What changes were proposed in this pull request?
This PR is to clean the related codes majorly based on the today's code review on  apache#19559

## How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#19585 from gatorsmile/trivialFixes.
@SparkQA
Copy link

SparkQA commented Oct 27, 2017

Test build #83122 has finished for PR 19559 at commit 2efd4f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 28, 2017

Test build #83138 has finished for PR 19559 at commit a57fb81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new IllegalStateException(\"The main method in the given main class must be static\")
  • class UnrecognizedBlockId(name: String)
  • assert(currentClass != null, \"The outer class can't be null.\")
  • assert(currentClass != null, \"The outer class can't be null.\")
  • assert(currentClass != null, \"The outer class can't be null.\")
  • class CrossValidator(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):
  • class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):
  • class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter])

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@gatorsmile
Copy link
Member

Could you submit a backport PR to 2.2?

@asfgit asfgit closed this in c42d208 Oct 28, 2017
@DonnyZone
Copy link
Contributor Author

Sure, I will submit it later.

asfgit pushed a commit that referenced this pull request Oct 31, 2017
…NT_TIMESTAMP) has conflicts with columnReference

## What changes were proposed in this pull request?

This is a backport pr of #19559
for branch-2.2

## How was this patch tested?
unit tests

Author: donnyzone <wellfengzhu@gmail.com>

Closes #19606 from DonnyZone/branch-2.2.
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…NT_TIMESTAMP) has conflicts with columnReference

## What changes were proposed in this pull request?

This is a backport pr of apache#19559
for branch-2.2

## How was this patch tested?
unit tests

Author: donnyzone <wellfengzhu@gmail.com>

Closes apache#19606 from DonnyZone/branch-2.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants