[SPARK-3173][SQL] Timestamp support in the parser #2084
|
Can one of the admins verify this patch? |
If you add it to the Filter part (the WHERE clause), you may need to modify the logic of EqualTo and the c2 functions in predicates.scala and Expression.scala.
The dataType parser is used only for CAST expressions. I checked the logic of EqualTo and c2, and it works out of the box, for example:
WHERE timestamp >= CAST('2012-07-16 00:00:00' AS TIMESTAMP)
AND timestamp <= CAST('2012-07-16 01:00:00' AS TIMESTAMP)
Can you think of any corner cases I should test?
|
I think CAST is the better choice (compared with the no-CAST method). It is implemented in the case class Cast(child: Expression, dataType: DataType) extends UnaryExpression, which handles many data types, including TimestampType. Besides, if you want to "modify Comparable expression evaluation so that the explicit casting is not necessary", you need to tell whether a String is in Timestamp format. You would then modify the last line of the literal parser so that Literal supports Timestamp too: stringLit ^^ { case s => try Literal(Timestamp.valueOf(s), TimestampType) catch { case _: IllegalArgumentException => Literal(s, StringType) } } |
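The fallback described in this comment can be sketched without Spark on the classpath. This is only an illustration of the idea, not the code that was merged: `Lit`, `TimestampLit`, `StringLit`, and `parseLiteral` are hypothetical stand-ins for Catalyst's `Literal`, and the only real API used is `java.sql.Timestamp.valueOf`, which throws `IllegalArgumentException` when the string is not in `yyyy-[m]m-[d]d hh:mm:ss[.f...]` format.

```scala
import java.sql.Timestamp
import scala.util.Try

// Hypothetical stand-ins for Catalyst's Literal; not Spark's real classes.
sealed trait Lit
case class TimestampLit(value: Timestamp) extends Lit
case class StringLit(value: String) extends Lit

// Try to read the string as a timestamp; fall back to a plain string literal
// when Timestamp.valueOf rejects the format.
def parseLiteral(s: String): Lit =
  Try[Lit](TimestampLit(Timestamp.valueOf(s))).getOrElse(StringLit(s))
```

Note that this converts every string that happens to look like a timestamp, regardless of the column it is compared with, which is the behaviour questioned later in the thread.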
|
Is this PR for SPARK-3065 or SPARK-3173? |
|
Thanks for working on this! Can you modify the PR title to point to SPARK-3173 instead? Also, please add a test case in SQLQuerySuite. @chuxi I believe this PR just extends the parser so that you can cast strings to timestamps. I do not think we want to do this automatically any time a string could be interpreted as a timestamp. |
|
I'll add the test case once I get to my office today. Exactly, the parser supported only instead of Parsing this kind of expressions happens in
Given how the rest of the code is written, it should probably work out of the box. What do you think? |
|
Instead of doing this by modifying the predicate logic, I think that we should just add a rule in |
|
@marmbrus, I agree with you. Using CAST lets us avoid some tough design decisions. I know little about Hive; do you mean there is a CAST problem in HiveTypeCoercion? I will try to follow the code. |
|
@marmbrus |
|
Awesome! If you get stuck with tests feel free to push broken code and ping me. |
|
I had a problem running the tests but eventually figured it out. The tests are added and the literal conversion works. |
|
ok to test |
|
QA tests have started for PR 2084 at commit
|
|
QA tests have finished for PR 2084 at commit
|
What if left is StringType and right is the TimestampType?
Fixed in 47b27b4
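The point raised in this review exchange is that the coercion has to be symmetric: the string may sit on either side of the comparison. A minimal sketch of such a rule follows, using a toy expression tree. All types here (`DataType`, `Expr`, `Column`, `StringLiteral`, `Cast`, `EqualTo`) and the `coerce` function are hypothetical simplifications that only borrow Catalyst's names; this is not the rule as committed in the PR.

```scala
// Toy model of typed expressions; hypothetical, not Spark's Catalyst classes.
sealed trait DataType
case object StringType extends DataType
case object TimestampType extends DataType

sealed trait Expr { def dataType: DataType }
case class Column(name: String, dataType: DataType) extends Expr
case class StringLiteral(value: String) extends Expr { val dataType: DataType = StringType }
case class Cast(child: Expr, dataType: DataType) extends Expr
case class EqualTo(left: Expr, right: Expr)

// Wrap whichever side is a string in a Cast to TimestampType when the
// other side is a timestamp; leave every other comparison untouched.
def coerce(e: EqualTo): EqualTo = (e.left.dataType, e.right.dataType) match {
  case (TimestampType, StringType) => e.copy(right = Cast(e.right, TimestampType))
  case (StringType, TimestampType) => e.copy(left = Cast(e.left, TimestampType))
  case _                           => e
}
```

Handling only the first case would leave a comparison written as `'2012-07-16 00:00:00' = timestamp` uncoerced, which is the gap the reviewer asked about.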
|
QA tests have started for PR 2084 at commit
|
|
QA tests have finished for PR 2084 at commit
|
|
@marmbrus The test fails in the Jenkins build, but it passes on my machine (in IntelliJ). Any idea why? |
|
Could it be something to do with timezones? In the Hive tests we fix the default timezone to prevent such problems. |
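One way a timestamp test can pass locally and fail on a build server is that `java.sql.Timestamp.valueOf` interprets its argument in the JVM's default timezone, so the same string yields different epoch milliseconds on differently configured machines. The sketch below pins the default timezone for the duration of a block and restores it afterwards. The helper name `withDefaultTimeZone` is an assumption made for illustration; it is not taken from the Spark test suites.

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Hypothetical helper: run `body` with a fixed JVM default timezone,
// then restore the previous default even if `body` throws.
def withDefaultTimeZone[A](id: String)(body: => A): A = {
  val saved = TimeZone.getDefault
  TimeZone.setDefault(TimeZone.getTimeZone(id))
  try body finally TimeZone.setDefault(saved)
}

// The same wall-clock string maps to different instants per timezone.
val utcMillis = withDefaultTimeZone("UTC") {
  Timestamp.valueOf("1970-01-01 00:00:01").getTime
}
val laMillis = withDefaultTimeZone("America/Los_Angeles") {
  Timestamp.valueOf("1970-01-01 00:00:01").getTime
}
```

A test that compares against a hard-coded epoch value, or against a timestamp rendered on another machine, would therefore only pass where the default timezone matches the one the expected value was computed in.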
Conflicts: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
|
QA tests have started for PR 2084 at commit
|
|
QA tests have finished for PR 2084 at commit
|
|
Thanks! I've merged this into master. |
If you have a table with TIMESTAMP column, that column can't be used in WHERE clause properly - it is not evaluated properly. [More](https://issues.apache.org/jira/browse/SPARK-3173)

Motivation: http://www.aproint.com/aggregation-with-spark-sql/

- [x] modify SqlParser so it supports casting to TIMESTAMP (workaround for item 2)
- [x] the string literal should be converted into Timestamp if the column is Timestamp.

Author: Zdenek Farana <zdenek.farana@gmail.com>
Author: Zdenek Farana <zdenek.farana@aproint.com>

Closes apache#2084 from byF/SPARK-3173 and squashes the following commits:

442b59d [Zdenek Farana] Fixed test merge conflict
2dbf4f6 [Zdenek Farana] Merge remote-tracking branch 'origin/SPARK-3173' into SPARK-3173
65b6215 [Zdenek Farana] Fixed timezone sensitivity in the test
47b27b4 [Zdenek Farana] Now works in the case of "StringLiteral=TimestampColumn"
96a661b [Zdenek Farana] Code style change
491dfcf [Zdenek Farana] Added test cases for SPARK-3173
4446b1e [Zdenek Farana] A string literal is casted into Timestamp when the column is Timestamp.
59af397 [Zdenek Farana] Added a new TIMESTAMP keyword; CAST to TIMESTAMP now can be used in SQL expression.