Skip to content

Conversation

@byF
Copy link
Contributor

@byF byF commented Aug 21, 2014

If you have a table with TIMESTAMP column, that column can't be used in WHERE clause properly - it is not evaluated properly. More

Motivation: http://www.aproint.com/aggregation-with-spark-sql/

  • modify SqlParser so it supports casting to TIMESTAMP (workaround for item 2)
  • the string literal should be converted into Timestamp if the column is Timestamp.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if add it in Filter part (where ~~), maybe you need to modify the logic part of EqualTo and c2 functions in file predicates.scala & Expression.scala

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataType parser is used only in for a CAST expression; I checked the logic of EqualTo and c2, it works out of box, f.e.:

WHERE timestamp >= CAST('2012-07-16 00:00:00' AS TIMESTAMP)
             AND timestamp <= CAST('2012-07-16 01:00:00' AS TIMESTAMP)

Can you think of any corner cases I should test?

@chuxi
Copy link

chuxi commented Aug 24, 2014

I think CAST is the better choice(Compared with the NO CAST method). It is implemented in the

case class Cast(child: Expression, dataType: DataType) extends UnaryExpression

with a lot of dataTypes, including timestampType

Besides, if you want to implement "modify Comparable expression evaluation so the the explicit casting is not necessary", you need to tell a String apart whether it is a TimeStamp format. And then modify the last line code:

protected lazy val literal: Parser[Literal] =
    numericLit ^^ {
      case i if i.toLong > Int.MaxValue => Literal(i.toLong)
      case i => Literal(i.toInt)
    } |
    NULL ^^^ Literal(null, NullType) |
    floatLit ^^ {case f => Literal(f.toDouble) } |
    stringLit ^^ {case s => Literal(s, StringType) }

Literal supports TimeStamp too.

stringLit ^^ { case s => try Literal(Timestamp.valueOf(s), Timestamp) catch Literal(s, StringType) }

@chutium
Copy link
Contributor

chutium commented Aug 25, 2014

is this PR for SPARK-3065 or SPARK-3173 ?

@marmbrus
Copy link
Contributor

Thanks for working on this!

Can you modify the PR title to point to SPARK-3173 instead? Also please add a test case in SQLQuerySuite.

@chuxi I believe this PR is just extending the parser you can can cast string to timestamps. I do not think that we automatically want to do this anytime a string could be interpreted as a timestamp.

@byF byF changed the title [SPARK-3065][SQL] Timestamp support in the parser [SPARK-3173][SQL] Timestamp support in the parser Aug 26, 2014
@byF
Copy link
Contributor Author

byF commented Aug 26, 2014

I'll add the test case once I get to my office today.

Exactly, the parser supported only CAST(... AS STRING). That's the problem for me. However, it would be nice if I could do:

WHERE timestamp >= '2012-07-16 00:00:00'
             AND timestamp <= '2012-07-16 01:00:00'

instead of

WHERE timestamp >= CAST('2012-07-16 00:00:00' AS TIMESTAMP)
             AND timestamp <= CAST('2012-07-16 01:00:00' AS TIMESTAMP)

Parsing this kind of expressions happens in SqlParser.comparisonExpression; I consider modifying predicates.scala in a way that would say:

If a queried expression has the TimestampType, replace the value expression with Cast.

The way the rest of the code is written it should probably work out of box. What do you think?

@marmbrus
Copy link
Contributor

Instead of doing this by modifying the predicate logic, I think that we should just add a rule in HiveTypeCoercion that finds BinaryPredicates where one side is a string and the other side is a timestamp and adds a Cast. Perhaps in PromoteStrings?

@chuxi
Copy link

chuxi commented Aug 26, 2014

@marmbrus, I agree with you. Use CAST and so we can avoid some tough design. I know little about hive and do you mean in HiveTypeCoercion there is a CAST problem? I will try to follow the code.

@byF
Copy link
Contributor Author

byF commented Aug 26, 2014

@marmbrus PromoteStrings makes perfect sense; I've got troubles running tests on my machine, I'll push the fix once I make the test work.

@marmbrus
Copy link
Contributor

Awesome! If you get stuck with tests feel free to push broken code and ping me.

@byF
Copy link
Contributor Author

byF commented Aug 27, 2014

I had a problem with running the tests, eventually figured it out

The tests added and the literal conversion works.

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2084 at commit 491dfcf.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2084 at commit 491dfcf.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if left is StringType and right is the TimestampType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 47b27b4

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have started for PR 2084 at commit 47b27b4.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have finished for PR 2084 at commit 47b27b4.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@byF
Copy link
Contributor Author

byF commented Aug 28, 2014

@marmbrus so the test fails in the jenkins build, however it passes okay on my machine (in Intellij). Any idea what's the reason for that?

@marmbrus
Copy link
Contributor

Could it be something with timezones? In the hive test we fix the default timezones to prevent problems.

@SparkQA
Copy link

SparkQA commented Aug 29, 2014

QA tests have started for PR 2084 at commit 442b59d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 29, 2014

QA tests have finished for PR 2084 at commit 442b59d.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Thanks! I've merged this into master.

@asfgit asfgit closed this in 98ddbe6 Aug 29, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
If you have a table with TIMESTAMP column, that column can't be used in WHERE clause properly - it is not evaluated properly. [More](https://issues.apache.org/jira/browse/SPARK-3173)

Motivation: http://www.aproint.com/aggregation-with-spark-sql/

- [x] modify SqlParser so it supports casting to TIMESTAMP (workaround for item 2)
- [x] the string literal should be converted into Timestamp if the column is Timestamp.

Author: Zdenek Farana <zdenek.farana@gmail.com>
Author: Zdenek Farana <zdenek.farana@aproint.com>

Closes apache#2084 from byF/SPARK-3173 and squashes the following commits:

442b59d [Zdenek Farana] Fixed test merge conflict
2dbf4f6 [Zdenek Farana] Merge remote-tracking branch 'origin/SPARK-3173' into SPARK-3173
65b6215 [Zdenek Farana] Fixed timezone sensitivity in the test
47b27b4 [Zdenek Farana] Now works in the case of "StringLiteral=TimestampColumn"
96a661b [Zdenek Farana] Code style change
491dfcf [Zdenek Farana] Added test cases for SPARK-3173
4446b1e [Zdenek Farana] A string literal is casted into Timestamp when the column is Timestamp.
59af397 [Zdenek Farana] Added a new TIMESTAMP keyword; CAST to TIMESTAMP now can be used in SQL expression.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants