Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type #33215

Closed
wants to merge 5 commits into from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

For the timestamp literal, it should have the following behavior.

  1. When spark.sql.timestampType is TIMESTAMP_NTZ: if there is no time zone part, return timestamp without time zone literal; otherwise, return timestamp with local time zone literal

  2. When spark.sql.timestampType is TIMESTAMP_LTZ: return timestamp with local time zone literal

Why are the changes needed?

When the default timestamp type is TIMESTAMP_NTZ, the result of type literal should return TIMESTAMP_NTZ when there is no time zone part in the string.

From setion 5.3 "literal" of ANSI SQL standard 2011:

27) The declared type of a <timestamp literal> that does not specify <time zone interval> is TIMESTAMP(P) WITHOUT TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise. The declared type of a <timestamp literal> that specifies <time zone interval> is TIMESTAMP(P) WITH TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise.

Since we don't have "timestamp with time zone", we use timestamp with local time zone instead.

Does this PR introduce any user-facing change?

No, the new timestmap type and the default timestamp configuration is not released yet.

How was this patch tested?

Unit test

@github-actions github-actions bot added the SQL label Jul 5, 2021
@@ -248,7 +248,7 @@ object DateTimeUtils {
* @return timestamp segments, time zone id and whether the input is just time without a date. If
* the input string can't be parsed as timestamp, the result timestamp segments are empty.
*/
private def parseTimestampString(s: UTF8String): (Array[Int], Option[ZoneId], Boolean) = {
private[sql] def parseTimestampString(s: UTF8String): (Array[Int], Option[ZoneId], Boolean) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just remove "private[sql]" since we're in catalyst.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45155/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45161/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45161/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45168/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140643 has finished for PR 33215 at commit 043d99b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45168/


// If the timestamp string contains time zone, return a timestamp with local time zone literal
assertEqual("tImEstAmp '2019-01-16 20:50:00.567000+01:00'",
Literal(1547668200567000L, TimestampType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is hard to read. Can we create the literal from a Instant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It is from another test case in the same test suite.

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140651 has finished for PR 33215 at commit 381db36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140657 has finished for PR 33215 at commit da7ce20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45170/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45170/

@gengliangwang
Copy link
Member Author

Merging to master/branch-3.2

gengliangwang added a commit that referenced this pull request Jul 5, 2021
…efault timestamp type

### What changes were proposed in this pull request?

For the timestamp literal, it should have the following behavior.
1. When `spark.sql.timestampType` is TIMESTAMP_NTZ: if there is no time zone part, return timestamp without time zone literal; otherwise, return timestamp with local time zone literal

2. When `spark.sql.timestampType` is TIMESTAMP_LTZ: return timestamp with local time zone literal

### Why are the changes needed?

When the default timestamp type is TIMESTAMP_NTZ, the result of type literal should return TIMESTAMP_NTZ when there is no time zone part in the string.

From setion 5.3 "literal" of ANSI SQL standard 2011:
```
27) The declared type of a <timestamp literal> that does not specify <time zone interval> is TIMESTAMP(P) WITHOUT TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise. The declared type of a <timestamp literal> that specifies <time zone interval> is TIMESTAMP(P) WITH TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise.
```
Since we don't have "timestamp with time zone", we use timestamp with local time zone instead.
### Does this PR introduce _any_ user-facing change?

No, the new timestmap type and the default timestamp configuration is not released yet.

### How was this patch tested?

Unit test

Closes #33215 from gengliangwang/tsLiteral.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 2fffec7)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140659 has finished for PR 33215 at commit 4e90fb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants