
Conversation

@gengliangwang
Member

What changes were proposed in this pull request?

Support the field EPOCH in the functions extract and date_part, which returns the number of seconds since 1970-01-01 00:00:00-00.
For example:

> SELECT EXTRACT(EPOCH FROM TIMESTAMP'1970-01-02 00:00:00-08');
115200.0
> SELECT DATE_PART('EPOCH', TIMESTAMP'1970-01-01 00:00:01.234-00');
1.234

Note that this field only works with Date and Timestamp input; it doesn't work with the Interval type. In Spark, the number of seconds in a month or a day is considered uncertain (a month contains 28~31 days, and a day contains 23~25 hours due to daylight saving transitions).
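As a sanity check on the first example above, here is a minimal sketch in plain Scala (only java.time, no Spark classes), assuming the session time zone is UTC so that the local epoch coincides with the UTC epoch: 1970-01-02 00:00:00 at offset -08:00 is 24 + 8 = 32 hours, i.e. 115200 seconds, after the epoch.

import java.time.OffsetDateTime

// 1970-01-02 00:00:00 at offset -08:00 is the instant 1970-01-02T08:00:00Z,
// i.e. (24 + 8) * 3600 = 115200 seconds after 1970-01-01 00:00:00Z.
val ts = OffsetDateTime.parse("1970-01-02T00:00:00-08:00")
assert(ts.toEpochSecond == 115200L)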

Why are the changes needed?

This is useful for getting the number of seconds since 1970-01-01 00:00:00-00.
PostgreSQL also supports the same field:
https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

The other reason is that casting from TimestampType to a numeric type is disallowed in ANSI mode, so we need to provide a proper alternative for users who have to do such a conversion.

Does this PR introduce any user-facing change?

Yes, a new field "EPOCH" for the datetime functions extract/date_part.

How was this patch tested?

Unit tests

@github-actions github-actions bot added the SQL label Nov 20, 2020
@MaxGekk
Member

Hmm, I added EPOCH in PR #25410, but it was removed by #28284. What is the reason to add it back again? cc @yaooqinn @cloud-fan @dongjoon-hyun

@gengliangwang
Member Author

The other reason is that casting from TimestampType to a numeric type is disallowed in ANSI mode, so we need to provide a proper alternative for users who have to do such a conversion.

@MaxGekk This is my major motivation. Actually, I didn't know we had it before and that it was removed.

@cloud-fan
Contributor

Because we thought it wasn't commonly used. But @gengliangwang made a good point that we need a replacement for casting datetime values to numbers.

@yaooqinn
Member

I checked the chat history with @cloud-fan. The reasons we agreed on for deleting EPOCH were: 1) EPOCH is non-ANSI; 2) compared with the other fields, EPOCH doesn't express the meaning of extract; 3) it had few supporters.

@cloud-fan
Contributor

Ah yeah, it's not really an extract. Can we follow BigQuery and add 3 functions: UNIX_SECONDS, UNIX_MILLIS, UNIX_MICROS? See https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions
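For reference, a hedged sketch of the BigQuery-style semantics, assuming Spark's internal representation of a timestamp as a Long count of microseconds since the epoch; the helper names below just mirror the proposed SQL functions and are illustrative, not Spark's actual implementation:

val MICROS_PER_SECOND = 1000000L
val MICROS_PER_MILLIS = 1000L

// Math.floorDiv rounds toward negative infinity, so pre-epoch timestamps
// truncate consistently (-1 microsecond maps to second -1, not 0).
def unixSeconds(micros: Long): Long = Math.floorDiv(micros, MICROS_PER_SECOND)
def unixMillis(micros: Long): Long = Math.floorDiv(micros, MICROS_PER_MILLIS)
def unixMicros(micros: Long): Long = micros

assert(unixSeconds(1234000L) == 1L) // 1.234 s after the epoch truncates to 1
assert(unixSeconds(-1L) == -1L)     // just before the epoch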

@SparkQA

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36030/

@SparkQA

SparkQA commented Nov 20, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36030/

/**
 * Returns the number of seconds since 1970-01-01 00:00:00-00 (can be negative).
 */
def getSecondsAfterEpoch(micros: Long, zoneId: ZoneId): Double = {
  micros.toDouble / MICROS_PER_SECOND
}
Member

@gengliangwang If you would like to be compatible with PostgreSQL, you need to take the removed implementation:

/**
 * Returns the number of seconds with fractional part in microsecond precision
 * since 1970-01-01 00:00:00 local time.
 */
def getEpoch(timestamp: SQLTimestamp, zoneId: ZoneId): Decimal = {
  val offset = SECONDS.toMicros(
    zoneId.getRules.getOffset(microsToInstant(timestamp)).getTotalSeconds)
  val sinceEpoch = timestamp + offset
  Decimal(sinceEpoch, 20, 6)
}

PostgreSQL takes seconds since the local epoch 1970-01-01 00:00:00 (in the session time zone), but your implementation calculates seconds since 1970-01-01 00:00:00Z (in the UTC time zone).
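A small illustration of that difference, assuming a session time zone of America/Los_Angeles (offset -08:00 in January 1970); this mirrors the timestamp + offset adjustment in getEpoch above:

import java.time.{Instant, ZoneId}

val zone = ZoneId.of("America/Los_Angeles")
val instant = Instant.parse("1970-01-01T08:00:00Z")

// 28800 seconds since the UTC epoch...
val utcEpochSeconds = instant.getEpochSecond

// ...but 0 seconds since the local epoch 1970-01-01 00:00:00-08:00,
// which is the PostgreSQL-compatible result.
val offset = zone.getRules.getOffset(instant).getTotalSeconds
assert(utcEpochSeconds + offset == 0L)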

Member Author

+1, thanks for pointing it out 👍

@SparkQA

SparkQA commented Nov 20, 2020

Test build #131424 has finished for PR 30445 at commit 6a5f31b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

Closing this since we already have #30566.
