[SPARK-33500][SQL] Support field "EPOCH" in datetime function extract/date_part #30445
Conversation
Hmm, I added EPOCH in PR #25410 but it was removed by #28284. What is the reason for adding it back again? cc @yaooqinn @cloud-fan @dongjoon-hyun
@MaxGekk This is my major motivation. Actually, I didn't know we had it before and that it was removed.
Because we thought it was not commonly used. But @gengliangwang made a good point that we need a replacement for casting datetime to numbers.
I checked the chat history with @cloud-fan; the reasons we agreed on for deleting EPOCH were 1) EPOCH is non-ANSI, and 2) EPOCH cannot express the meaning of
Ah yeah, it's not really extract. Can we follow BigQuery and add 3 functions:
Kubernetes integration test starting
Kubernetes integration test status failure
```scala
/**
 * Returns the number of seconds since 1970-01-01 00:00:00-00 (can be negative).
 */
def getSecondsAfterEpoch(micros: Long, zoneId: ZoneId): Double = {
  micros.toDouble / MICROS_PER_SECOND
}
```
@gengliangwang If you would like to be compatible with PostgreSQL, you need to take the removed implementation:

```scala
/**
 * Returns the number of seconds with fractional part in microsecond precision
 * since 1970-01-01 00:00:00 local time.
 */
def getEpoch(timestamp: SQLTimestamp, zoneId: ZoneId): Decimal = {
  val offset = SECONDS.toMicros(
    zoneId.getRules.getOffset(microsToInstant(timestamp)).getTotalSeconds)
  val sinceEpoch = timestamp + offset
  Decimal(sinceEpoch, 20, 6)
}
```

PostgreSQL takes seconds since the local epoch 1970-01-01 00:00:00 (in the local time zone), but your implementation calculates seconds since 1970-01-01 00:00:00Z (in the UTC time zone).
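To make the difference concrete, here is a minimal standalone sketch (not code from the PR; the constant, zone, and timestamp value are chosen purely for illustration) showing that the two definitions diverge by exactly the zone offset for a non-UTC session time zone:

```scala
import java.time.{Instant, ZoneId}
import java.util.concurrent.TimeUnit.SECONDS

object EpochOffsetSketch extends App {
  // Illustrative constant matching the one used in the reviewed hunk.
  val MICROS_PER_SECOND = 1000000L

  // 2020-11-20 00:00:00 UTC expressed as microseconds since the Unix epoch.
  val micros = Instant.parse("2020-11-20T00:00:00Z").getEpochSecond * MICROS_PER_SECOND
  val zoneId = ZoneId.of("America/Los_Angeles") // offset is -08:00 on this date

  // The PR's version: seconds since 1970-01-01 00:00:00Z, ignoring the time zone.
  val utcEpoch = micros.toDouble / MICROS_PER_SECOND

  // PostgreSQL-style version: shift by the zone offset first, i.e. seconds
  // since 1970-01-01 00:00:00 in the local time zone.
  val offsetMicros = SECONDS.toMicros(
    zoneId.getRules.getOffset(Instant.ofEpochSecond(micros / MICROS_PER_SECOND)).getTotalSeconds)
  val localEpoch = (micros + offsetMicros).toDouble / MICROS_PER_SECOND

  // The two results differ by the zone offset: -8 hours = -28800 seconds here.
  println(utcEpoch - localEpoch) // 28800.0
}
```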
+1, thanks for pointing it out 👍
Test build #131424 has finished for PR 30445 at commit
Close this since we already have #30566.
What changes were proposed in this pull request?
Support field EPOCH in the functions `extract` and `date_part`, which returns the number of seconds since 1970-01-01 00:00:00-00. For example:
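The concrete example was lost in this excerpt; the following is a hypothetical usage sketch (the SparkSession setup, the session time zone, and the timestamp values are mine, not taken from the original description):

```scala
import org.apache.spark.sql.SparkSession

// Minimal local session; pin the session time zone so the expected value below holds.
val spark = SparkSession.builder().master("local[*]").appName("epoch-example").getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")

// Extract EPOCH with both supported syntaxes.
spark.sql("SELECT extract(EPOCH FROM timestamp '1970-01-02 00:00:00')").show()
spark.sql("SELECT date_part('EPOCH', timestamp '1970-01-02 00:00:00')").show()
// Both queries should report the seconds since 1970-01-01 00:00:00-00,
// i.e. 86400 for a timestamp one day after the epoch.
```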
Note that this field only works with Date and Timestamp input. It doesn't work with the Interval type: in Spark, the number of seconds in a month/day is considered uncertain (a month contains 28~31 days, a day contains 23~25 hours).

Why are the changes needed?
This is useful for getting the number of seconds since 1970-01-01 00:00:00-00. PostgreSQL also supports the same field:
https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
The other reason is that, since casting from TimestampType to numeric types is disallowed in ANSI mode, we need to provide a proper alternative if a user has to do the conversion.
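A hedged sketch of that point, reusing the spark-shell style session from the earlier example (the ANSI switch `spark.sql.ansi.enabled` is the standard config; the queries themselves are illustrative):

```scala
// Under ANSI mode, casting a timestamp to a numeric type is disallowed,
// which is exactly the gap extract(EPOCH ...) / date_part('EPOCH', ...) fills.
spark.conf.set("spark.sql.ansi.enabled", "true")

// spark.sql("SELECT CAST(timestamp '1970-01-02 00:00:00' AS LONG)") // rejected in ANSI mode
spark.sql("SELECT extract(EPOCH FROM timestamp '1970-01-02 00:00:00')").show() // proposed alternative
```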
Does this PR introduce any user-facing change?
Yes, a new field "EPOCH" for the datetime functions extract/date_part.
How was this patch tested?
Unit test