[SPARK-33627][SQL] Add new function UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS #30566
Conversation
I will have a follow-up PR adding DATE_FROM_UNIX_DATE/UNIX_DATE and updating the error message of ANSI casting.
Probably, you need to add these new functions to the ignore set: spark/sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala, lines 148 to 153 in a4788ee.
Test build #132031 has finished for PR 30566 at commit
@maropu Sorry, but can you explain the reason? I am not familiar with this.
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.",
This function doesn't truncate.
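For context, Catalyst stores a TimestampType value as a Long counting microseconds since the epoch, so unix_micros has nothing left to truncate; a minimal sketch of the three conversions (illustrative helper functions, not the PR's actual code):

```scala
// Sketch, assuming Catalyst's internal representation: a TimestampType
// value is a Long of microseconds since 1970-01-01 00:00:00 UTC.
def unixMicros(micros: Long): Long = micros // already in microseconds
def unixMillis(micros: Long): Long = Math.floorDiv(micros, 1000L) // drops sub-millisecond digits
def unixSeconds(micros: Long): Long = Math.floorDiv(micros, 1000000L) // drops sub-second digits
```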
checkResult(Int.MinValue.toLong - 100)
}

test("UNIX_SECONDS") {
Let's also test null and negative input (that truncates).
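A quick sanity check of what truncation means for negative inputs, assuming floor division toward negative infinity (the behavior of Math.floorDiv):

```scala
// Half a second before the epoch, in microseconds:
val micros = -500000L
// Floor division truncates toward negative infinity, so -0.5 s
// maps to -1 whole second...
assert(Math.floorDiv(micros, 1000000L) == -1L)
// ...whereas plain integer division would round toward zero:
assert(micros / 1000000L == 0L)
```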
checkEvaluation(UnixMicros(timestamp), 1000000L)
// Truncates higher levels of precision
val timestampWithNanos = new Timestamp(1000L)
timestampWithNanos.setNanos(999999)
The truncation happens in Literal.apply, not UnixMicros.
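To illustrate, a sketch of where the nanoseconds are lost, loosely modeled on DateTimeUtils.fromJavaTimestamp (the formula below is a simplification, not the exact Spark code):

```scala
import java.sql.Timestamp

val ts = new Timestamp(1000L) // 1 second after the epoch
ts.setNanos(999999)           // 999,999 ns of fractional second

// Converting to Catalyst's microsecond Long integer-divides the
// nanosecond field, so the trailing 999 ns are dropped here, before
// UnixMicros ever sees the value.
val micros = ts.getTime / 1000L * 1000000L + ts.getNanos / 1000L
assert(micros == 1000999L)
```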
Maybe we can use Instant to test these functions.
I see. Let's just use timestampWithNanos.setNanos(1000) since there won't be truncation here.
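Either suggestion avoids the Literal-level truncation; a small sketch combining them (the test shape is an assumption, not the merged code):

```scala
import java.sql.Timestamp
import java.time.Instant

// 1,000 ns is exactly 1 µs, so nothing is lost in the conversion:
val tsWithNanos = new Timestamp(1000L)
tsWithNanos.setNanos(1000) // expected unix_micros: 1,000,001

// Instant expresses the same instant without Timestamp's quirks:
val instant = Instant.ofEpochSecond(1L, 1000L) // 1 s + 1,000 ns
// expected unix_micros: 1,000,001
```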
Test build #132042 has finished for PR 30566 at commit
retest this please.
Test build #132051 has finished for PR 30566 at commit
| "org.apache.spark.sql.catalyst.expressions.Now", | ||
| "org.apache.spark.sql.catalyst.expressions.UnixSeconds", | ||
| "org.apache.spark.sql.catalyst.expressions.UnixMillis", | ||
| "org.apache.spark.sql.catalyst.expressions.UnixMicros", |
Why can't we run the example SQL for them?
Originally, the timestamp creation in the examples was affected by the local time zone: _FUNC_(TIMESTAMP('1970-01-01 00:00:01')). Now I have updated the examples to include the time zone in the timestamp literal, so we can test the examples.
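To make the environment dependence concrete, a sketch of the failure mode in a spark-shell session (the session time zone and outputs are illustrative assumptions):

```scala
// A zoneless literal is parsed in the session time zone, so the
// expected output changes with the environment:
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01'))").show()
// 28801 (00:00:01 PST is 08:00:01 UTC)

// With an explicit zone, the instant is the same everywhere:
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))").show()
// always 1
```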
Sounds better.
Test build #132053 has finished for PR 30566 at commit
Test build #132061 has finished for PR 30566 at commit
retest this please
-- UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS
select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_SECONDS(null);
select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MILLIS(null);
select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MICROS(null);
Could you add time zones to the timestamp literals?
Thanks, updated.
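For reference, a sketch of what the zone-pinned variants should return (the 'Z'-suffixed literals are the assumed update; expected values follow from 2020-12-01 00:00:00Z being epoch second 1606780800):

```scala
// Assumed zone-pinned versions of the queries above, in a spark-shell:
spark.sql("SELECT unix_seconds(TIMESTAMP('2020-12-01 14:30:08Z'))").show()
// 1606833008
spark.sql("SELECT unix_millis(TIMESTAMP('2020-12-01 14:30:08.999999Z'))").show()
// 1606833008999 (sub-millisecond digits truncated)
spark.sql("SELECT unix_micros(TIMESTAMP('2020-12-01 14:30:08.999999Z'))").show()
// 1606833008999999
```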
| org.apache.spark.sql.catalyst.expressions.Unhex | unhex | SELECT decode(unhex('537061726B2053514C'), 'UTF-8') | struct<decode(unhex(537061726B2053514C), UTF-8):string> |
| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
Please regenerate this, @gengliangwang.
-| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_micros(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_millis(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_seconds(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
Thanks, updated.
Test build #132066 has finished for PR 30566 at commit
Test build #132062 has finished for PR 30566 at commit
Thank you for the update, @gengliangwang!
Test build #132056 has finished for PR 30566 at commit
Test build #132067 has finished for PR 30566 at commit
dongjoon-hyun left a comment
+1, LGTM. Thank you, @gengliangwang and @cloud-fan .
Merged to master for Apache Spark 3.1.0.
What changes were proposed in this pull request?
As #28534 added functions from BigQuery for converting numbers to timestamps, this PR adds the functions UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS for converting timestamps back to numbers.
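A quick usage sketch in a spark-shell session (one second after the epoch, viewed at three precisions):

```scala
spark.sql("""
  SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) AS s,
         unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'))  AS ms,
         unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'))  AS us
""").show()
// +---+----+-------+
// |  s|  ms|     us|
// +---+----+-------+
// |  1|1000|1000000|
// +---+----+-------+
```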
Why are the changes needed?
Does this PR introduce any user-facing change?
Yes, 3 new functions UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS for converting a timestamp to a long.
How was this patch tested?
Unit tests.