Conversation

@gengliangwang (Member)

What changes were proposed in this pull request?

As #28534 added BigQuery-inspired functions for converting numbers to timestamps, this PR adds the functions UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS for converting timestamps back to numbers.

Why are the changes needed?

  1. Symmetry of the conversion functions.
  2. Casting the timestamp type to numeric types is disallowed in ANSI mode, so we should provide functions for users to perform the conversion (see the sketch below).
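
For context on point 2, a minimal sketch of the cast failing under ANSI mode, runnable in spark-shell (the exact error wording varies across Spark versions):

```scala
// With ANSI mode on, casting a timestamp to a numeric type is rejected
// at analysis time; the new functions give users a sanctioned conversion.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(TIMESTAMP('1970-01-01 00:00:01Z') AS LONG)")
// => AnalysisException (wording varies): cannot cast timestamp to bigint
```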

Does this PR introduce any user-facing change?

Yes: three new functions, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS, for converting a timestamp to the long type.
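
For example, a minimal sketch of the new functions, runnable in spark-shell once this PR is in (the literals use an explicit UTC offset so the results are stable across session time zones):

```scala
// One second after the epoch in each of the three units.
spark.sql("""
  SELECT
    unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) AS secs,   -- 1
    unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'))  AS millis, -- 1000
    unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'))  AS micros  -- 1000000
""").show()
```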

How was this patch tested?

Unit tests.

@gengliangwang (Member, Author)

I will open a follow-up PR to add DATE_FROM_UNIX_DATE/UNIX_DATE and to update the error message of ANSI casting.

@maropu (Member) commented Dec 2, 2020

Probably you need to add these new functions to the ignore set:

```scala
// One of examples shows getting the current timestamp
"org.apache.spark.sql.catalyst.expressions.UnixTimestamp",
"org.apache.spark.sql.catalyst.expressions.CurrentDate",
"org.apache.spark.sql.catalyst.expressions.CurrentTimestamp",
"org.apache.spark.sql.catalyst.expressions.CurrentTimeZone",
"org.apache.spark.sql.catalyst.expressions.Now",
```

@SparkQA commented Dec 2, 2020

Test build #132031 has finished for PR 30566 at commit 0b31348.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member, Author)

> Probably you need to add these new functions to the ignore set:

@maropu Sorry, but can you explain the reason? I am not familiar with this.


```scala
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.",
```
Contributor:

This function doesn't truncate.

```scala
checkResult(Int.MinValue.toLong - 100)
}

test("UNIX_SECONDS") {
```
@cloud-fan (Contributor) commented Dec 2, 2020

let's also test null and negative input (that truncates)
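
A hypothetical sketch of what those cases might look like, in the style of the surrounding suite (the floor-division result for negative input is an assumption about the implementation, not confirmed in this thread):

```scala
// Hypothetical additions, assuming the surrounding suite's setup and
// ExpressionEvalHelper.checkEvaluation are in scope.
import org.apache.spark.sql.catalyst.expressions.{Literal, UnixSeconds}
import org.apache.spark.sql.types.TimestampType

// Null input propagates to a null result.
checkEvaluation(UnixSeconds(Literal(null, TimestampType)), null)
// -1500000 us is -1.5 s; assuming floor division in the implementation,
// the fractional second is dropped toward negative infinity, giving -2.
checkEvaluation(UnixSeconds(Literal(-1500000L, TimestampType)), -2L)
```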

```scala
checkEvaluation(UnixMicros(timestamp), 1000000L)
// Truncates higher levels of precision
val timestampWithNanos = new Timestamp(1000L)
timestampWithNanos.setNanos(999999)
```
Contributor:

The truncation happens in Literal.apply, not UnixMicros
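
To make that concrete, a small sketch of the conversion path (assuming Literal.apply funnels java.sql.Timestamp through DateTimeUtils.fromJavaTimestamp, which keeps whole microseconds):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.catalyst.util.DateTimeUtils

val ts = new Timestamp(1000L) // one second after the epoch
ts.setNanos(999999)           // fractional part: 999 us + 999 ns
// The conversion keeps whole microseconds only; the trailing 999 ns are
// already gone here, before UnixMicros ever sees the value.
val micros = DateTimeUtils.fromJavaTimestamp(ts) // 1000999L
```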

Contributor:

maybe we can use Instant to test these functions.
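
A sketch of that alternative (hypothetical test values; it assumes Literal.apply accepts java.time.Instant, as it does since Spark 3.0, and that checkEvaluation is in scope):

```scala
import java.time.Instant
import org.apache.spark.sql.catalyst.expressions.{Literal, UnixMicros}

// Instant states the sub-second part explicitly, so there is no hidden
// nanosecond truncation on the way into the microsecond representation.
val instant = Instant.ofEpochSecond(1L, 1000L) // 1 s + 1 us
checkEvaluation(UnixMicros(Literal(instant)), 1000001L)
```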

@gengliangwang (Member, Author)

I see. Let's just use timestampWithNanos.setNanos(1000) since there won't be truncation here.

@SparkQA commented Dec 2, 2020

Test build #132042 has finished for PR 30566 at commit 35ef5d2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member, Author)

retest this please.

@SparkQA commented Dec 2, 2020

Test build #132051 has finished for PR 30566 at commit 45495c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"org.apache.spark.sql.catalyst.expressions.Now",
"org.apache.spark.sql.catalyst.expressions.UnixSeconds",
"org.apache.spark.sql.catalyst.expressions.UnixMillis",
"org.apache.spark.sql.catalyst.expressions.UnixMicros",
Contributor:

why can't we run the example SQL for them?

@gengliangwang (Member, Author)

Originally, the timestamp creation in the examples was affected by the local time zone:

```sql
_FUNC_(TIMESTAMP('1970-01-01 00:00:01'))
```

Now I have updated the examples to include the time zone in the timestamp literal, so we can test them.
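
For illustration, a sketch of the session-time-zone dependence that made the original examples untestable (zone names are illustrative; unix_seconds assumes this PR's functions):

```scala
// Without a zone suffix, the string is interpreted in the session time
// zone, so the expected output depends on where the example runs.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01'))").show()
// 28801 (00:00:01 PST is 08:00:01 UTC)
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01'))").show()
// 1
// An explicit zone pins the result regardless of the session setting:
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))").show()
// 1
```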

Member:

Sounds better.

@SparkQA commented Dec 2, 2020

Test build #132053 has finished for PR 30566 at commit 82ba97b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132061 has finished for PR 30566 at commit 270aa95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

retest this please

```sql
-- UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS
select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_SECONDS(null);
select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MILLIS(null);
select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MICROS(null);
```
Member:

Could you add the time zone to the timestamp literals?

@gengliangwang (Member, Author)

Thanks, updated.

| org.apache.spark.sql.catalyst.expressions.Unhex | unhex | SELECT decode(unhex('537061726B2053514C'), 'UTF-8') | struct<decode(unhex(537061726B2053514C), UTF-8):string> |
| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
Member:

Please regenerate this, @gengliangwang.

```diff
-| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_micros(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_millis(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_seconds(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
```

@gengliangwang (Member, Author)

Thanks, updated.

@SparkQA commented Dec 2, 2020

Test build #132066 has finished for PR 30566 at commit 46c8651.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132062 has finished for PR 30566 at commit 46c8651.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Thank you for the update, @gengliangwang!

@SparkQA commented Dec 2, 2020

Test build #132056 has finished for PR 30566 at commit f5e4b53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132067 has finished for PR 30566 at commit 025fdd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @gengliangwang and @cloud-fan.
Merged to master for Apache Spark 3.1.0.
