Conversation

@gengliangwang (Member)

What changes were proposed in this pull request?

As #28534 added BigQuery-inspired functions for converting numbers to timestamps, this PR adds the functions UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS for converting timestamps back to numbers.

Why are the changes needed?

  1. Symmetry of the conversion functions.
  2. Casting the timestamp type to numeric types is disallowed in ANSI mode, so we should provide functions for users to perform the conversion (see the sketch below).
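
For context on point 2, a minimal sketch of the cast failing under ANSI mode, runnable in spark-shell (the exact error wording varies across Spark versions):

```scala
// With ANSI mode on, casting a timestamp to a numeric type is rejected
// at analysis time; the new functions give users a sanctioned conversion.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(TIMESTAMP('1970-01-01 00:00:01Z') AS LONG)")
// => AnalysisException (wording varies): cannot cast timestamp to bigint
```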

Does this PR introduce any user-facing change?

Yes: three new functions, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS, for converting a timestamp to the long type.
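
For example, a minimal sketch of the new functions, runnable in spark-shell once this PR is in (the literals use an explicit UTC offset so the results are stable across session time zones):

```scala
// One second after the epoch in each of the three units.
spark.sql("""
  SELECT
    unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) AS secs,   -- 1
    unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'))  AS millis, -- 1000
    unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'))  AS micros  -- 1000000
""").show()
```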

How was this patch tested?

Unit tests.

@gengliangwang (Member, Author)

I will open a follow-up PR to add DATE_FROM_UNIX_DATE/UNIX_DATE and to update the error message of ANSI casting.

@maropu (Member) commented Dec 2, 2020

Probably you need to add these new functions to the ignore set:

```scala
// One of examples shows getting the current timestamp
"org.apache.spark.sql.catalyst.expressions.UnixTimestamp",
"org.apache.spark.sql.catalyst.expressions.CurrentDate",
"org.apache.spark.sql.catalyst.expressions.CurrentTimestamp",
"org.apache.spark.sql.catalyst.expressions.CurrentTimeZone",
"org.apache.spark.sql.catalyst.expressions.Now",
```

@SparkQA commented Dec 2, 2020

Test build #132031 has finished for PR 30566 at commit 0b31348.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member, Author)

> Probably you need to add these new functions to the ignore set:

@maropu Sorry, but can you explain the reason? I am not familiar with this.


```scala
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.",
```
Contributor:

This function doesn't truncate.

```scala
checkResult(Int.MinValue.toLong - 100)
}

test("UNIX_SECONDS") {
```
@cloud-fan (Contributor) commented Dec 2, 2020

let's also test null and negative input (that truncates)
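
A hypothetical sketch of what those cases might look like, in the style of the surrounding suite (the floor-division result for negative input is an assumption about the implementation, not confirmed in this thread):

```scala
// Hypothetical additions, assuming the surrounding suite's setup and
// ExpressionEvalHelper.checkEvaluation are in scope.
import org.apache.spark.sql.catalyst.expressions.{Literal, UnixSeconds}
import org.apache.spark.sql.types.TimestampType

// Null input propagates to a null result.
checkEvaluation(UnixSeconds(Literal(null, TimestampType)), null)
// -1500000 us is -1.5 s; assuming floor division in the implementation,
// the fractional second is dropped toward negative infinity, giving -2.
checkEvaluation(UnixSeconds(Literal(-1500000L, TimestampType)), -2L)
```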

```scala
checkEvaluation(UnixMicros(timestamp), 1000000L)
// Truncates higher levels of precision
val timestampWithNanos = new Timestamp(1000L)
timestampWithNanos.setNanos(999999)
```
Contributor:

The truncation happens in Literal.apply, not UnixMicros
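
To make that concrete, a small sketch of the conversion path (assuming Literal.apply funnels java.sql.Timestamp through DateTimeUtils.fromJavaTimestamp, which keeps whole microseconds):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.catalyst.util.DateTimeUtils

val ts = new Timestamp(1000L) // one second after the epoch
ts.setNanos(999999)           // fractional part: 999 us + 999 ns
// The conversion keeps whole microseconds only; the trailing 999 ns are
// already gone here, before UnixMicros ever sees the value.
val micros = DateTimeUtils.fromJavaTimestamp(ts) // 1000999L
```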

Contributor:

maybe we can use Instant to test these functions.
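
A sketch of that alternative (hypothetical test values; it assumes Literal.apply accepts java.time.Instant, as it does since Spark 3.0, and that checkEvaluation is in scope):

```scala
import java.time.Instant
import org.apache.spark.sql.catalyst.expressions.{Literal, UnixMicros}

// Instant states the sub-second part explicitly, so there is no hidden
// nanosecond truncation on the way into the microsecond representation.
val instant = Instant.ofEpochSecond(1L, 1000L) // 1 s + 1 us
checkEvaluation(UnixMicros(Literal(instant)), 1000001L)
```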

@gengliangwang (Member, Author)

I see. Let's just use timestampWithNanos.setNanos(1000) since there won't be truncation here.

@SparkQA commented Dec 2, 2020

Test build #132042 has finished for PR 30566 at commit 35ef5d2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member, Author)

retest this please.

@SparkQA commented Dec 2, 2020

Test build #132051 has finished for PR 30566 at commit 45495c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"org.apache.spark.sql.catalyst.expressions.Now",
"org.apache.spark.sql.catalyst.expressions.UnixSeconds",
"org.apache.spark.sql.catalyst.expressions.UnixMillis",
"org.apache.spark.sql.catalyst.expressions.UnixMicros",
Contributor:

why can't we run the example SQL for them?

@gengliangwang (Member, Author)

Originally, the timestamp creation in the examples was affected by the local time zone:

```sql
_FUNC_(TIMESTAMP('1970-01-01 00:00:01'))
```

Now I have updated the examples to include the time zone in the timestamp literal, so we can test them.
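
For illustration, a sketch of the session-time-zone dependence that made the original examples untestable (zone names are illustrative; unix_seconds assumes this PR's functions):

```scala
// Without a zone suffix, the string is interpreted in the session time
// zone, so the expected output depends on where the example runs.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01'))").show()
// 28801 (00:00:01 PST is 08:00:01 UTC)
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01'))").show()
// 1
// An explicit zone pins the result regardless of the session setting:
spark.sql("SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'))").show()
// 1
```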

Member:

Sounds better.

@SparkQA commented Dec 2, 2020

Test build #132053 has finished for PR 30566 at commit 82ba97b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132061 has finished for PR 30566 at commit 270aa95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

retest this please

```sql
-- UNIX_SECONDS, UNIX_MILLIS and UNIX_MICROS
select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_SECONDS(null);
select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MILLIS(null);
select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08')), UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.999999')), UNIX_MICROS(null);
```
Member:

Could you add the time zone to the timestamp literals?

@gengliangwang (Member, Author)

Thanks, updated.

| org.apache.spark.sql.catalyst.expressions.Unhex | unhex | SELECT decode(unhex('537061726B2053514C'), 'UTF-8') | struct<decode(unhex(537061726B2053514C), UTF-8):string> |
| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
Member:

Please regenerate this, @gengliangwang.

```diff
-| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_micros(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_millis(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
-| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01')) | struct<unix_seconds(CAST(1970-01-01 00:00:01 AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMicros | unix_micros | SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_micros(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixMillis | unix_millis | SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_millis(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
+| org.apache.spark.sql.catalyst.expressions.UnixSeconds | unix_seconds | SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')) | struct<unix_seconds(CAST(1970-01-01 00:00:01Z AS TIMESTAMP)):bigint> |
```

@gengliangwang (Member, Author)

Thanks, updated.

@SparkQA commented Dec 2, 2020

Test build #132066 has finished for PR 30566 at commit 46c8651.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132062 has finished for PR 30566 at commit 46c8651.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Thank you for the update, @gengliangwang!

@SparkQA commented Dec 2, 2020

Test build #132056 has finished for PR 30566 at commit f5e4b53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 2, 2020

Test build #132067 has finished for PR 30566 at commit 025fdd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @gengliangwang and @cloud-fan.
Merged to master for Apache Spark 3.1.0.
