
[SPARK-51554][SQL] Add the time_trunc() function for TIME datatype #50607


Open · wants to merge 3 commits into base: master
Conversation

the-sakthi (Member) commented Apr 16, 2025

What changes were proposed in this pull request?

  • Added a new built-in function time_trunc(unit, expr) that returns a TIME value truncated to the specified unit.
  • Allowed expr to be either a TIME value or a string that can be cast to TIME.
  • Supported truncation units: HOUR, MINUTE, SECOND, MILLISECOND, and MICROSECOND.
  • Handled both foldable and non-foldable inputs.
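To illustrate the truncation semantics the bullets describe, here is a minimal, self-contained sketch. It is not the PR's actual implementation; `TimeTruncSketch` and its `truncTime` helper are hypothetical names, and the sketch assumes a TIME value represented as microseconds since midnight (the representation the reviewed `truncTime(micros, level)` snippet below also operates on). Unknown units yield `None`, mirroring the NULL results shown for units like 'MS' in the manual tests.

```scala
// Hypothetical sketch of time_trunc semantics, assuming a TIME value is
// carried as a Long count of microseconds since midnight.
object TimeTruncSketch {
  val MicrosPerMillis: Long = 1000L
  val MicrosPerSecond: Long = 1000L * MicrosPerMillis
  val MicrosPerMinute: Long = 60L * MicrosPerSecond
  val MicrosPerHour: Long   = 60L * MicrosPerMinute

  // Truncate `micros` down to the given unit; unit matching is
  // case-insensitive, as in the manual tests ('second' works too).
  // Returns None for an unknown unit, mirroring the NULL result.
  def truncTime(micros: Long, unit: String): Option[Long] =
    unit.toUpperCase match {
      case "HOUR"        => Some(micros - micros % MicrosPerHour)
      case "MINUTE"      => Some(micros - micros % MicrosPerMinute)
      case "SECOND"      => Some(micros - micros % MicrosPerSecond)
      case "MILLISECOND" => Some(micros - micros % MicrosPerMillis)
      case "MICROSECOND" => Some(micros) // finest unit: identity
      case _             => None
    }
}
```

For example, 09:32:05.123456 is 34325123456 microseconds since midnight; truncating it to HOUR yields 32400000000 (09:00:00), matching the first manual test below.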

Why are the changes needed?

  • Spark currently lacks a built-in function for truncating TIME values, analogous to truncTimestamp for timestamps.

Does this PR introduce any user-facing change?

Yes. A new built-in function time_trunc is added. Users can call it to truncate TIME values to one of the supported units listed above.

How was this patch tested?

By running newly added UTs:

$ build/sbt "test:testOnly *TimeExpressionsSuite.scala"

By manual tests:

# Happy test cases
scala> spark.sql("SELECT time_trunc('HOUR', '09:32:05.123456');").show()
+---------------------------------+
|time_trunc(HOUR, 09:32:05.123456)|
+---------------------------------+
|                         09:00:00|
+---------------------------------+

scala> spark.sql("SELECT time_trunc('MINUTE', TIME'09:32:05.123456');").show()
+------------------------------------------+
|time_trunc(MINUTE, TIME '09:32:05.123456')|
+------------------------------------------+
|                                  09:32:00|
+------------------------------------------+

scala> spark.sql("SELECT time_trunc('second', '09:32:05.123456');").show()
+-----------------------------------+
|time_trunc(second, 09:32:05.123456)|
+-----------------------------------+
|                           09:32:05|
+-----------------------------------+

scala> spark.sql("SELECT time_trunc(concat('milli','second'), '09:32:05.123456');").show()
+--------------------------------------------------+
|time_trunc(concat(milli, second), 09:32:05.123456)|
+--------------------------------------------------+
|                                      09:32:05.123|
+--------------------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.123456)|
+----------------------------------------+
|                         09:32:05.123456|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.1234');").show()
+--------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.1234)|
+--------------------------------------+
|                         09:32:05.1234|
+--------------------------------------+

# Invalid inputs
scala> spark.sql("SELECT time_trunc('MS', '09:32:05.123456');").show()
+-------------------------------+
|time_trunc(MS, 09:32:05.123456)|
+-------------------------------+
|                           NULL|
+-------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '29:32:05.123456');").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '29:32:05.123456' of the type "STRING" cannot be cast to "TIME(6)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
SELECT time_trunc('MICROSECOND', '29:32:05.123456');

# Non-foldable inputs
scala> val df = Seq(
     |   ("HOUR",       "09:32:05.123456"),
     |   ("MINUTE",     "10:20:15.123456"),
     |   ("second",     "11:59:59.999999"),
     |   ("MILLISECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123456"),
     |   ("MS","00:00:00.123456"),
     |   ("NULL","00:00:00.123456")
     | ).toDF("unitcol", "timecol")
val df: org.apache.spark.sql.DataFrame = [unitcol: string, timecol: string]

scala> val timeDf = df.selectExpr("unitcol", "CAST(timecol AS TIME(6)) as timeval")
val timeDf: org.apache.spark.sql.DataFrame = [unitcol: string, timeval: time(6)]

scala> timeDf.createOrReplaceTempView("tmp")

scala> spark.sql("""
     |   SELECT
     |     unitcol,
     |     timeval,
     |     time_trunc(unitcol, timeval) as truncated
     |   FROM tmp
     | """).show(false)
+-----------+---------------+---------------+
|unitcol    |timeval        |truncated      |
+-----------+---------------+---------------+
|HOUR       |09:32:05.123456|09:00:00       |
|MINUTE     |10:20:15.123456|10:20:00       |
|second     |11:59:59.999999|11:59:59       |
|MILLISECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123456|00:00:00.123456|
|MS         |00:00:00.123456|NULL           |
|NULL       |00:00:00.123456|NULL           |
+-----------+---------------+---------------+

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Apr 16, 2025
the-sakthi (Member, Author) commented Apr 16, 2025

@MaxGekk While I am converting this into a RuntimeReplaceable version and adding the UTs for it, I would appreciate any feedback from you in the meantime!

@the-sakthi the-sakthi changed the title [WIP][SPARK-51554][SQL] Add the time_trunc() function for TIME datatype [SPARK-51554][SQL] Add the time_trunc() function for TIME datatype Apr 29, 2025
the-sakthi (Member, Author):
@MaxGekk Please let me know your thoughts on this one! I have updated the revision.

the-sakthi (Member, Author):
@MaxGekk Any chance I can get a review here, please?

the-sakthi (Member, Author):
Hey @MaxGekk , could I get a review here, please? Looking forward to the comments.

@@ -351,7 +351,7 @@ case class SecondsOfTime(child: Expression)
)

 override def inputTypes: Seq[AbstractDataType] =
-  Seq(TypeCollection(TimeType.MIN_PRECISION to TimeType.MAX_PRECISION map TimeType.apply: _*))
+  Seq(TypeCollection(TimeType.MIN_PRECISION to TimeType.MAX_PRECISION map TimeType: _*))
Member:
Please revert it, otherwise it causes a compilation warning; see #50692

Member Author:
Sure will do!

if (level < MIN_LEVEL_OF_TIME_TRUNC) { // unknown unit
-1L
} else {
val out = truncTime(micros, level) // may still return -1L on error
Member:
What's the reason for splitting the function? Is truncTime used anywhere else?

override def inputTypes: Seq[AbstractDataType] =
Seq(
StringType,
TypeCollection(TimeType.MIN_PRECISION to TimeType.MAX_PRECISION map TimeType: _*)
Member:
Could you double-check that it won't produce warnings?

Member Author:
Will check!
