Description
What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark last_day function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The LastDay expression calculates the last day of the month for a given date. It takes a date as input and returns a new date representing the final day of that same month and year.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
last_day(date_expr)
// DataFrame API
import org.apache.spark.sql.functions.last_day
df.select(last_day($"date_column"))
Arguments:
| Argument | Type | Description |
|---|---|---|
| startDate | DateType | The input date for which to find the last day of the month |
Return Type: DateType - Returns a date representing the last day of the month containing the input date.
Supported Data Types:
- DateType (input will be implicitly cast to DateType if compatible)
Edge Cases:
- Null handling: Returns null if the input date is null (null intolerant behavior)
- Leap years: Correctly handles February in leap years (returns February 29) vs non-leap years (returns February 28)
- Month boundaries: Properly handles months with different numbers of days (30, 31, 28, or 29 days)
- Date range limits: Behavior depends on the underlying DateTimeUtils.getLastDayOfMonth() implementation for edge dates
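The edge cases above can be illustrated with a small, std-only Rust sketch. It is not Comet's or DataFusion's implementation. It assumes dates are stored as days since 1970-01-01 (as Spark's DateType does) and uses a well-known days-to-civil-date conversion in the proleptic Gregorian calendar. All function names here (`civil_from_days`, `days_in_month`, `last_day`, `last_day_nullable`) are hypothetical and chosen for illustration only.

```rust
/// Convert days since 1970-01-01 to (year, month, day) in the
/// proleptic Gregorian calendar (standard days-to-civil algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Gregorian leap-year rule.
fn is_leap_year(year: i64) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in the given month (1 = January).
fn days_in_month(year: i64, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap_year(year) {
                29
            } else {
                28
            }
        }
        _ => unreachable!("month is always in 1..=12"),
    }
}

/// Last day of the month containing `days`, again as days since epoch.
/// Note: overflow near the i32 limits is not handled in this sketch.
fn last_day(days: i32) -> i32 {
    let (year, month, day) = civil_from_days(days as i64);
    days + (days_in_month(year, month) - day) as i32
}

/// Null-intolerant wrapper: a null input (None) yields a null output.
fn last_day_nullable(days: Option<i32>) -> Option<i32> {
    days.map(last_day)
}
```

For example, `last_day(18307)` (2020-02-15) yields `18321` (2020-02-29), while `last_day(18673)` (2021-02-15) yields `18686` (2021-02-28). Working directly on the day offset avoids constructing an intermediate date value: the result is the input plus the number of days remaining in the month.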
Examples:
-- Get the last day of January 2009
SELECT last_day('2009-01-12');
-- Result: 2009-01-31
-- Get the last day for multiple dates
SELECT last_day('2020-02-15'), last_day('2021-02-15');
-- Result: 2020-02-29, 2021-02-28 (leap year vs non-leap year)
// DataFrame API usage
import org.apache.spark.sql.functions.{last_day, lit}
// Calculate last day of month for a date column
df.select(last_day($"order_date").alias("month_end"))
// Use in filtering
df.filter(last_day($"date_col") === lit("2023-12-31"))
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
- Register: Add to appropriate map in QueryPlanSerde.scala
- Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
- Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.LastDay
Related:
- date_add() - Add days to a date
- date_sub() - Subtract days from a date
- trunc() - Truncate date to specified unit
- add_months() - Add months to a date
This issue was auto-generated from Spark reference documentation.