[Feature] Support Spark expression: last_day #3090

@andygrove

Description

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark last_day function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The LastDay expression calculates the last day of the month for a given date. It takes a date as input and returns a new date representing the final day of that same month and year.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

-- SQL
last_day(date_expr)

// DataFrame API (the $"..." column syntax requires import spark.implicits._)
import org.apache.spark.sql.functions.last_day
df.select(last_day($"date_column"))

Arguments:

Argument  | Type     | Description
startDate | DateType | The input date for which to find the last day of the month

Return Type: DateType - Returns a date representing the last day of the month containing the input date.

Supported Data Types:

  • DateType (input will be implicitly cast to DateType if compatible)

Edge Cases:

  • Null handling: Returns null if the input date is null (null intolerant behavior)

  • Leap years: Correctly handles February in leap years (returns February 29) vs non-leap years (returns February 28)

  • Month boundaries: Properly handles months with different numbers of days (30, 31, 28, or 29 days)

  • Date range limits: Behavior for dates near the minimum and maximum supported values follows Spark's DateTimeUtils.getLastDayOfMonth() and should be verified against it
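
The null, leap-year, and month-boundary rules above can be sketched in a few lines of Rust. This is only an illustration on (year, month, day) tuples, not Comet's or Spark's actual code; the names `is_leap_year`, `days_in_month`, and `last_day` are hypothetical, and `Option` stands in for a nullable date.

```rust
/// Gregorian leap-year rule: divisible by 4, except centuries
/// that are not divisible by 400.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in the given month, or None for an invalid month.
fn days_in_month(year: i32, month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_leap_year(year) { 29 } else { 28 }),
        _ => None,
    }
}

/// Hypothetical scalar version of last_day on (year, month, day) tuples.
/// A None input (SQL NULL) yields None, mirroring the null-intolerant rule.
fn last_day(date: Option<(i32, u32, u32)>) -> Option<(i32, u32, u32)> {
    let (year, month, _day) = date?;
    let last = days_in_month(year, month)?;
    Some((year, month, last))
}
```

With these definitions, `last_day(Some((2020, 2, 15)))` gives `Some((2020, 2, 29))`, while the same call for 2021 gives day 28, matching the SQL examples in this issue.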

Examples:

-- Get the last day of January 2009
SELECT last_day('2009-01-12');
-- Result: 2009-01-31

-- Get the last day for multiple dates
SELECT last_day('2020-02-15'), last_day('2021-02-15');
-- Result: 2020-02-29, 2021-02-28 (leap year vs non-leap year)

// DataFrame API usage
import org.apache.spark.sql.functions.{last_day, lit}

// Calculate last day of month for a date column
df.select(last_day($"order_date").alias("month_end"))

// Use in filtering
df.filter(last_day($"date_col") === lit("2023-12-31"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
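
For step 4, a rough, dependency-free sketch of what the native logic might look like is shown below. It assumes dates are stored as days since 1970-01-01 (as Spark's DateType does) and uses a plain slice of `Option<i32>` as a stand-in for a nullable date column. It does not use the Arrow or DataFusion APIs, and all function names are hypothetical. The last day is computed as "first day of the next month, minus one day", using the well-known civil-calendar conversion arithmetic for the proleptic Gregorian calendar.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian (year, month, day).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 10 and 11 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// (year, month) of the date that lies `days` days after 1970-01-01.
fn civil_year_month(days: i64) -> (i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m)
}

/// Last day of the month containing `days`, as days since the epoch.
/// Returns None if the result does not fit in an i32 (an assumption of
/// this sketch; Spark's behavior at the range limits should be checked).
fn last_day_days(days: i32) -> Option<i32> {
    let (y, m) = civil_year_month(days as i64);
    let (ny, nm) = if m == 12 { (y + 1, 1) } else { (y, m + 1) };
    i32::try_from(days_from_civil(ny, nm, 1) - 1).ok()
}

/// Applies last_day_days to a nullable column; nulls pass through as nulls.
fn last_day_column(input: &[Option<i32>]) -> Vec<Option<i32>> {
    input.iter().map(|v| v.and_then(last_day_days)).collect()
}
```

A real implementation would operate on Arrow date arrays and should first check whether DataFusion already provides an equivalent function, as the step itself notes.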

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.LastDay

Related:

  • date_add() - Add days to a date
  • date_sub() - Subtract days from a date
  • trunc() - Truncate date to specified unit
  • add_months() - Add months to a date

This issue was auto-generated from Spark reference documentation.
