Description
What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark next_day function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The NextDay expression returns the first date after a given start date that falls on a specified day of the week. It advances from the start date to find the next occurrence of the target day, excluding the start date itself even if it matches the target day of week.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
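For reference, the same-day exclusion described above can already be observed with stock Spark; the snippet below is only an illustrative usage sketch (it assumes a local SparkSession), and Comet support would not change these results, only where they are computed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, next_day}

object NextDayDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("next_day demo").getOrCreate()
  import spark.implicits._

  // 2023-01-02 is itself a Monday; next_day still advances a full week
  // because the start date is always excluded.
  Seq("2023-01-01", "2023-01-02").toDF("d")
    .select(col("d"), next_day(col("d").cast("date"), "Monday").as("next_monday"))
    .show()
  // +----------+-----------+
  // |         d|next_monday|
  // +----------+-----------+
  // |2023-01-01| 2023-01-02|
  // |2023-01-02| 2023-01-09|
  // +----------+-----------+

  spark.stop()
}
```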
Describe the potential solution
Spark Specification
Syntax:
```
next_day(start_date, day_of_week)
```
```scala
// DataFrame API
next_day(col("date_column"), "Monday")
next_day(col("start_date"), col("day_name"))
```
Arguments:
| Argument | Type | Description |
|---|---|---|
| startDate | DateType | The starting date from which to find the next occurrence |
| dayOfWeek | StringType (with collation support) | The target day of week as a string (e.g., "Monday", "Tue") |
| failOnError | Boolean | Internal parameter controlling ANSI mode behavior for invalid inputs |
Return Type: DateType, internally represented as the number of days since the epoch (1970-01-01).
Supported Data Types:
- Input: DateType for start date, StringType with collation support for day of week
- Output: DateType
- Trimming: Supports trim collation for the day of week string parameter
Edge Cases:
- Null handling: Returns null if either input is null (null intolerant behavior)
- Invalid day names: Throws SparkIllegalArgumentException in ANSI mode, returns null otherwise
- Case sensitivity: Day of week parsing follows DateTimeUtils case handling rules
- Abbreviations: Supports abbreviated day names (implementation dependent on DateTimeUtils)
- Same day exclusion: Never returns the start date itself, always advances to the next occurrence
Examples:
```sql
-- Find next Monday after January 1st, 2023
SELECT next_day('2023-01-01', 'Monday');
-- Returns: 2023-01-02

-- Using with column references
SELECT order_date, next_day(order_date, 'Friday') as next_friday
FROM orders;

-- Next Tuesday after current date
SELECT next_day(current_date(), 'Tue');
```
```scala
// DataFrame API examples
import org.apache.spark.sql.functions._

// Find next Monday for each date
df.select(col("start_date"), next_day(col("start_date"), lit("Monday")))

// Dynamic day of week from another column
df.select(next_day(col("event_date"), col("target_day")))

// Using the $ column interpolator
df.withColumn("next_sunday", next_day($"date_col", "Sunday"))
```
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in
spark/src/main/scala/org/apache/comet/serde/ - Register: Add to appropriate map in
QueryPlanSerde.scala - Protobuf: Add message type in
native/proto/src/proto/expr.protoif needed - Rust: Implement in
native/spark-expr/src/(check if DataFusion has built-in support first)
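As a rough illustration of the Scala serde step, the sketch below shows the general shape of such a handler: pattern-match on the Catalyst NextDay expression, serialize its children, and return None to fall back to Spark when the shape is unsupported. The NativeExpr type, the helper functions, the three-argument NextDay pattern, and the "next_day" function name are stand-ins and assumptions, not Comet's actual API; the real handler would build the protobuf Expr defined in expr.proto and be registered in QueryPlanSerde.scala.

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, NextDay}

object NextDaySerdeSketch {
  // Stand-in for the generated protobuf expression type (ExprOuterClass.Expr).
  final case class NativeExpr(fn: String, children: Seq[NativeExpr] = Nil, ref: Option[String] = None)

  // Stand-in for Comet's child-expression serializer.
  def childToNative(expr: Expression, inputs: Seq[Attribute]): Option[NativeExpr] =
    Some(NativeExpr("child", ref = Some(expr.sql)))

  // The handler: convert the Catalyst expression to the native representation,
  // or return None so the plan falls back to Spark for unsupported cases.
  def convert(expr: Expression, inputs: Seq[Attribute]): Option[NativeExpr] = expr match {
    // NextDay(startDate, dayOfWeek, failOnError) in recent Spark versions;
    // the arity is an assumption and should be checked against the Spark version in use.
    case NextDay(startDate, dayOfWeek, failOnError) if !failOnError =>
      for {
        s <- childToNative(startDate, inputs)
        d <- childToNative(dayOfWeek, inputs)
      } yield NativeExpr("next_day", Seq(s, d))
    case _ => None // e.g. ANSI (failOnError) mode left to Spark initially
  }
}
```

On the native side, the Rust implementation (or a DataFusion built-in, if one exists) would then presumably be resolved from that function name.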
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.NextDay
Related:
- date_add - Add days to a date
- date_sub - Subtract days from a date
- dayofweek - Extract the day of week from a date
- last_day - Get the last day of the month
- DateTimeUtils - Underlying utility class for date operations
This issue was auto-generated from Spark reference documentation.