
[Feature] Support Spark expression: next_day #3092

@andygrove

Description


What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark next_day function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The NextDay expression returns the first date after a given start date that falls on a specified day of the week. It advances from the start date to find the next occurrence of the target day, excluding the start date itself even if it matches the target day of week.
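
For reference, the underlying computation is simple modular arithmetic over the DateType physical value (days since 1970-01-01). The sketch below is illustrative only; it is not Spark's implementation, and the day-name parsing is a simplification of the DateTimeUtils rules:

object NextDaySketch {
  // Monday = 0 ... Sunday = 6 in this sketch; Spark's internal numbering may differ.
  private val dayNames =
    Seq("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")

  // Simplified day-name parsing: any prefix of at least two letters (the real rules
  // live in DateTimeUtils). Returns None for invalid names, mirroring the non-ANSI
  // "return null" behavior described later in this issue.
  def parseDayOfWeek(s: String): Option[Int] = {
    val t = s.trim.toLowerCase
    if (t.length < 2) None
    else dayNames.zipWithIndex.collectFirst { case (name, i) if name.startsWith(t) => i }
  }

  // Advance from startDay (days since 1970-01-01) to the next occurrence of targetDow,
  // never returning startDay itself (the "same day exclusion" rule).
  def nextDay(startDay: Int, targetDow: Int): Int = {
    val currentDow = Math.floorMod(startDay + 3, 7) // day 0 (1970-01-01) was a Thursday
    val delta = Math.floorMod(targetDow - currentDow, 7)
    startDay + (if (delta == 0) 7 else delta)
  }
}

As a worked example, 2023-01-01 is epoch day 19358 and falls on a Sunday, so nextDay(19358, 0) returns 19359, i.e. 2023-01-02, matching the first SQL example further below.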

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

next_day(start_date, day_of_week)

// DataFrame API
next_day(col("date_column"), "Monday")
next_day(col("start_date"), col("day_name"))

Arguments:

  • startDate (DateType) - The starting date from which to find the next occurrence
  • dayOfWeek (StringType, with collation support) - The target day of week as a string (e.g., "Monday", "Tue")
  • failOnError (Boolean) - Internal parameter controlling ANSI-mode behavior for invalid inputs

Return Type: DateType, physically represented as an integer number of days since the epoch (1970-01-01).

Supported Data Types:

  • Input: DateType for start date, StringType with collation support for day of week
  • Output: DateType
  • Trimming: Supports trim collation for the day of week string parameter

Edge Cases:

  • Null handling: Returns null if either input is null (null intolerant behavior)
  • Invalid day names: Throws SparkIllegalArgumentException in ANSI mode, returns null otherwise
  • Case sensitivity: Day of week parsing follows DateTimeUtils case handling rules
  • Abbreviations: Supports abbreviated day names (implementation dependent on DateTimeUtils)
  • Same day exclusion: Never returns the start date itself, always advances to the next occurrence (demonstrated in the snippet after this list)
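
These behaviors can be checked against Spark directly before wiring anything up in Comet. The snippet below assumes an existing SparkSession named spark with spark.sql.ansi.enabled = false:

spark.sql("SELECT next_day('2023-01-01', 'Xyz')").show()              // NULL: invalid day name
spark.sql("SELECT next_day(CAST(NULL AS DATE), 'Monday')").show()     // NULL: null input
spark.sql("SELECT next_day('2023-01-01', 'Mon')").show()              // 2023-01-02: abbreviation accepted
spark.sql("SELECT next_day('2023-01-02', 'Monday')").show()           // 2023-01-09: start date (a Monday) is excluded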

Examples:

-- Find next Monday after January 1st, 2023
SELECT next_day('2023-01-01', 'Monday');
-- Returns: 2023-01-02

-- Using with column references
SELECT order_date, next_day(order_date, 'Friday') as next_friday
FROM orders;

-- Next Tuesday after current date
SELECT next_day(current_date(), 'Tue');

// DataFrame API examples
import org.apache.spark.sql.functions._

// Find next Monday for each date
df.select(col("start_date"), next_day(col("start_date"), lit("Monday")))

// Dynamic day of week from another column  
df.select(next_day(col("event_date"), col("target_day")))

// Using the $ column interpolator
df.withColumn("next_sunday", next_day($"date_col", "Sunday"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add an expression handler in spark/src/main/scala/org/apache/comet/serde/ (a rough sketch follows this list)
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
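
As a starting point for steps 1 and 2, the conversion could follow the existing serde pattern, roughly as sketched below. Everything here is an assumption to verify against the current code: the CometExpressionSerde trait and its convert signature, the exprToProtoInternal / scalarFunctionExprToProto / withInfo helpers, and the "next_day" scalar function name expected on the native side.

// Sketch only: the trait, helper names, and signatures below are assumptions based on the
// existing serde pattern in QueryPlanSerde.scala; verify against the current codebase.
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, NextDay}

object CometNextDay extends CometExpressionSerde {
  override def convert(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    val nextDayExpr = expr.asInstanceOf[NextDay]
    if (nextDayExpr.failOnError) {
      // ANSI mode throws SparkIllegalArgumentException on invalid day names; unless the
      // native kernel reproduces that error, fall back to Spark for correctness.
      withInfo(expr, "next_day with ANSI mode enabled is not supported")
      None
    } else {
      val startProto = exprToProtoInternal(nextDayExpr.startDate, inputs, binding)
      val dayProto = exprToProtoInternal(nextDayExpr.dayOfWeek, inputs, binding)
      // Assumes a scalar function registered as "next_day" on the native side
      // (native/spark-expr/src/), or DataFusion's built-in if its semantics match Spark's.
      scalarFunctionExprToProto("next_day", startProto, dayProto)
    }
  }
}

// Registration (step 2, QueryPlanSerde.scala): map classOf[NextDay] -> CometNextDay in the
// expression serde map so the planner routes NextDay through the handler above.

On the native side (step 4), the day-of-week arithmetic sketched earlier in this issue is all the kernel needs, provided null handling and day-name parsing match Spark's; whether DataFusion already offers a compatible function should be checked first, as noted in step 4.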

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.NextDay

Related:

  • date_add - Add days to a date
  • date_sub - Subtract days from a date
  • dayofweek - Extract day of week from date
  • last_day - Get last day of month
  • DateTimeUtils - Underlying utility class for date operations

This issue was auto-generated from Spark reference documentation.
