peter-toth
Contributor

What changes were proposed in this pull request?

This PR adds a new rule to the optimizer that focuses on date and time conversion functions and tries to eliminate the unnecessary ones.
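
As a rough illustration of the kind of rewrite such a rule can perform, here is a minimal sketch on a toy expression tree. `Col`, `Format`, `Parse`, and `ToySimplify` are hypothetical stand-ins, not Catalyst classes, and the guard (identical patterns and time zones on all three conversions) is an assumption about when a round trip is safe to drop.

```scala
// Toy expression tree; these case classes are illustrative stand-ins,
// not Spark's Catalyst expressions.
sealed trait Expr
final case class Col(name: String) extends Expr
// Timestamp -> string, using a pattern and an optional time zone.
final case class Format(child: Expr, pattern: String, tz: Option[String]) extends Expr
// String -> timestamp, using a pattern and an optional time zone.
final case class Parse(child: Expr, pattern: String, tz: Option[String]) extends Expr

object ToySimplify {
  // Rewrites bottom-up: children are simplified before their parent is matched.
  def simplify(e: Expr): Expr = {
    val withSimplifiedChildren = e match {
      case Format(c, p, tz) => Format(simplify(c), p, tz)
      case Parse(c, p, tz)  => Parse(simplify(c), p, tz)
      case leaf             => leaf
    }
    withSimplifiedChildren match {
      // format(parse(format(x, p), p), p) => format(x, p)
      case Format(Parse(inner @ Format(_, p, tz), p2, tz2), p3, tz3)
          if p == p2 && p == p3 && tz == tz2 && tz == tz3 => inner
      // parse(format(parse(x, p), p), p) => parse(x, p)
      case Parse(Format(inner @ Parse(_, p, tz), p2, tz2), p3, tz3)
          if p == p2 && p == p3 && tz == tz2 && tz == tz3 => inner
      case other => other
    }
  }
}
```

When the patterns or time zones differ, neither case matches and the expression is returned unchanged, which is the conservative choice for a sketch like this.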

Why are the changes needed?

Date and time conversions are not cheap, so eliminating some of them can bring a considerable performance improvement.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 30, 2025
*/
object SimplifyDateTimeConversions extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
_.containsPattern(DATETIME), ruleId) {
Member

nit. indentation?

Contributor Author

Thanks, fixed in dcf49a8.

def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
_.containsPattern(DATETIME), ruleId) {
case q: LogicalPlan => q.transformExpressionsUpWithPruning(
_.containsPattern(DATETIME), ruleId) {
Member

ditto. nit. Indentation?

Contributor Author

Fixed in dcf49a8.
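
For readers unfamiliar with the `transformWithPruning(_.containsPattern(DATETIME), ruleId)` idiom in the hunk above, the following is a minimal sketch of the pruning idea on a toy tree. All names (`Node`, `Leaf`, `DateTimeOp`, `Other`, `ToyPruning`) are hypothetical, the `ruleId` bookkeeping is not modeled, and the traversal here is top-down; it is not Spark's `TreeNode` implementation.

```scala
// Toy tree; every name here is hypothetical, not Spark's TreeNode API.
sealed trait Node {
  def children: List[Node]
  def tags: Set[String] // patterns introduced by this node itself
  // Patterns present anywhere in this subtree, computed once per node.
  lazy val subtreeTags: Set[String] = tags ++ children.flatMap(_.subtreeTags)
  def containsPattern(tag: String): Boolean = subtreeTags.contains(tag)
}
final case class Leaf(name: String) extends Node {
  val children: List[Node] = Nil
  val tags: Set[String] = Set.empty
}
final case class DateTimeOp(child: Node) extends Node {
  val children: List[Node] = List(child)
  val tags: Set[String] = Set("DATETIME")
}
final case class Other(left: Node, right: Node) extends Node {
  val children: List[Node] = List(left, right)
  val tags: Set[String] = Set.empty
}

object ToyPruning {
  // Applies `rule` top-down, but skips any subtree for which `cond` is false.
  def transformWithPruning(n: Node)(cond: Node => Boolean)(
      rule: PartialFunction[Node, Node]): Node = {
    if (!cond(n)) {
      n // nothing of interest below this node: return it untouched
    } else {
      rule.applyOrElse(n, identity[Node]) match {
        case DateTimeOp(c) =>
          DateTimeOp(transformWithPruning(c)(cond)(rule))
        case Other(l, r) =>
          Other(transformWithPruning(l)(cond)(rule), transformWithPruning(r)(cond)(rule))
        case leaf => leaf
      }
    }
  }
}
```

A call such as `ToyPruning.transformWithPruning(plan)(_.containsPattern("DATETIME")) { case DateTimeOp(DateTimeOp(c)) => DateTimeOp(c) }` never descends into subtrees that contain no `DateTimeOp` node, which is the point of passing the condition alongside the rule.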

Member

@dongjoon-hyun left a comment

I left a few minor comments.

TimestampType,
_,
timeZoneId2,
_),
Member

Just a question. Maybe, the following is better?

       case DateFormatClass(
           GetTimestamp(
-            e @ DateFormatClass(
-              _,
-              pattern,
-              timeZoneId),
+            e @ DateFormatClass(_, pattern, timeZoneId),
             pattern2,
             TimestampType,
             _,

Contributor Author

@peter-toth Oct 1, 2025

Done in dcf49a8.

pattern2,
timeZoneId2),
pattern3,
TimestampType,
Member

ditto. Maybe?

       case GetTimestamp(
           DateFormatClass(
-            e @ GetTimestamp(
-              _,
-              pattern,
-              TimestampType,
-              _,
-              timeZoneId,
-              _),
+            e @ GetTimestamp(_, pattern, TimestampType, _, timeZoneId, _),
             pattern2,
             timeZoneId2),

Contributor Author

@peter-toth Oct 1, 2025

Yeah, looks nicer. Fixed in dcf49a8.

"org.apache.spark.sql.catalyst.optimizer.RewriteAsOfJoin" ::
"org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison" ::
"org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions" ::
"org.apache.spark.sql.catalyst.optimizer.SimplifyDateTimeConversions" ::
Member

Although this order is not strict, can we move this new rule after SimplifyConditionals?

Contributor Author

Let's merge your #52495 first; then I will rebase this PR and move the rule.

Contributor Author

Fixed in 018055d.

}

/**
* Removes date and time related functions that are unnecessary.
Member

Do you think we can elaborate on what "unnecessary" means as of now? An itemized list would be great, because we can add to it when we improve SimplifyDateTimeConversions in the future.

Contributor Author

@peter-toth Oct 1, 2025

Added comments to cases one by one in dcf49a8.

@peter-toth
Contributor Author

@MaxGekk, can you please take a look at these simplifications if you have some time?

@dongjoon-hyun
Member

Since I merged #52495, could you rebase this PR onto the master branch, please?

@peter-toth peter-toth force-pushed the SPARK-53762-add-simplifydatetimeconversions-rule branch from dcf49a8 to 018055d Compare October 1, 2025 15:19
Member

@dongjoon-hyun left a comment

+1, LGTM. Thank you, @peter-toth.
