-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-53762][SQL] Add date and time conversions simplifier rule to optimizer #52493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[SPARK-53762][SQL] Add date and time conversions simplifier rule to optimizer #52493
Conversation
*/ | ||
object SimplifyDateTimeConversions extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning( | ||
_.containsPattern(DATETIME), ruleId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in dcf49a8.
def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning( | ||
_.containsPattern(DATETIME), ruleId) { | ||
case q: LogicalPlan => q.transformExpressionsUpWithPruning( | ||
_.containsPattern(DATETIME), ruleId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. nit. Indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in dcf49a8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few minor comments.
TimestampType, | ||
_, | ||
timeZoneId2, | ||
_), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question. Maybe, the following is better?
case DateFormatClass(
GetTimestamp(
- e @ DateFormatClass(
- _,
- pattern,
- timeZoneId),
+ e @ DateFormatClass(_, pattern, timeZoneId),
pattern2,
TimestampType,
_,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in dcf49a8.
pattern2, | ||
timeZoneId2), | ||
pattern3, | ||
TimestampType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. Maybe?
case GetTimestamp(
DateFormatClass(
- e @ GetTimestamp(
- _,
- pattern,
- TimestampType,
- _,
- timeZoneId,
- _),
+ e @ GetTimestamp(_, pattern, TimestampType, _, timeZoneId, _),
pattern2,
timeZoneId2),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks nicer. Fixed in dcf49a8.
"org.apache.spark.sql.catalyst.optimizer.RewriteAsOfJoin" :: | ||
"org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison" :: | ||
"org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions" :: | ||
"org.apache.spark.sql.catalyst.optimizer.SimplifyDateTimeConversions" :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this order is not strict, can we move this new rule after SimplifyConditionals
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge your #52495 first then I will rebase this PR and move the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 018055d.
} | ||
|
||
/** | ||
* Removes date and time related functions that are unnecessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can elaborate more what is the definition of uncecessary
as of now? It would be great an itemized list because we can add more when we improve SimplifyDateTimeConversions
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to cases one by one in dcf49a8.
@MaxGekk , can you please take a look at these simplifications if you have some time? |
Since I merged #52495, could you rebase this PR to the master branch, please? |
dcf49a8
to
018055d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @peter-toth .
What changes were proposed in this pull request?
This PR add a new rule to the optimizer, that focuses on date and time conversion functions and tries to eliminate the unnecessary ones.
Why are the changes needed?
Date and time conversions are not cheap so eliminating some of them can bring considerable performance improvement.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new UT.
Was this patch authored or co-authored using generative AI tooling?
No.