-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13549] [SQL] Refactor the Optimizer Rule CollapseProject #11427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #52177 has finished for PR 11427 at commit
|
|
LGTM |
| projectList1.exists(_.collect { | ||
| case a: Attribute if aliasMap.contains(a) => aliasMap(a).child | ||
| }.exists(!_.deterministic)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further refactored your code a little bit:
def collectAliases(projectList: Seq[NamedExpression]): AttributeMap[Alias] = {
AttributeMap(projectList.collect {
case a: Alias => a.toAttribute -> a
})
}
def haveCommonNonDeterministicOutput(
upper: Seq[NamedExpression], lower: Seq[NamedExpression]): Boolean = {
val aliases = collectAliases(lower)
upper.exists(_.collect {
case a: Attribute if aliases.contains(a) => aliases(a).child
}).exists(!_.deterministic)
}
def buildCleanedProjectList(
upper: Seq[NamedExpression],
lower: Seq[NamedExpression]): Seq[NamedExpression] = {
val aliases = collectAliases(lower)
val rewrittenUpper = upper.map(_.transform {
case a: Attribute => aliases.getOrElse(a, a)
})
rewrittenUpper.map { p =>
CleanupAliases.trimNonTopLevelAliases(p).asInstanceOf[NamedExpression]
}
}And those inline comments need some rewording as they are now moved to different contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! will do
|
Sorry for the late review. In general it looks good. Left some comments for better readability and testing. |
|
This is definitely a low priority issue. Thank you for your review! Will do the change based on your suggestions. @liancheng |
|
Test build #53934 has finished for PR 11427 at commit
|
|
Thanks! Merging to master. |
|
Thank you! @liancheng Now, this rule supports collapsing two continuous After rethinking it, maybe we also can collapse one upper |
|
Let me post an example here: SELECT sum(b) FROM parquet_t2 group by a, bThe current optimized plan is like Ideally, we can directly remove the Project. |
|
But, their physical plan is still the same. I guess, we do not need to do it. : ) |
|
@gatorsmile Yea, whole-stage codegen makes this kind of optimization not that appealing. |
|
I see. Thanks! @liancheng Let me find something else in Optimizer. Will do expression folding this weekend. : ) |
What changes were proposed in this pull request?
The PR #10541 changed the rule
CollapseProjectby enabling collapsingProjectintoAggregate. It leaves a to-do item to remove the duplicate code. This PR is to finish this to-do item. Also added a test case for covering this change.How was this patch tested?
Added a new test case.
@liancheng Could you check if the code refactoring is fine? Thanks!