[SPARK-42815][SQL] Subexpression elimination support shortcut expression #40446
cc @viirya @cloud-fan, thank you.
Suggested change:
- buildConf("spark.sql.subexpressionElimination.shortcut.enabled")
+ buildConf("spark.sql.subexpressionElimination.skipForShortcutExpr")
Why only in conditional expressions? or(a, and(b, b)) has the same problem, right?
IIUC, CSE only supports project and aggregate. A predicate is unlikely to appear there without a conditional expression.
Not likely, but possible: select or(a, and(b, b)) is also valid.
BTW, I think we should not enable this new change by default, as it may lead to a perf regression.
Makes sense, addressed!
The description is very unclear. Can you add more detail?
sure, thank you @viirya
Hmm, if the other expression (e.g. or.right) is not added to the recursion list, how can we check whether it needs to be eliminated? If or.left is false, it will be evaluated, won't it?
That is why we add a new config that is disabled by default. We cannot decide which subexpressions will be evaluated before running. When this config is enabled, it assumes that the left child is a shortcut, so the right child can be skipped regardless of whether it contains common subexpressions.
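To make the behaviour described above concrete, here is a minimal sketch on a toy expression tree. It is not Spark's actual Expression or EquivalentExpressions code: the types Attr, Add, And, Or and the helpers countOccurrences and commonSubexpressions are hypothetical names introduced for illustration. Only childrenToRecurse and skipForShortcut echo names from this thread, and their real signatures may differ.

```scala
// Toy model only; these are NOT Spark's catalyst classes.
object ShortcutCseSketch {
  sealed trait Expr { def children: Seq[Expr] }
  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Non-recursive: selects which direct children are visited.
  // With skipForShortcut, only the left child of And/Or is visited,
  // because the right child may never be evaluated at runtime.
  def childrenToRecurse(e: Expr, skipForShortcut: Boolean): Seq[Expr] = e match {
    case And(left, _) if skipForShortcut => Seq(left)
    case Or(left, _) if skipForShortcut => Seq(left)
    case other => other.children
  }

  // The recursion lives here: count how often each non-leaf expression is reached.
  def countOccurrences(root: Expr, skipForShortcut: Boolean): Map[Expr, Int] = {
    def loop(e: Expr, acc: Map[Expr, Int]): Map[Expr, Int] = {
      val updated =
        if (e.children.isEmpty) acc
        else acc.updated(e, acc.getOrElse(e, 0) + 1)
      childrenToRecurse(e, skipForShortcut).foldLeft(updated)((m, c) => loop(c, m))
    }
    loop(root, Map.empty)
  }

  // Expressions reached more than once are candidates for elimination.
  def commonSubexpressions(root: Expr, skipForShortcut: Boolean): Set[Expr] =
    countOccurrences(root, skipForShortcut).collect { case (e, n) if n > 1 => e }.toSet
}
```

For or(a, and(b, b)) with b modelled as Add(Attr("x"), Attr("y")), this sketch reports b as a common subexpression when skipForShortcut is false and reports nothing when it is true, since the right child of Or is never visited.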
Another idea is to make CSE more dynamic: only evaluate it if its first appearance needs to be evaluated. It can handle ConditionalExpression as well but is much harder to implement.
It's a kind of lazy evaluation. I remember that a previous PR, #32977, attempted this, and @viirya mentioned some issues in #32977 (review).
It seems the main issue is that the method will grow too large if we make each common subexpression evaluation lazy. Something like:
def common_subexpression_1() {
  if (isnull) {
    // evaluate
  } else {
    // return the existing value
  }
}
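The pseudocode above can be fleshed out as a small runnable sketch. This is plain Scala, not Spark's generated Java code; LazySubexpr and evalCount are hypothetical names used only for illustration. Scala's built-in lazy val gives the same evaluate-once semantics; the explicit check is spelled out here to mirror the shape of the pseudocode.

```scala
object LazySubexprSketch {
  // Wraps one common subexpression: it is computed on first use and cached afterwards.
  final class LazySubexpr[T](compute: () => T) {
    private var cached: Option[T] = None
    // Exposed only so the sketch can show how often the expression really ran.
    var evalCount: Int = 0

    def value: T = cached match {
      case Some(v) =>
        v // return the existing value
      case None =>
        evalCount += 1
        val v = compute() // evaluate on first access
        cached = Some(v)
        v
    }
  }
}
```

With this shape, a subexpression that sits only behind a short-circuited branch is never computed, and one that is needed several times is computed once. The cost noted in the comment above still applies: every common subexpression needs its own check-and-cache wrapper, which enlarges the generated method.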
Suggested change:
- shortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {
+ skipShortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {
shall we match And/Or here?
No need; the next round will cover it. I added some tests to confirm that And is not the root node.
The tests are correct but I'm confused about why the code works... if And is a root expression, we blindly take all its children here, right?
Oh, I see. The actual root expression is If, which is a ConditionalExpression. Let me update it and the outdated test.
skipShortCut means we need to handle the shortcut expressions to skip CSE.
maybe skipInShortcut is a clearer name
changed to skipForShortcut which aligns with config name
childrenToRecurse is recursive, so skipForShortcut doesn't need to be recursive.
Yeah, that's true, but on second thought, updateExprTree has a side effect (updateExprInMap) during recursion. If we decide to skip for shortcut, would it be better to return the final valid expressions in one shot?
I'm a bit worried about inconsistency. childrenToRecurse is not recursive either and it seems messy if we only make skipForShortcut recursive.
Makes sense, it should be consistent with childrenToRecurse.
Thanks, merging to master!
What changes were proposed in this pull request?
Add a new config to shortcut subexpression elimination for the And and Or expressions. A subexpression may not need to be evaluated even if it appears more than once.
e.g., in if(or(a, and(b, b))), the expression b would be skipped if a is true.
Why are the changes needed?
Avoid evaluating unnecessary expressions.
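The trade-off can be illustrated with a small evaluation counter. This is a sketch in plain Scala under stated assumptions, not Spark code: Counter, b, withCse and withoutCse are hypothetical names, and b stands in for any expensive expression that appears twice on the right-hand side of an or.

```scala
object ShortcutEvalSketch {
  // Mutable counter so the sketch can record how many times b actually runs.
  final class Counter { var calls: Int = 0 }

  // Stand-in for an expensive expression b (hypothetical).
  def b(c: Counter, x: Int): Boolean = {
    c.calls += 1
    x > 0
  }

  // Eager subexpression elimination: b is hoisted and evaluated once,
  // even when a is true and the right-hand side is never needed.
  def withCse(a: Boolean, x: Int, c: Counter): Boolean = {
    val bValue = b(c, x)
    a || (bValue && bValue)
  }

  // Elimination skipped for the right child: b runs only if a is false,
  // but may then run twice.
  def withoutCse(a: Boolean, x: Int, c: Counter): Boolean =
    a || (b(c, x) && b(c, x))
}
```

When a is true, the eager version evaluates b once and the skipping version evaluates it zero times. When a is false and b is true, the skipping version evaluates b twice, which is the kind of regression that motivated keeping the config disabled by default.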
Does this PR introduce any user-facing change?
no
How was this patch tested?
Added tests.