Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Mar 15, 2023

What changes were proposed in this pull request?

Add a new config to shortcut subexpression elimination for expression and, or.

The subexpression may not need to eval even if it appears more than once.
e.g., if(or(a, and(b, b))), the expression b would be skipped if a is true.

Why are the changes needed?

avoid eval unnecessary expression.

Does this PR introduce any user-facing change?

no

How was this patch tested?

add test

@ulysses-you
Copy link
Contributor Author

cc @viirya @cloud-fan thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
buildConf("spark.sql.subexpressionElimination.shortcut.enabled")
buildConf("spark.sql.subexpressionElimination.skipForShortcutExpr")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only in conditional expression? where or(a, and(b, b)) has the same problem, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, CSE only supports project and aggregate. Predicate seems not likely appear without conditional expr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not likely, but possible, select or(a, and(b, b)) is also valid.

BTW, I think we should not enable this new change by default, as it may lead to perf regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, addressed !

@ulysses-you ulysses-you changed the title [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression [SPARK-42815][SQL] Subexpression elimination support shortcut expression Mar 16, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is very unclear. Can you add some more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, thank you @viirya

Comment on lines 141 to 143
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if other expression (e.g. or.right) is not added into recursing list, how can we look into if it needs to be eliminated? If or.left is false, it will be evaluated, isn't?

Copy link
Contributor Author

@ulysses-you ulysses-you Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is why we add a new config with disabled by default. We can not decide which subexpression would be evaluated before running. When enable this config, it assumes that the left child is a shotcut, then the right child can be skipped whatever it contains common subexpression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is to make CSE more dynamic: only evaluate it if its first appearance needs to be evaluated. It can handle ConditionalExpression as well but is much harder to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a kind of lazy evaluation. I remember before that pr attempts #32977 , and there is some issues @viirya memtioned #32977 (review).
It seems the main issue is that, the method will go to large if we make each common subexpression evaluation lazy. Something like:

def common_subexpression_1() {
  if (isnull) {
    // evaluate
  } else {
    // return exists value
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
shortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {
skipShortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we match And/Or here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not need, the next round will cover it. I added some tests to confirm that And is not the root node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are correct but I'm confused about why the code works... if And is a root expression, we blindly take all its children here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see it. The actually root expression is If which is the ConditionalExpression. Let me upadte it and the outdate test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipShortCut means we need to handle the shortcut expressions to skip CSE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe skipInShortcut is a clearer name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to skipForShortcut which aligns with config name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

childrenToRecurse is recursive, so skipForShortcut doesn't need to be recursive.

Copy link
Contributor Author

@ulysses-you ulysses-you Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it is the fact, but after second thought the updateExprTree has side effect updateExprInMap during recursion. If we decide to skip for shortcut, is it better to return the final valid expression in one shot ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about inconsistency. childrenToRecurse is not recursive either and it seems messy if we only make skipForShortcut recursive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, it should consistent with childrenToRecurse

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6f7403b Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants