[SPARK-28478][SQL] Remove redundant null checks #27231
@@ -434,6 +434,27 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
      case _ => false
    }

  /**
   * Condition for redundant null check based on intolerant expressions.
   * @param ifNullExpr expression that takes place if checkedExpr is null
   * @param ifNotNullExpr expression that takes place if checkedExpr is not null
   * @param checkedExpr expression that is checked for null value
   */
  private def isRedundantNullCheck(
      ifNullExpr: Expression,
      ifNotNullExpr: Expression,
      checkedExpr: Expression): Boolean = {
    val isNullIntolerant = ifNotNullExpr.find { x =>
      !x.isInstanceOf[NullIntolerant] && x.find(e => e.semanticEquals(checkedExpr)).nonEmpty
    }.isEmpty

    isNullIntolerant && {
      (ifNullExpr.semanticEquals(checkedExpr) ||
        (ifNullExpr.foldable && ifNullExpr.eval() == null)) &&
        ifNotNullExpr.find(x => x.semanticEquals(checkedExpr)).nonEmpty
    }
  }

Member: nit: How about this style?
Member: The first condition
Member: The second condition
Contributor (Author): Makes sense.
Member: Can you generalize the last condition more? e.g., how about the case,
Contributor (Author): Yes, that should be possible.
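
The reviewer's suggestion to split isRedundantNullCheck into a "first condition" and a "second condition" can be illustrated with a small, Spark-independent sketch. It assumes a toy expression tree (the hypothetical Expr, Attr, Lit, Abs and Coalesce types below), uses structural == in place of Catalyst's semanticEquals, and models the NullIntolerant marker trait as a boolean flag. It is not the PR's code, only one way the two parts could be named.

```scala
object RedundantNullCheckSketch {
  // Toy stand-in for Catalyst's Expression tree (hypothetical, not Spark's API).
  sealed trait Expr {
    def children: Seq[Expr]
    // Models Spark's NullIntolerant marker: a null input always yields a null output.
    def nullIntolerant: Boolean
    def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
  }
  final case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
    val nullIntolerant: Boolean = true
  }
  // None models a null literal.
  final case class Lit(value: Option[Int]) extends Expr {
    val children: Seq[Expr] = Nil
    val nullIntolerant: Boolean = false
  }
  final case class Abs(child: Expr) extends Expr {
    val children: Seq[Expr] = Seq(child)
    val nullIntolerant: Boolean = true
  }
  final case class Coalesce(args: Seq[Expr]) extends Expr {
    val children: Seq[Expr] = args
    val nullIntolerant: Boolean = false
  }

  // First condition: the branch taken when checkedExpr is null already yields null,
  // because it is checkedExpr itself or a null literal.
  def ifNullBranchIsNull(ifNullExpr: Expr, checkedExpr: Expr): Boolean =
    ifNullExpr == checkedExpr || ifNullExpr == Lit(None)

  // Second condition: ifNotNullExpr references checkedExpr, and no null-tolerant node
  // has checkedExpr in its subtree, so a null checkedExpr makes ifNotNullExpr null.
  def nullPropagates(ifNotNullExpr: Expr, checkedExpr: Expr): Boolean =
    ifNotNullExpr.exists(_ == checkedExpr) &&
      !ifNotNullExpr.exists(e => !e.nullIntolerant && e.exists(_ == checkedExpr))

  def isRedundantNullCheck(ifNullExpr: Expr, ifNotNullExpr: Expr, checkedExpr: Expr): Boolean =
    ifNullBranchIsNull(ifNullExpr, checkedExpr) && nullPropagates(ifNotNullExpr, checkedExpr)
}
```

With these definitions, isRedundantNullCheck(Lit(None), Abs(a), a) holds, mirroring the test expectation that If(isNullCondA, nullValue, nullIntolerantExp) simplifies to nullIntolerantExp, while a Coalesce over a in the not-null branch blocks the rewrite.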

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan => q transformExpressionsUp {
      case If(TrueLiteral, trueValue, _) => trueValue

@@ -442,6 +463,15 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
      case If(cond, trueValue, falseValue)
        if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue

      case i @ If(cond, trueValue, falseValue) => cond match {
        // If the null-check is redundant, remove it
        case IsNull(child)
          if isRedundantNullCheck(trueValue, falseValue, child) => falseValue
        case IsNotNull(child)
          if isRedundantNullCheck(falseValue, trueValue, child) => trueValue
        case _ => i
      }

Member: How about this format?
Contributor (Author): Ok
Member: Why did you add the inner pattern-matching (
Contributor (Author): I see. You are right, I do not need the inner pattern match; I will fix that.
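
The flat form the reviewer asked for, with the IsNull/IsNotNull patterns matched directly in the outer case and no nested cond match, could look roughly like the sketch below. It is a toy model rather than Catalyst: the Expr types are hypothetical, and isRedundantNullCheck is reduced to a simplified stand-in in which only chains of Abs over the checked column propagate null.

```scala
object FlatIfRuleSketch {
  // Hypothetical toy AST, not Catalyst's classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  case object NullLit extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class Abs(child: Expr) extends Expr
  final case class IsNull(child: Expr) extends Expr
  final case class IsNotNull(child: Expr) extends Expr
  final case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr

  // Simplified stand-in: only `checked` itself or Abs(...) chains over it propagate null.
  private def propagatesNull(e: Expr, checked: Expr): Boolean = e match {
    case x if x == checked => true
    case Abs(child)        => propagatesNull(child, checked)
    case _                 => false
  }

  def isRedundantNullCheck(ifNull: Expr, ifNotNull: Expr, checked: Expr): Boolean =
    (ifNull == checked || ifNull == NullLit) && propagatesNull(ifNotNull, checked)

  // Flat cases with guards: when a guard fails, matching simply falls through.
  def simplify(e: Expr): Expr = e match {
    case If(IsNull(child), t, f) if isRedundantNullCheck(t, f, child)    => f
    case If(IsNotNull(child), t, f) if isRedundantNullCheck(f, t, child) => t
    case other                                                          => other
  }
}
```

Because a failed guard falls through to the next case, the final catch-all replaces the inner `case _ => i`, which is the simplification the author agreed to.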

      case e @ CaseWhen(branches, elseValue) if branches.exists(x => falseOrNullLiteral(x._1)) =>
        // If there are branches that are always false, remove them.
        // If there are no more branches left, just use the else value.

@@ -483,6 +513,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
        } else {
          e.copy(branches = branches.take(i).map(branch => (branch._1, elseValue)))
        }

      // remove redundant null checks for CaseWhen with one branch
      case CaseWhen(Seq((IsNotNull(child), trueValue)), Some(falseValue))
        if isRedundantNullCheck(falseValue, trueValue, child) => trueValue
      case CaseWhen(Seq((IsNull(child), trueValue)), Some(falseValue))
        if isRedundantNullCheck(trueValue, falseValue, child) => falseValue
      case CaseWhen(Seq((IsNotNull(child), trueValue)), None)
        if isRedundantNullCheck(Literal.create(null, child.dataType), trueValue, child) => trueValue
      case e @ CaseWhen(Seq((IsNull(child), trueValue)), None) =>
        val nullValue = Literal.create(null, child.dataType)
        if (isRedundantNullCheck(trueValue, nullValue, child)) {
          nullValue
        } else {
          e
        }

Member: How about this?
Contributor (Author): Alright
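
Why the single-branch CaseWhen rewrite is sound can be spot-checked by evaluation: CASE WHEN a IS NOT NULL THEN abs(a) END returns null exactly when abs(a) does, whereas the same shape over coalesce(a, 5) does not. The sketch below assumes a toy three-valued evaluator (hypothetical types; None models SQL NULL and boolean conditions are encoded as 1/0) and compares both sides on a few sample values of a. It is a sanity check of the idea, not a proof and not Spark's evaluator.

```scala
object CaseWhenSemanticsSketch {
  // Hypothetical toy AST, not Catalyst's classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Option[Int]) extends Expr
  final case class Abs(child: Expr) extends Expr
  final case class Coalesce(args: Seq[Expr]) extends Expr
  final case class IsNotNull(child: Expr) extends Expr
  final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr] = None) extends Expr

  type Env = Map[String, Option[Int]]

  // None models SQL NULL; conditions evaluate to Some(1) (true) or Some(0) (false).
  def eval(e: Expr, env: Env): Option[Int] = e match {
    case Attr(n)        => env(n)
    case Lit(v)         => v
    case Abs(c)         => eval(c, env).map(v => math.abs(v))
    case Coalesce(args) => args.iterator.map(eval(_, env)).collectFirst { case Some(v) => v }
    case IsNotNull(c)   => Some(if (eval(c, env).isDefined) 1 else 0)
    case CaseWhen(branches, elseValue) =>
      // First branch whose condition is true wins; a missing ELSE yields NULL.
      branches
        .collectFirst { case (cond, value) if eval(cond, env).contains(1) => eval(value, env) }
        .getOrElse(elseValue.flatMap(eval(_, env)))
  }

  // Two expressions agree if they evaluate identically for every sample binding of `a`.
  def agreeOn(samples: Seq[Option[Int]], e1: Expr, e2: Expr): Boolean =
    samples.forall(v => eval(e1, Map("a" -> v)) == eval(e2, Map("a" -> v)))
}
```

The first comparison corresponds to the test asserting that CaseWhen(Seq((isNotNullCond, nullIntolerantExp))) becomes nullIntolerantExp; the second shows why the nullTolerantExp variant must be left untouched.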

    }
  }
}

@@ -18,12 +18,13 @@
package org.apache.spark.sql.catalyst.optimizer

import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLiteral}
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.types.{IntegerType, NullType}

Member: nit: You need to avoid unnecessary changes like this.
Contributor (Author): Ok

@@ -35,8 +36,8 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
  }

  protected def assertEquivalent(e1: Expression, e2: Expression): Unit = {
    val correctAnswer = Project(Alias(e2, "out")() :: Nil, OneRowRelation()).analyze
    val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, OneRowRelation()).analyze)
    val correctAnswer = Project(Alias(e2, "out")() :: Nil, LocalRelation('a.int)).analyze
    val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, LocalRelation('a.int)).analyze)
    comparePlans(actual, correctAnswer)
  }

@@ -45,7 +46,13 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
  private val unreachableBranch = (FalseLiteral, Literal(20))
  private val nullBranch = (Literal.create(null, NullType), Literal(30))

  val isNotNullCond = IsNotNull(UnresolvedAttribute(Seq("a")))
  private val nullValue = Literal.create(null, IntegerType)

Member: Why did you change from
Contributor (Author): I need the same dataType as I have for the
Member: Yeah, I think it's better to avoid behaviour changes in the existing tests.

  private val colA = UnresolvedAttribute(Seq("a"))
  private val nullIntolerantExp = Abs(colA)
  private val nullTolerantExp = Coalesce(Seq(colA, Literal(5)))

  val isNullCondA = IsNull(colA)
  val isNotNullCond = IsNotNull(colA)
  val isNullCond = IsNull(UnresolvedAttribute("b"))
  val notCond = Not(UnresolvedAttribute("c"))

@@ -80,6 +87,74 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
        Literal(9)))
  }

  test("remove redundant null-check for If based on null-Intolerant expressions") {
    assertEquivalent(
      If(isNullCondA, nullValue, nullIntolerantExp),
      nullIntolerantExp)

    assertEquivalent(
      If(isNullCondA, colA, nullIntolerantExp),
      nullIntolerantExp)

    assertEquivalent(
      If(isNotNullCond, nullIntolerantExp, nullValue),
      nullIntolerantExp)

    assertEquivalent(
      If(isNotNullCond, nullIntolerantExp, colA),
      nullIntolerantExp)

    // Try also more complex case
    assertEquivalent(
      If(isNotNullCond, Abs(nullIntolerantExp), colA),
      Abs(nullIntolerantExp))

    // We do not remove the null check if the expression is not null-intolerant
    assertEquivalent(
      If(isNullCondA, nullValue, nullTolerantExp),
      If(isNullCondA, nullValue, nullTolerantExp))

    assertEquivalent(
      If(isNotNullCond, nullTolerantExp, nullValue),
      If(isNotNullCond, nullTolerantExp, nullValue))

    // Try also more complex case
    assertEquivalent(
      If(isNotNullCond, Abs(nullTolerantExp), nullValue),
      If(isNotNullCond, Abs(nullTolerantExp), nullValue))
  }

  test("remove redundant null-check for CaseWhen based on null-Intolerant expressions") {
    assertEquivalent(
      CaseWhen(Seq((isNullCondA, nullValue)), Some(nullIntolerantExp)),
      nullIntolerantExp)

    assertEquivalent(
      CaseWhen(Seq((isNullCondA, colA)), Some(nullIntolerantExp)),
      nullIntolerantExp)

    assertEquivalent(
      CaseWhen(Seq((isNotNullCond, nullIntolerantExp))),
      nullIntolerantExp)

    assertEquivalent(
      CaseWhen(Seq((isNotNullCond, nullIntolerantExp)), Some(colA)),
      nullIntolerantExp)

    assertEquivalent(
      CaseWhen(Seq((isNotNullCond, nullIntolerantExp)), Some(nullValue)),
      nullIntolerantExp)

    // We do not remove the null check if the expression is not null-intolerant
    assertEquivalent(
      CaseWhen(Seq((isNotNullCond, nullTolerantExp))),
      CaseWhen(Seq((isNotNullCond, nullTolerantExp))))

    assertEquivalent(
      CaseWhen(Seq((isNullCondA, nullValue)), Some(nullTolerantExp)),
      CaseWhen(Seq((isNullCondA, nullValue)), Some(nullTolerantExp)))
  }

  test("remove unreachable branches") {
    // i.e. removing branches whose conditions are always false
    assertEquivalent(

The same logic? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L105-L109

Actually, I think we need slightly different logic. Consider these two examples, where x is the null-checked column:

1. substring(x, coalesce(a, b), c)
2. substring(coalesce(x, d), a, c)

For 1, the expression must be treated as null-intolerant (even though coalesce is null-tolerant): if x is null, we replace the substring with a null value no matter what the other children are. For 2, the expression must be treated as null-tolerant, and we will not replace the substring with a null value. So we need to check the expression with respect to the position of x (the column that is being null-checked). Does that make sense?
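
The difference between the two examples can be reproduced in a toy model that contrasts a strict recursive check (modelled on the recursive NullIntolerant check the linked FilterExec code is understood to perform) with the position-aware check described above. The sketch assumes hypothetical Expr, Attr, Coalesce and Substring types, a nullIntolerant flag standing in for Spark's NullIntolerant trait, and structural == in place of semanticEquals.

```scala
object PositionAwareSketch {
  // Hypothetical toy AST, not Catalyst's classes.
  sealed trait Expr {
    def children: Seq[Expr]
    def nullIntolerant: Boolean
    def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
  }
  final case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
    val nullIntolerant: Boolean = true
  }
  final case class Coalesce(args: Seq[Expr]) extends Expr {
    val children: Seq[Expr] = args
    val nullIntolerant: Boolean = false
  }
  final case class Substring(str: Expr, pos: Expr, len: Expr) extends Expr {
    val children: Seq[Expr] = Seq(str, pos, len)
    val nullIntolerant: Boolean = true
  }

  // Strict recursive check: every node in the tree must be null-intolerant.
  def strictlyNullIntolerant(e: Expr): Boolean =
    e.nullIntolerant && e.children.forall(strictlyNullIntolerant)

  // Position-aware check: `checked` occurs in `e`, and no null-tolerant node has
  // `checked` in its subtree, so a null `checked` always makes `e` null.
  def nullPropagatesFrom(e: Expr, checked: Expr): Boolean =
    e.exists(_ == checked) && !e.exists(n => !n.nullIntolerant && n.exists(_ == checked))
}
```

Under this model the strict check rejects example 1 because of the unrelated coalesce(a, b), while the position-aware check accepts example 1 and still rejects example 2, where x sits under coalesce.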

Probably, you meant FilterExec.isNullIntolerant(ifNotNullExpr) || additional checks for the case of null-tolerant exprs inside ifNotNullExpr? (FilterExec.isNullIntolerant is private, though...)

Yeah, FilterExec.isNullIntolerant(ifNotNullExpr) is a stronger condition than we need, so when there is a null-tolerant expr inside, we need to check whether the null-checked column is in its subtree. Using the logic from FilterExec.isNullIntolerant, the function could look like this:

Ah, I see. For better code readability, could you split the condition into two parts as I suggested above? Also, I think it's better to leave some comments about why we need more checks there.

I agree that the committed code is not very intuitive, so I came up with this version, which seems more readable (I also added some comments):
But I'm not sure if this is what you had in mind when suggesting to split the condition. Can you think of a better way to compose this?

Hmmm, that still looks complicated. If we cannot avoid the complexity for the stronger condition, another option is to cover only the simple case (FilterExec.isNullIntolerant(ifNotNullExpr)) in this PR. If necessary, we might be able to optimize the condition in future work. I think keeping the code simple is more important. WDYT?

Well, the thing is that if we use the simple version with FilterExec.isNullIntolerant(ifNotNullExpr), we will lose (because of the recursive check) all expressions that contain literals (because literals are null-tolerant). For example, expressions like substring(title#5, 0, 3) will not be included in the optimization, even though they are what the JIRA targeted in the first place. So I suggest one of these 2 options:
where checkedExpr must be a direct child, and thus we don't have to check the whole subtree for null-intolerance (so expressions that have Literals in the subtree are still included). I am fine with either of these 2 options. What do you think?
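
One reading of the direct-child option can be sketched in the same toy style (hypothetical types, not Spark's API; the option's actual code is not shown above, so this is an assumption): the root of ifNotNullExpr must be null-intolerant and the checked column must be one of its immediate children, so literal arguments such as the 0 and 3 in substring(title, 0, 3) are never inspected. For comparison, the strict recursive check rejects that expression because literals are not null-intolerant.

```scala
object DirectChildSketch {
  // Hypothetical toy AST, not Catalyst's classes.
  sealed trait Expr {
    def children: Seq[Expr]
    def nullIntolerant: Boolean
  }
  final case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
    val nullIntolerant: Boolean = true
  }
  // Literals are not null-intolerant, which is what trips the strict check.
  final case class IntLit(value: Int) extends Expr {
    val children: Seq[Expr] = Nil
    val nullIntolerant: Boolean = false
  }
  final case class Coalesce(args: Seq[Expr]) extends Expr {
    val children: Seq[Expr] = args
    val nullIntolerant: Boolean = false
  }
  final case class Substring(str: Expr, pos: Expr, len: Expr) extends Expr {
    val children: Seq[Expr] = Seq(str, pos, len)
    val nullIntolerant: Boolean = true
  }

  // Strict recursive check: every node, including literals, must be null-intolerant.
  def strictlyNullIntolerant(e: Expr): Boolean =
    e.nullIntolerant && e.children.forall(strictlyNullIntolerant)

  // Hypothetical direct-child rule: only the root and its immediate children are inspected.
  def directChildRule(e: Expr, checked: Expr): Boolean =
    e.nullIntolerant && e.children.contains(checked)
}
```

The trade-off, under this reading, is that the rule no longer sees deeper null-intolerant nesting: an expression like the suite's Abs(nullIntolerantExp), i.e. abs(abs(a)), has a as a grandchild rather than a direct child and would not be simplified, whereas the position-aware subtree check would accept it.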