[SPARK-54881][SQL] Improve `BooleanSimplification` to handle negation of conjunction and disjunction in one pass #53658

ahshahid · 2026-01-01T23:03:05Z

Simplify boolean expressions of the form !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expressions.

What changes were proposed in this pull request?

In the BooleanSimplification rule, the following changes are made:

1) The partial function currently passed as a lambda to the transformExpressionUp API is stored in a val named actualExprTransformer.
2) Instead of passing the lambda to transformExpressionUp, the val actualExprTransformer is passed.

Up to this point the code change is pure refactoring.
The main change in the logic is:
3) For the two cases

case Not(a Or b) =>
  And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
    actualExprTransformer
  }

case Not(a And b) =>
  Or(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
    actualExprTransformer
  }

The new children of the And and Or nodes are immediately processed by the expression transformer's partial function via transformDownWithPruning. This is efficient because traversal of a subtree stops as soon as a node contains no NOT operator.

Why are the changes needed?

The change is needed because, with transformUp, idempotency is not reached in the optimal way (it takes two passes instead of one).
The issue arises from the rule that transforms
Not(A || B) => (Not(A) AND Not(B))
Because the new children contain added Not operations, they are not processed in that pass under transformUp.
With transformDown, the new children with Not are simplified in that same pass.

Please note that merely changing transformExpressionUp to transformExpressionDown, although it would fix this issue, would break idempotency for other cases (as seen by the failure in ConstantFoldingSuite).
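To make the two-pass behaviour concrete, here is a minimal sketch on a toy expression tree. The case classes, transformUp, and transformDown below are hypothetical stand-ins written for illustration, not Catalyst's classes, and pruning is omitted. It assumes only the two rules discussed above.

```scala
object OnePassDemo {
  // Toy expression tree (hypothetical, not Catalyst's Expression hierarchy).
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class LessThan(l: Expr, r: Expr) extends Expr
  case class GreaterThanOrEqual(l: Expr, r: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr
  case class Not(c: Expr) extends Expr

  type Rule = PartialFunction[Expr, Expr]

  // Rebuild a node with each child mapped through f.
  def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case a: Attr => a
    case LessThan(l, r) => LessThan(f(l), f(r))
    case GreaterThanOrEqual(l, r) => GreaterThanOrEqual(f(l), f(r))
    case And(l, r) => And(f(l), f(r))
    case Or(l, r) => Or(f(l), f(r))
    case Not(c) => Not(f(c))
  }

  // Bottom-up: rewrite the children first, then the node itself.
  def transformUp(e: Expr)(rule: Rule): Expr = {
    val withNewChildren = mapChildren(e)(c => transformUp(c)(rule))
    rule.applyOrElse(withNewChildren, identity[Expr])
  }

  // Top-down: rewrite the node first, then the children of the result.
  def transformDown(e: Expr)(rule: Rule): Expr = {
    val rewritten = rule.applyOrElse(e, identity[Expr])
    mapChildren(rewritten)(c => transformDown(c)(rule))
  }

  // Before: De Morgan creates new Not nodes that the bottom-up pass never revisits.
  val before: Rule = {
    case Not(LessThan(a, b)) => GreaterThanOrEqual(a, b)
    case Not(Or(a, b)) => And(Not(a), Not(b))
  }

  // After: the freshly built conjunction is simplified top-down on the spot.
  lazy val after: Rule = {
    case Not(LessThan(a, b)) => GreaterThanOrEqual(a, b)
    case Not(Or(a, b)) => transformDown(And(Not(a), Not(b)))(after)
  }

  val input: Expr =
    Not(Or(LessThan(Attr("a"), Attr("b")), LessThan(Attr("c"), Attr("d"))))
  val onePassBefore: Expr = transformUp(input)(before)
  val onePassAfter: Expr = transformUp(input)(after)
}
```

In this sketch onePassBefore still contains the two new Not nodes and needs a second pass, while onePassAfter is already the conjunction of the two flipped comparisons.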

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added bug test

Was this patch authored or co-authored using generative AI tooling?

No


github-actions · 2026-01-01T23:03:18Z

JIRA Issue Information

=== Improvement SPARK-54881 ===
Summary: BooleanSimplification rule using transformExpressionsUp instead of transformExpressionsDown, is inefficient in some cases resulting in delayed idempotency
Assignee: None
Status: Open
Affected: ["4.1.0","4.2.0","4.1.1"]

This comment was automatically generated by GitHub Actions

ahshahid · 2026-01-01T23:03:40Z

Once the tests are clean, will remove the WIP mark.


peter-toth · 2026-01-06T08:31:23Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

-      case Not(a LessThan b) => GreaterThanOrEqual(a, b)
-      case Not(a LessThanOrEqual b) => GreaterThan(a, b)
+    case Not(a Or b) =>
+      And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {


Is this safe? I mean, before this PR the simplification logic of actualExprTransformer was called with transformUp..., but now you call it with transformDown... (please note that a Not node can be deep down in a or b). Is there any reason why we invoke the logic with transformUp or could the whole rule use transformDown on expression trees?

Why not something like And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity)) just to be on the safe side?
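A rough sketch of this alternative, assuming a toy expression tree rather than Catalyst's (all names below are hypothetical): the rule is applied via applyOrElse only to the root of each newly created Not, so the a and b subtrees are never traversed as a whole. Because the rule calls itself on the new Not nodes, the pushdown is recursive.

```scala
object ApplyOrElseDemo {
  // Toy expression tree (hypothetical, not Catalyst's Expression hierarchy).
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr
  case class Not(c: Expr) extends Expr

  // Apply the rule to the root of e only; return e unchanged if no case matches.
  def simplifyRoot(e: Expr): Expr = rule.applyOrElse(e, identity[Expr])

  // Each De Morgan case re-applies the rule to the two Not nodes it creates.
  lazy val rule: PartialFunction[Expr, Expr] = {
    case Not(Not(e)) => e
    case Not(Or(a, b)) => And(simplifyRoot(Not(a)), simplifyRoot(Not(b)))
    case Not(And(a, b)) => Or(simplifyRoot(Not(a)), simplifyRoot(Not(b)))
  }

  val input: Expr = Not(Or(Or(Attr("x"), Not(Attr("y"))), Attr("z")))
  val result: Expr = simplifyRoot(input)
}
```

Here the outer Not is pushed through both Or nodes, and the double negation on y is removed, without any generic tree traversal.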

> Is this safe? I mean, before this PR the simplification logic of actualExprTransformer was called with transformUp..., but now you call it with transformDown... (please note that a Not node can be deep down in a or b). Is there any reason why we invoke the logic with transformUp or could the whole rule use transformDown on expression trees?

I believe it's safe.
If the original logic were modified to use transform down instead of transform up, this bug would be fixed, but other cases, such as the one in ConstantFoldingSuite, would lose idempotency.
To handle both cases, both transform up and transform down are needed, as in the PR. This reason is also mentioned in the initial PR description.

I wonder if it would make sense to split the logic into 2 traversals? Keep the current transformExpressionsUpWithPruning() with the current cases excluding these 2 Not "pushdowns" and then a transformExpressionsDownWithPruning() with these 2 cases.

That, in my view, would defeat the purpose of achieving idempotency with the fewest possible tree traversals. If we separate it into 2 traversals, the whole traversal would have to happen again just for one part of the subtree.
As it stands, I do not see how the current subtree traversal of the newly added children could cause any issue. Is there something that makes it suspicious?

Besides nested traversals being hard to reason about, my problem with the current inner transformDownWithPruning() is that it can call actualExprTransformer in a top-down way not only on the new And and Not nodes, but also on nodes of the a and b subtrees if those contain Not nodes.
The current rule might be safe in a top-down manner as well, but I feel it would be a bit cleaner to separate the traversals. On the other hand, separating the traversals would require 2 unique rule ids, so the current PR has pros as well.

Anyways, I'm ok with this PR.

@cloud-fan, do you have any concerns or comments on this?

I think that rules like BooleanSimplification would work the same bottom-up or top-down in terms of functionality, as long as the number of iterations needed to reach idempotency is ignored.
If one goes top-down, some cases (Not) become optimal, while if one goes bottom-up, other cases (such as those in ConstantFoldingSuite) become optimal.

The point is that the subtrees under Not(Junction) have already undergone the bottom-up traversal before the top-down rule acts on them, so the top-down pass acts only to push Not down, and its traversal terminates the moment a subtree has no Not pushed into it.

I am comfortable with this behaviour.
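The early termination described here can be illustrated with a toy model. In this sketch each node caches whether its subtree contains a Not, standing in for Catalyst's containsPattern(NOT) check; the tree classes and the pushDown helper are hypothetical.

```scala
object PruningDemo {
  // Toy tree; every node records whether a Not occurs anywhere in its subtree.
  sealed trait Expr { def containsNot: Boolean }
  case class Attr(name: String) extends Expr { val containsNot = false }
  case class And(l: Expr, r: Expr) extends Expr {
    val containsNot = l.containsNot || r.containsNot
  }
  case class Not(c: Expr) extends Expr { val containsNot = true }

  // Top-down rewrite that returns a subtree untouched when it holds no Not.
  // Also reports how many nodes the rule was actually tried on.
  def pushDown(root: Expr)(rule: PartialFunction[Expr, Expr]): (Expr, Int) = {
    var examined = 0
    def go(e: Expr): Expr =
      if (!e.containsNot) e // pruned: no Not rule can match below this node
      else {
        examined += 1
        rule.applyOrElse(e, identity[Expr]) match {
          case And(l, r) => And(go(l), go(r))
          case Not(c) => Not(go(c))
          case leaf => leaf
        }
      }
    (go(root), examined)
  }

  val doubleNegation: PartialFunction[Expr, Expr] = { case Not(Not(e)) => e }

  // A five-node subtree without any Not, next to a small subtree with two.
  val notFree: Expr = And(Attr("a"), And(Attr("b"), Attr("c")))
  val input: Expr = And(notFree, Not(Not(Attr("d"))))
  val (result, examined) = pushDown(input)(doubleNegation)
}
```

Of the nine nodes in input, the rule is tried on only two: the root and the outer Not. The Not-free left subtree is returned as-is.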

ahshahid · 2026-01-06T09:43:40Z

Also, please note that this transform-down change applies only to the new Not nodes created as children of the Junction op. It processes the newly added Not nodes right there; otherwise they would be processed in the next iteration of the rule.


ahshahid · 2026-01-06T09:49:12Z

And sincere thanks for taking a look at it.


peter-toth · 2026-01-06T09:52:19Z

> Also, please note that this transform-down change applies only to the new Not nodes created as children of the Junction op. It processes the newly added Not nodes right there; otherwise they would be processed in the next iteration of the rule.

See #53658 (comment) if we want to process only the new Not nodes.

ahshahid · 2026-01-06T10:16:41Z

Just to be clear: when I replied below, I missed the fact that you were asking why not just call apply... That reason is given in the previous reply. I am on mobile and will comment properly tomorrow. Maybe I will add another test to show why just apply() will not work.

On Tue, Jan 6, 2026, 2:08 AM Asif Shahid wrote: What you are suggesting is also valid; in fact, I initially did that. But it looked cleaner to me to process the new junction as a whole instead of explicitly calling the transformer on the two Not children. Calling the transformer on a single expression seems more natural.


ahshahid · 2026-01-06T19:58:38Z

@peter-toth, I have added another test.
I suppose what you are suggesting, namely
And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity))
may also work, but it can result in recursive calls and seems to me more complicated to understand.
The traverse-down logic seems easier to comprehend, without recursive calls, and the "NOT" rule pattern ensures an immediate return once NOT is no longer pushed to the children.

cloud-fan · 2026-01-07T11:32:35Z

if the only optimization is to process the newly created Not immediately, shall we narrow down the scope? just add a new util function and call it when a Not is created.

ahshahid · 2026-01-07T18:33:11Z

> if the only optimization is to process the newly created Not immediately, shall we narrow down the scope? just add a new util function and call it when a Not is created.

I would prefer not to, as I think it would mean code duplication. The logic in the transforming code applied to the whole tree is the same as the logic applied to the subtree, so it should not be split, if you see what I mean.
It's just a matter of reprocessing the new sub-node before attaching it to the main tree, and that preprocessing logic is the same for both the subtree and the whole tree.

ahshahid · 2026-01-07T18:45:06Z

Thank you @peter-toth and @cloud-fan for the detailed analysis. My point of view is known to you all.
I suppose you all know best, so please do as you think appropriate.

peter-toth · 2026-01-08T17:34:21Z

How about adjusting this PR with ahshahid#1?

ahshahid · 2026-01-08T18:37:34Z

> How about adjusting this PR with ahshahid#1?

I have my reservations about this, as it would apply the NotTransformer only to the current node and would cause recursion.
I still think that the transform logic for the whole tree and for the subtree (Not) should not be changed, as every case on the whole tree is applicable to the subtree. Changing it would open more room for error.
If I am not mistaken, in the change proposed for the "Not Transformer", other cases like Not(a LessThan b) =>, etc. are missing.
So it will require more diligence so as not to miss any other possible situations.

peter-toth · 2026-01-08T19:00:17Z

I think I moved all cases that handle Not into transformNots, or at least I wanted to do so...
I believe we want to apply transformNots recursively to be able to push down the Not node as deep as possible, but I agree that we could use transformDownWithPruning instead of calling transformNots explicitly.
Also, I don't see why it would make sense to handle other expressions while traversing down. IMO all we want to do is push the Not nodes down, but if you have a case where this is not sufficient then let's add a test.
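One possible shape for such a test, sketched on a toy predicate type (Lt, LtEq, Gt, GtEq, Neg, and pushNot are hypothetical names, not Catalyst's): enumerate all four comparison operators and check that the Not-handling partial function is defined for the negation of each, so that a forgotten case shows up as a failure.

```scala
object NegationCases {
  // Toy predicates over column names (hypothetical, not Catalyst expressions).
  sealed trait Pred
  case class Lt(a: String, b: String) extends Pred
  case class LtEq(a: String, b: String) extends Pred
  case class Gt(a: String, b: String) extends Pred
  case class GtEq(a: String, b: String) extends Pred
  case class Neg(p: Pred) extends Pred

  // A Not-only rule: flips each comparison and removes double negation.
  val pushNot: PartialFunction[Pred, Pred] = {
    case Neg(Lt(a, b)) => GtEq(a, b)
    case Neg(LtEq(a, b)) => Gt(a, b)
    case Neg(Gt(a, b)) => LtEq(a, b)
    case Neg(GtEq(a, b)) => Lt(a, b)
    case Neg(Neg(p)) => p
  }

  val comparisons: Seq[Pred] =
    Seq(Lt("a", "b"), LtEq("a", "b"), Gt("a", "b"), GtEq("a", "b"))

  // Comparisons whose negation the rule would leave untouched; should be empty.
  val unhandled: Seq[Pred] =
    comparisons.filterNot(c => pushNot.isDefinedAt(Neg(c)))

  // Negating twice should give back the original comparison.
  val roundTrips: Boolean =
    comparisons.forall(c => pushNot(Neg(pushNot(Neg(c)))) == c)
}
```

If one of the four comparison cases were dropped from pushNot, unhandled would be non-empty, which is the kind of omission the discussion above is worried about.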

ahshahid · 2026-01-08T19:09:44Z

> I think I moved all cases that handle Not into transformNots, or at least I wanted to do so... I believe we want to apply transformNots recursively to be able to push down the Not node as deep as possible, but I agree that we could use transformDownWithPruning instead of calling transformNots explicitly. Also, I don't see why it would make sense to handle other expressions while traversing down. IMO all we want to do is push the Not nodes down, but if you have a case where this is not sufficient then let's add a test.

The benefits of the original PR, as I see them, are:

1) No recursion.
2) No splitting of the code (the idea being that processing a subtree is no different from processing the whole tree).
3) No chance of missing cases (for example, you may want to test your code for inequalities of the form >=, <=, <, >, etc.).
4) Less code, and to me it is easy to comprehend (due to the idea in point 2).
5) At the same time, early return.

I don't see any issue with the code or any missing case, so I do not exactly understand the reason for further change.

ahshahid · 2026-01-08T20:48:09Z

@peter-toth, I see that we are in agreement on transformDown. Thank you for your understanding.
If you still think that separating the Not cases from the others would not miss any unanticipated situations, then I will not block. Though I urge you to reconsider.

SPARK-54881. Fix to simplify boolean expression of form !(expr1 || expr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression

24141c4

github-actions bot added the SQL label Jan 1, 2026

ahshahid added 2 commits January 1, 2026 20:50

SPARK-54881. Fix to simplify boolean expression of form !(expr1 || expr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression

9813a1e

SPARK-54881. Fix to simplify boolean expression of form !(expr1 || expr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression

8adb0da

peter-toth reviewed Jan 6, 2026

SPARK-54881. Fix to simplify boolean expression of form !(expr1 || expr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression

0b8ac8e

SPARK-54881. refactored test

ead1da5
