@ahshahid ahshahid commented Jan 1, 2026

Fix to simplify boolean expressions of the form !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expressions

What changes were proposed in this pull request?

In the rule BooleanSimplification, the following changes are made:

  1. The partial function currently passed as a lambda to the transformExpressionsUp API is stored in a
    "val actualExprTransformer".
  2. Instead of the lambda, the val actualExprTransformer is passed to transformExpressionsUp.

Up to this point, the change is a pure refactoring.
The main change in logic is:
  3. For the following two cases:

case Not(a Or b) =>
  And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
    actualExprTransformer
  }

case Not(a And b) =>
  Or(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
    actualExprTransformer
  }

The new child nodes of the And and Or are acted upon immediately by the expression transformer's partial function via transformDownWithPruning. This is efficient because the traversal of a subtree stops as soon as a node contains no NOT operator.
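The pruning behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use Spark classes: the tiny `Expr` hierarchy, `containsNot` (a stand-in for `_.containsPattern(NOT)`), and this `transformDownWithPruning` are simplified assumptions made up for illustration, and the counter exists only to show how many nodes the rule is offered.

```scala
object PrunedTransformDown {
  // Toy expression tree; a hypothetical stand-in for Catalyst expressions.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lt(l: Expr, r: Expr) extends Expr
  case class Ge(l: Expr, r: Expr) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr

  def children(e: Expr): List[Expr] = e match {
    case Attr(_)   => Nil
    case Lt(l, r)  => List(l, r)
    case Ge(l, r)  => List(l, r)
    case Not(c)    => List(c)
    case And(l, r) => List(l, r)
  }

  // Rebuild a node with each child replaced by f(child).
  def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case Attr(_)   => e
    case Lt(l, r)  => Lt(f(l), f(r))
    case Ge(l, r)  => Ge(f(l), f(r))
    case Not(c)    => Not(f(c))
    case And(l, r) => And(f(l), f(r))
  }

  // Assumed stand-in for _.containsPattern(NOT); recomputed on every call here.
  def containsNot(e: Expr): Boolean =
    e.isInstanceOf[Not] || children(e).exists(containsNot)

  // Top-down traversal that returns a subtree untouched when `cond` is false for it.
  // Returns the rewritten tree and the number of nodes the rule was offered.
  def transformDownWithPruning(e: Expr, cond: Expr => Boolean)(
      rule: PartialFunction[Expr, Expr]): (Expr, Int) = {
    var offered = 0
    def go(node: Expr): Expr =
      if (!cond(node)) node
      else {
        offered += 1
        val afterRule = rule.applyOrElse(node, identity[Expr])
        mapChildren(afterRule)(go)
      }
    val result = go(e)
    (result, offered)
  }

  val notRule: PartialFunction[Expr, Expr] = {
    case Not(Lt(a, b)) => Ge(a, b)
  }
}
```

For `And(Not(a < b), c < d)`, only the `And` and the `Not` are offered to the rule in this sketch; the `c < d` branch and the operands of the new `Ge` are skipped because they contain no `Not`.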

Why are the changes needed?

The change is needed because, with transformUp, idempotency is not reached in the optimal way (it takes two passes instead of one).
The issue arises from the rule that transforms
Not(A || B) => (Not(A) AND Not(B))
The new children carry newly added Not operators, and with transformUp they are not acted on in that pass, because the bottom-up traversal has already moved past the child positions.
With transformDown, the new children with Not would be simplified in that same pass.

Please note that merely changing transformExpressionsUp to transformExpressionsDown, though it would fix this issue, breaks idempotency for other cases (as seen by the failure in ConstantFoldingSuite).
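The two-pass versus one-pass behaviour can be reproduced on a toy model. The sketch below is not the Spark implementation: `Expr`, `transformUp`, `transformDown`, and the two rules (`original`, `patched`) are hypothetical simplifications, with only one comparison case and no pruning, meant to show the shape of the problem.

```scala
object DeMorganPasses {
  // Toy expression tree; a hypothetical stand-in for Catalyst expressions.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lt(l: Expr, r: Expr) extends Expr
  case class Ge(l: Expr, r: Expr) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr

  // Rebuild a node with each child replaced by f(child).
  def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case Attr(_)   => e
    case Lt(l, r)  => Lt(f(l), f(r))
    case Ge(l, r)  => Ge(f(l), f(r))
    case Not(c)    => Not(f(c))
    case And(l, r) => And(f(l), f(r))
    case Or(l, r)  => Or(f(l), f(r))
  }

  // Bottom-up: children first, then the node itself.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    rule.applyOrElse(mapChildren(e)(c => transformUp(c)(rule)), identity[Expr])

  // Top-down: the node first, then the children of the result.
  def transformDown(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    mapChildren(rule.applyOrElse(e, identity[Expr]))(c => transformDown(c)(rule))

  // Before the change: the Nots created by De Morgan wait for the next pass.
  val original: PartialFunction[Expr, Expr] = {
    case Not(Lt(a, b)) => Ge(a, b)
    case Not(Or(a, b)) => And(Not(a), Not(b))
  }

  // After the change: the freshly built And is simplified top-down right away.
  lazy val patched: PartialFunction[Expr, Expr] = {
    case Not(Lt(a, b)) => Ge(a, b)
    case Not(Or(a, b)) => transformDown(And(Not(a), Not(b)))(patched)
  }
}
```

With `Not(Or(a < b, c < d))` as input, one `transformUp` pass with `original` stops at `And(Not(a < b), Not(c < d))` and needs a second pass to reach `And(a >= b, c >= d)`, while a single `transformUp` pass with `patched` reaches it directly.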

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test for the bug.

Was this patch authored or co-authored using generative AI tooling?

No

…pr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression
@github-actions github-actions bot added the SQL label Jan 1, 2026

github-actions bot commented Jan 1, 2026

JIRA Issue Information

=== Improvement SPARK-54881 ===
Summary: BooleanSimplification rule using transformExpressionsUp instead of transformExpressionsDown, is inefficient in some cases resulting in delayed idempotency
Assignee: None
Status: Open
Affected: ["4.1.0","4.2.0","4.1.1"]


This comment was automatically generated by GitHub Actions

Author

ahshahid commented Jan 1, 2026

Once the tests are clean, will remove the WIP mark.

@ahshahid ahshahid changed the title [WIP][SPARK-54881][SQL]Fix to simplify boolean expression of form like !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expression [SPARK-54881][SQL]Fix to simplify boolean expression of form like !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expression Jan 2, 2026
case Not(a LessThan b) => GreaterThanOrEqual(a, b)
case Not(a LessThanOrEqual b) => GreaterThan(a, b)
case Not(a Or b) =>
And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
Contributor

Is this safe? I mean, before this PR the simplification logic of actualExprTransformer was called with transformUp..., but now you call it with transformDown... (please note that a Not node can be deep down in a or b). Is there any reason why we invoke the logic with transformUp or could the whole rule use transformDown on expression trees?

Contributor

@peter-toth peter-toth Jan 6, 2026

Why not something like And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity)) just to be on the safe side?
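A minimal sketch of this alternative, on a toy model rather than Spark classes: the rule is applied via `applyOrElse` only to the two `Not` nodes it has just created. `Expr`, `transformUp`, `simplifyNode`, and `rule` are hypothetical names introduced for illustration, and only one comparison case is modelled.

```scala
object ApplyOrElseSketch {
  // Toy expression tree; a hypothetical stand-in for Catalyst expressions.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lt(l: Expr, r: Expr) extends Expr
  case class Ge(l: Expr, r: Expr) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr

  // Rebuild a node with each child replaced by f(child).
  def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case Attr(_)   => e
    case Lt(l, r)  => Lt(f(l), f(r))
    case Ge(l, r)  => Ge(f(l), f(r))
    case Not(c)    => Not(f(c))
    case And(l, r) => And(f(l), f(r))
    case Or(l, r)  => Or(f(l), f(r))
  }

  // Bottom-up: children first, then the node itself.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    rule.applyOrElse(mapChildren(e)(c => transformUp(c)(rule)), identity[Expr])

  // Apply the rule to exactly one node; leave it unchanged if no case matches.
  def simplifyNode(e: Expr): Expr = rule.applyOrElse(e, identity[Expr])

  lazy val rule: PartialFunction[Expr, Expr] = {
    case Not(Lt(a, b)) => Ge(a, b)
    // Only the two new Not nodes are handed back to the rule. If one of them is
    // again Not(Or(...)), this case re-enters itself, so the push-down happens
    // through recursion on the rule instead of a nested tree traversal.
    case Not(Or(a, b)) => And(simplifyNode(Not(a)), simplifyNode(Not(b)))
  }
}
```

In this sketch the rule never walks into pre-existing nodes of `a` or `b`; it touches only the `Not` nodes it builds, and a `Not` over something it has no case for (for example a plain attribute) is left as is.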

Author

> Is this safe? I mean, before this PR the simplification logic of actualExprTransformer was called with transformUp..., but now you call it with transformDown... (please note that a Not node can be deep down in a or b). Is there any reason why we invoke the logic with transformUp or could the whole rule use transformDown on expression trees?

I believe it's safe.
If the original logic were changed to use transform down instead of transform up,
this bug would be fixed, but other cases, such as the one in ConstantFoldingSuite,
would break idempotency.
To take care of both kinds of cases, a combination of transform up and transform down
is needed, as in this PR. This reasoning is also given in the PR description.

Contributor

I wonder if it would make sense to split the logic into 2 traversals? Keep the current transformExpressionsUpWithPruning() with the current cases excluding these 2 Not "pushdowns" and then a transformExpressionsDownWithPruning() with these 2 cases.

Author

That, in my view, would defeat the purpose of reaching idempotency in the minimum possible number of tree traversals. If we split it into 2 traversals, the whole tree would have to be traversed again for the sake of only a part of it.
As it stands, I do not see how the current code, which traverses only the subtree of the newly added children, could cause any issue. Is there something that makes it look suspicious?

Contributor

Besides nested traversals being hard to reason about, my problem with the current inner transformDownWithPruning() is that it can call actualExprTransformer in a top-down way not only on the new And and Not nodes, but also on nodes of the a and b subtrees if those contain Not nodes.
The current rule might be safe in a top-down manner as well, but I feel it would be a bit cleaner to separate the traversals. On the other hand, separating the traversals would require 2 unique rule ids, so the current PR has its pros as well.

Anyways, I'm ok with this PR.

@cloud-fan, do you have any concerns or comments on this?

Author

I think that rules like BooleanSimplification produce the same result whether applied bottom-up or top-down, as long as the number of iterations needed to reach idempotency is ignored.
Going top-down, some cases (Not) become optimal, while going bottom-up, other cases (like the one in ConstantFoldingSuite) become optimal.

The point is that the subtrees under NOT(Junction) have already been traversed bottom-up before the top-down rule acts on them, so the top-down pass acts only to push Not down, and the traversal terminates the moment a subtree has no NOT pushed into it.

I am comfortable with this behaviour.

Author

ahshahid commented Jan 6, 2026 via email

Author

ahshahid commented Jan 6, 2026 via email

@peter-toth
Contributor

> Also pls note that, this change of transform down is only for the new Not created as children of Junction op.. that is basically processing the newly added Not nodes, right there, as otherwise it will get processed in next iteration of the rule.

See #53658 (comment) if we want to process only the new Not nodes.

Author

ahshahid commented Jan 6, 2026 via email

Author

ahshahid commented Jan 6, 2026

@peter-toth, I have added another test.
I suppose what you are suggesting, that is
And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity))
may also work, but it can result in recursive calls and seems to me more complicated to understand.
The traverse-down logic seems easier to comprehend, has no recursive calls, and the "NOT" pruning pattern ensures an immediate return once NOT is no longer pushed to the children.

@peter-toth peter-toth changed the title [SPARK-54881][SQL]Fix to simplify boolean expression of form like !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expression [SPARK-54881][SQL] Improve BooleanSimplification to handle negation of conjunction and disjunction in one pass Jan 7, 2026
@cloud-fan
Contributor

if the only optimization is to process the newly created Not immediately, shall we narrow down the scope? just add a new util function and call it when a Not is created.

Author

ahshahid commented Jan 7, 2026

> if the only optimization is to process the newly created Not immediately, shall we narrow down the scope? just add a new util function and call it when a Not is created.

I would prefer not to do that, as I think it would mean code duplication. The transformation logic applied to the whole tree is the same as the logic applied to the subtree, so it should not be split.
It is just a matter of reprocessing the new sub-node before attaching it to the main tree, and that preprocessing logic is the same for the subtree and the whole tree.

Author

ahshahid commented Jan 7, 2026

Thank you @peter-toth and @cloud-fan for the detailed analysis. My point of view is known to you all.
I suppose you know best, so please do as you think appropriate.

@peter-toth
Contributor

How about adjusting this PR with ahshahid#1?

Author

ahshahid commented Jan 8, 2026

> How about adjusting this PR with ahshahid#1?

I have reservations about this, as it would apply the NotTransformer only to the current node and would cause recursion.
I still think that the transformation logic for the whole tree and for the (Not) subtree should not diverge, as every case that applies to the whole tree also applies to the subtree. Splitting them would open more room for error.
If I am not mistaken, in the change proposed for the "Not Transformer", other cases like Not(a LessThan b) => ..., etc. are missing.
So it would require more diligence not to miss any other possible situation.

Contributor

peter-toth commented Jan 8, 2026

I think I moved all cases that handle Not into transformNots, or at least I wanted to do so...
I believe we want to apply transformNots recursively to be able to push the Not node down as deep as possible, but I agree that we could use transformDownWithPruning instead of calling transformNots explicitly.
Also, I don't see why it would make sense to handle other expressions while traversing down. IMO all we want to do is push the Not nodes down, but if you have a case where this is not sufficient, then let's add a test.
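The idea of a dedicated, recursive Not push-down can be sketched as follows. This is not the code from ahshahid#1, which is not shown in this thread: the toy `Expr` hierarchy, the body of `transformNots`, and the one-case `rule` that delegates to it are assumptions made up for illustration.

```scala
object TransformNotsSketch {
  // Toy expression tree; a hypothetical stand-in for Catalyst expressions.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lt(l: Expr, r: Expr) extends Expr
  case class Le(l: Expr, r: Expr) extends Expr
  case class Gt(l: Expr, r: Expr) extends Expr
  case class Ge(l: Expr, r: Expr) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Or(l: Expr, r: Expr) extends Expr

  // Hypothetical helper: holds every Not case and recurses only into the
  // Not nodes it creates, so a Not is pushed as deep as it can go.
  def transformNots(e: Expr): Expr = e match {
    case Not(Not(x))    => x
    case Not(Lt(a, b))  => Ge(a, b)
    case Not(Le(a, b))  => Gt(a, b)
    case Not(Gt(a, b))  => Le(a, b)
    case Not(Ge(a, b))  => Lt(a, b)
    case Not(Or(a, b))  => And(transformNots(Not(a)), transformNots(Not(b)))
    case Not(And(a, b)) => Or(transformNots(Not(a)), transformNots(Not(b)))
    case other          => other
  }

  // Rebuild a node with each child replaced by f(child).
  def mapChildren(e: Expr)(f: Expr => Expr): Expr = e match {
    case Attr(_)   => e
    case Lt(l, r)  => Lt(f(l), f(r))
    case Le(l, r)  => Le(f(l), f(r))
    case Gt(l, r)  => Gt(f(l), f(r))
    case Ge(l, r)  => Ge(f(l), f(r))
    case Not(c)    => Not(f(c))
    case And(l, r) => And(f(l), f(r))
    case Or(l, r)  => Or(f(l), f(r))
  }

  // Bottom-up: children first, then the node itself.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    rule.applyOrElse(mapChildren(e)(c => transformUp(c)(rule)), identity[Expr])

  // The main rule would keep its other cases; only the Not handling is delegated.
  val rule: PartialFunction[Expr, Expr] = {
    case n: Not => transformNots(n)
  }
}
```

The trade-off debated in this thread is visible here: the Not cases live in one place and nothing else runs during the descent, but any Not-related case left out of `transformNots` would go unhandled, which is the risk the author raises.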

Author

ahshahid commented Jan 8, 2026

> I think I moved all cases that handles Not into transformNots, or at least I wanted to do so... I believe we want to apply transformNots recursively to be able to push down the Not node as deep as possible, but I agree that we could use transformDownWithPruning instead of calling transformNots explicitely. Also, I don't see why it would make sense to handle other expressions while traversing down. IMO all we want to do is pushing the Not nodes down, but if you have a case when this is not sufficient then let's add a test.

The benefits of the original PR, as I see them, are:

  1. No recursion.
  2. No splitting of the code (the idea being that processing a subtree is no different from processing the whole tree).
  3. No chance of missing cases (for example, you may want to test your code for inequalities of the form >=, <=, <, >, etc.).
  4. Less code, and to me it is easy to comprehend (because of point 2).
  5. At the same time, early return.

I don't see any issue with the code or any missing case, so I do not exactly understand the reason for further change.

Author

ahshahid commented Jan 8, 2026

@peter-toth, I see that we are in agreement on transformDown. Thank you for your understanding.
If you still think that separating the Not cases from the others would not miss any unanticipated situations, then I will not block, though I urge you to reconsider.


3 participants