-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54881][SQL] Improve BooleanSimplification to handle negation of conjunction and disjunction in one pass
#53658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…pr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression
JIRA Issue Information=== Improvement SPARK-54881 === This comment was automatically generated by GitHub Actions |
|
Once the tests are clean, will remove the WIP mark. |
…pr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression
…pr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression
| case Not(a LessThan b) => GreaterThanOrEqual(a, b) | ||
| case Not(a LessThanOrEqual b) => GreaterThan(a, b) | ||
| case Not(a Or b) => | ||
| And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe? I mean, before this PR the simplification logic of actualExprTransformer was called with transformUp..., but now you call it with transformDown... (please note that a Not node can be deep down in a or b). Is there any reason why we invoke the logic with transformUp or could the whole rule use transformDown on expression trees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not something like And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity)) just to be on the safe side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe? I mean, before this PR the simplification logic of
actualExprTransformerwas called withtransformUp..., but now you call it withtransformDown...(please note that aNotnode can be deep down inaorb). Is there any reason why we invoke the logic withtransformUpor could the whole rule usetransformDownon expression trees?
I believe it's safe..
If the original logic is modified such that instead of transform up ,
transform down is used, then this bug would be fixed, but other cases like
that mentioned in Constant folding suite will break in idempotency.
To take care of both the cases, use of transform up and transform down is
needed...as in the pr. This reason is also mentioned in the initial PR details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make sense to split the logic into 2 traversals? Keep the current transformExpressionsUpWithPruning() with the current cases excluding these 2 Not "pushdowns" and then a transformExpressionsDownWithPruning() with these 2 cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That in my view, would defeat the purpose of achieving idempotency in a minimum possible tree traversal. If we separate it in 2 traversals, then only for a part of subtree , the whole traversal will have to happen again.
As such I do not see any issue with the current code of subtree traversal of the newly added children to cause any issue.. Is there something which is making it suspicious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides that it is hard to reason about a nested traversals, my problem with the current inner transformDownWithPruning() is that it can call actualExprTransformer top-down way not only on the new And and Not nodes, but also on nodes of a and b subtrees if those contain Not nodes.
The current rule might be safe in top-down manner as well, but I feel it would be a bit cleaner to separate the traversals. But, on the other hand, separating the traversals would require 2 unique rule ids so the current PR has pros as well.
Anyways, I'm ok with this PR.
@cloud-fan, do you have any concerns or comments on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that rules like BooleanSimplification would work same bottom - up, or top - down in terms of functionality, so long as number of iterations to achieve idempotency is ignored.
If one goes top -down, some cases (Not) become optimal, while if you bottom - up ( other cases like depicted in ConstantFoldinghSuite become optimal).
The point is that the subtrees in NOT (Junction) before being acted upon by top- down rule , have already undergone the traversal of bottom- up, so the top - down would act only for pushing of Not, and moreover the traversal would terminate the moment subtree has no NOT pushed.
In my mind, I am comfortable with the behaviour.
|
Also pls note that, this change of transform down is only for the new Not
created as children of Junction op.. that is basically processing the newly
added Not nodes, right there, as otherwise it will get processed in next
iteration of the rule.
…On Tue, Jan 6, 2026, 1:29 AM Asif Shahid ***@***.***> wrote:
I believe it's safe..
If the original logic is modified such that instead of transform up ,
transform down is used, then this bug would be fixed, but other cases like
that mentioned in Constant folding suite will break in idempotency.
To take care of both the cases, use of transform up and transform down is
needed...as in the pr
On Tue, Jan 6, 2026, 12:31 AM Peter Toth ***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
> <#53658 (comment)>:
>
> >
> - case Not(a LessThan b) => GreaterThanOrEqual(a, b)
> - case Not(a LessThanOrEqual b) => GreaterThan(a, b)
> + case Not(a Or b) =>
> + And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
>
> Is this safe? I mean, before this PR the simplification logic of
> actualExprTransformer was called with transformUp..., but now you call
> it with transformDown... (please note that a Not node can be deep down
> in a or b). Is there any reason why we invoke the logic with transformUp
> or could the whole rule use transformDown on expression trees?
>
> —
> Reply to this email directly, view it on GitHub
> <#53658 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AC6XG2DUVG2AXN2P4L5ADAT4FNXHFAVCNFSM6AAAAACQPKSJM6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTMMRZHEZTONBSGA>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
|
And sincere thanks for taking a look at it.
…On Tue, Jan 6, 2026, 1:43 AM Asif Shahid ***@***.***> wrote:
Also pls note that, this change of transform down is only for the new Not
created as children of Junction op.. that is basically processing the newly
added Not nodes, right there, as otherwise it will get processed in next
iteration of the rule.
On Tue, Jan 6, 2026, 1:29 AM Asif Shahid ***@***.***> wrote:
> I believe it's safe..
> If the original logic is modified such that instead of transform up ,
> transform down is used, then this bug would be fixed, but other cases like
> that mentioned in Constant folding suite will break in idempotency.
> To take care of both the cases, use of transform up and transform down is
> needed...as in the pr
>
> On Tue, Jan 6, 2026, 12:31 AM Peter Toth ***@***.***>
> wrote:
>
>> ***@***.**** commented on this pull request.
>> ------------------------------
>>
>> In
>> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
>> <#53658 (comment)>:
>>
>> >
>> - case Not(a LessThan b) => GreaterThanOrEqual(a, b)
>> - case Not(a LessThanOrEqual b) => GreaterThan(a, b)
>> + case Not(a Or b) =>
>> + And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
>>
>> Is this safe? I mean, before this PR the simplification logic of
>> actualExprTransformer was called with transformUp..., but now you call
>> it with transformDown... (please note that a Not node can be deep down
>> in a or b). Is there any reason why we invoke the logic with transformUp
>> or could the whole rule use transformDown on expression trees?
>>
>> —
>> Reply to this email directly, view it on GitHub
>> <#53658 (review)>,
>> or unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/AC6XG2DUVG2AXN2P4L5ADAT4FNXHFAVCNFSM6AAAAACQPKSJM6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTMMRZHEZTONBSGA>
>> .
>> You are receiving this because you authored the thread.Message ID:
>> ***@***.***>
>>
>
|
See #53658 (comment) if we want to process only the new |
|
Just to be clear:
When I replied below, I missed the fact that you were asking why not just
call apply...
That reason is given in previous reply..
I am on mobile...will comment properly tomorrow..
May be add another extra test to show why just apply() will not work
…On Tue, Jan 6, 2026, 2:08 AM Asif Shahid ***@***.***> wrote:
What you are suggesting is also valid ..in fact I initially did that. But
it looked cleaner to me to let it process by passing the new junction ,
instead of explicitly calling on two Not children.
Calling on a single expression , the transformer seems more natural...
On Tue, Jan 6, 2026, 1:52 AM Peter Toth ***@***.***> wrote:
> *peter-toth* left a comment (apache/spark#53658)
> <#53658 (comment)>
>
> Also pls note that, this change of transform down is only for the new Not
> created as children of Junction op.. that is basically processing the newly
> added Not nodes, right there, as otherwise it will get processed in next
> iteration of the rule.
>
> See #53658 (comment)
> <#53658 (comment)> if
> we want to process only the new Not nodes.
>
> —
> Reply to this email directly, view it on GitHub
> <#53658 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AC6XG2BYRD7CNDJH2DJXMOL4FOAWTAVCNFSM6AAAAACQPKSJM6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTOMJTHE4TOMJZGI>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
…pr2) in a single pass, where expr1 and expr2 themselves are binary comparison types of expression
|
@peter-toth . I have added another test. |
BooleanSimplification to handle negation of conjunction and disjunction in one pass
|
if the only optimization is to process the newly created |
I would not prefer that , as it would mean code duplication .. I think. The logic in the transforming code applied on the whole tree, is same as the logic applied on the subtree... so splitting should not be done.. if you get what I mean.. |
|
thank you @peter-toth and @cloud-fan for detailed analysis... my pov is known to you all. |
|
How about adjusting this PR with ahshahid#1? |
I have my reservation for this as it would be applying the NotTransformer only on the current node and would cause recursion. |
|
I think I moved all cases that handles |
The benefit in the original PR as I see it is:
|
|
@peter-toth I see that we are in agreement with transformDown.. Thank you for your understanding. |
Fix to simplify boolean expression of form like !(expr1 || expr2) in a single pass, where expr1 and expr2 are binary comparison expression
What changes were proposed in this pull request?
In the rule BooleanSimplification , following two changes are done:
"val actualExprTransformer"
Till this point the code change is mere refactoring.
The main change in the logic is
3) for the two cases
case Not(a Or b) =>
And(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
actualExprTransformer
}
case Not(a And b) =>
Or(Not(a), Not(b)).transformDownWithPruning(_.containsPattern(NOT), ruleId) {
actualExprTransformer
}
The new child node of AND and OR, are immediately acted upon by the partial function of expression transformer using transformExpressionDown, which will be efficient as the traversal on subtree will stop immediately if the node does not contain any NOT operator.
Why are the changes needed?
The change is needed because in the case of tramsformUp, the idempotency is not achieved in the optimal way ( single pass compared to double pass).
The issue arises due to rule transforming
Not (A || B) => (Not(A) AND Not(B))
Because the new child has added Not operations, they are not acted in that pass due to transformUp.
With transformDown, the new children with Not, would be simplified in that pass itself.
Please note that merely changing transformExpressionUp to transformExpressionDown, though will fix this issue, it will break idempotency for other cases ( as seen by failure in ConstantFoldingSuite.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added bug test
Was this patch authored or co-authored using generative AI tooling?
No