-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Simplify expr = L1 AND expr != L2 to expr = L1 when L1 != L2
#19731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7fd7cde to
8375534
Compare
8375534 to
fdb54b8
Compare
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A clever simplify! It could reduce the CPU cost for FilterExec and row filter in parquet
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right? It looks like it has rewritten
((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol'));to
((dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name = 'Carol'));But name and e.name are different 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @simonvandel and @xudong963
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));
Which issue does this PR close?
Rationale for this change
Add a simplifier rule to remove a comparison.
It's probably unlikely that a human would write such an expression, but a generated query, or other optimizations, may lead to such an expression.
What changes are included in this PR?
Simplify
expr = L1 AND expr != L2toexpr = L1whenL1 != L2Are these changes tested?
Added unit tests and SLT.
The first commit shows the plan before the optimization.
Are there any user-facing changes?
Fewer runtime-ops if the plan contains the pattern.