Simplify small InListExpr #4090

Merged 12 commits on Nov 4, 2022

Conversation

Dandandan

Which issue does this PR close?

Closes #4089

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Simplify small InListExpr
@github-actions github-actions bot added the optimizer Optimizer rules label Nov 2, 2022
@Dandandan Dandandan marked this pull request as ready for review November 3, 2022 17:07
Tweak
@github-actions github-actions bot added the core Core DataFusion crate label Nov 3, 2022

alamb commented Nov 3, 2022

Until the fix for #4100 is merged, clippy will be failing on this PR as well


@isidentical isidentical left a comment

Looks great! One question: would it make sense to make THRESHOLD_INLINE_INLIST a configurable property (maybe inside SimplifyInfo)? This simplification is specific to DataFusion's physical IN operator (when n is greater than 1), and other implementations might not perform as badly on these smaller lists. That said, it might be easier to wait until somebody actually needs to change the limit, so I think this PR looks perfect as is 💯

expr,
list,
negated,
} if list.len() == 1
It took me a while to understand why a length of 1 is special-cased. Maybe we could mention that the column reference check exists strictly to ensure that we do not evaluate the left side over and over again unnecessarily (so a length of 1 is always fine, and a longer list is also fine when the left side is something as simple as a column access).

I agree documenting the rationale would be super helpful
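
The rationale discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `Expr` enum below is a toy stand-in for DataFusion's expression type, `simplify_in_list` is a hypothetical helper name, and the value of `THRESHOLD_INLINE_INLIST` is assumed for the example. It shows the guard (a single-element list is always rewritten; a longer list only when the left side is a plain column reference) and the rewrite of `IN` into a chain of comparisons.

```rust
/// Toy expression tree; a stand-in for DataFusion's `Expr`, not the real type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    InList { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
}

/// Assumed value, for illustration only.
const THRESHOLD_INLINE_INLIST: usize = 3;

/// Hypothetical helper: rewrites `e IN (a, b)` into `e = a OR e = b`
/// (and `e NOT IN (a, b)` into `e != a AND e != b`) for small lists.
fn simplify_in_list(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, list, negated }
            if !list.is_empty()
                // One element: the left side is evaluated once either way.
                && (list.len() == 1
                    // Several elements: the left side is repeated in every
                    // comparison, so only inline when it is a cheap column.
                    || (list.len() <= THRESHOLD_INLINE_INLIST
                        && matches!(expr.as_ref(), Expr::Column(_)))) =>
        {
            let mut comparisons = list.into_iter().map(|item| {
                let (left, right) = (expr.clone(), Box::new(item));
                if negated {
                    Expr::NotEq(left, right)
                } else {
                    Expr::Eq(left, right)
                }
            });
            // The guard above guarantees at least one element.
            let first = comparisons.next().expect("list is non-empty");
            comparisons.fold(first, |acc, cmp| {
                if negated {
                    Expr::And(Box::new(acc), Box::new(cmp))
                } else {
                    Expr::Or(Box::new(acc), Box::new(cmp))
                }
            })
        }
        // Anything else is returned unchanged.
        other => other,
    }
}
```

In this sketch, `c IN (1, 2)` on a column `c` becomes `c = 1 OR c = 2`, while `(c + 1) IN (1, 2)` is left as an `InList` so that `c + 1` is not evaluated once per list element.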


@alamb alamb left a comment

LGTM -- nice work @Dandandan


alamb commented Nov 3, 2022

Looks great! One question I had was whether it makes sense to change THRESHOLD_INLINE_INLIST to be a configurable proper

I would recommend putting this on ConfigOptions if you make it configurable
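
If the limit were made configurable, the decision could be driven by an options value instead of a constant. The sketch below is only an illustration under stated assumptions: `SimplifierOptions`, its field `inline_inlist_threshold`, the default of 3, and `should_inline_in_list` are all hypothetical names and values, and none of this reflects the actual `ConfigOptions` API.

```rust
/// Hypothetical options struct; not DataFusion's `ConfigOptions`.
#[derive(Debug, Clone, Copy)]
struct SimplifierOptions {
    /// Largest IN list that may be expanded into a chain of comparisons.
    inline_inlist_threshold: usize,
}

impl Default for SimplifierOptions {
    fn default() -> Self {
        // Assumed default, for illustration only.
        Self { inline_inlist_threshold: 3 }
    }
}

/// Hypothetical helper: decides whether `expr IN (list)` should be expanded.
fn should_inline_in_list(
    list_len: usize,
    lhs_is_column: bool,
    opts: &SimplifierOptions,
) -> bool {
    match list_len {
        // Nothing to compare against.
        0 => false,
        // A single comparison never repeats the left side.
        1 => true,
        // Longer lists repeat the left side, so require a cheap column
        // reference and respect the configured limit.
        n => lhs_is_column && n <= opts.inline_inlist_threshold,
    }
}
```

An engine whose IN operator is already fast on short lists could then lower the threshold to 1, which would disable the multi-element rewrite while keeping the single-element case.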

@Dandandan Dandandan merged commit 60f3ef6 into apache:master Nov 4, 2022

ursabot commented Nov 4, 2022

Benchmark runs are scheduled for baseline = 7e944ed and contender = 60f3ef6. 60f3ef6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java


alamb commented Nov 4, 2022

👏

Dandandan added a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
* Simplify small InListExpr

Simplify small InListExpr

* Tweak

Tweak

* Update datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Feedback

* Feedback

* Tweak

* Tweak

Tweak

* Fmt

* clippy

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Nov 5, 2022
Labels
core Core DataFusion crate optimizer Optimizer rules
Development

Successfully merging this pull request may close these issues.

Simplify small InList expressions
4 participants