-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add PhysicalExpr optimizer and cast unwrapping #16530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8247e7e to
5c7d5c8
Compare
|
I will try and find time to review this tomorrow. Thank you @adriangb -- I can't keep up ! |
5c7d5c8 to
1032b5d
Compare
|
1032b5d 👨🏻🍳 |
|
One unfortunate thing: I can't add the optimizer to |
Fixed by moving this to |
24162e3 to
3dd706f
Compare
3dd706f to
6b2398a
Compare
|
I am sorry for the delay reviewing this, but there is a lot going on at the moment. We have quite a few PRs merging in main. I am thinking maybe next week we should target slowing down the new features and start preparing for hardening / bug fixing for the 49 release... |
|
Sounds good to me! This can easily wait until after next week. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb -- I think this PR is a really nice step forward.
I left some suggestions but I don't think any are necessary.
It is so cool to see this progressing towards having a good story for per-file optimized predicates. 🚀
BTW I wonder if we should use this simplifier in the FilterPushdownSimplifier @xudong963 added a few days ago in : #16362
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| //! Utilities for casting scalar literals to different data types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| } | ||
|
|
||
| /// Swap comparison operators for right-side cast unwrapping | ||
| fn swap_operator(op: Operator) -> Operator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use Operator::swap, right? I don't think there is any reason to have it specially here
| extract_cast_info(binary.right()), | ||
| ) { | ||
| // For literal op cast(expr), we need to swap the operator | ||
| let swapped_op = swap_operator(*binary.op()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't swap the operator I think the rewrite should stop (not just unwrap the comparison)
| use datafusion_common::ScalarValue; | ||
|
|
||
| /// Convert a literal value from one data type to another | ||
| pub fn try_cast_literal_to_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost seems like it is / should be a method on ScalarValue like ScalarValue::try_cast or something -- might make it more discoverable
We could do this as a follow on PR -- I see you just moved this to a new place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! There's already a ScalarValue::cast_to which may do what we want but I'm hesitant to change it in this PR since that might have unintended consequences if the implementations differ. I'd rather do that in it's own PR. I opened #16635 to track.
| Arc::new(BinaryExpr::new(cast_expr, Operator::Gt, literal_expr)); | ||
|
|
||
| // Apply unwrap cast optimization | ||
| let result = unwrap_cast_in_comparison(binary_expr, &schema).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a a minor comment it would be nice if we could reduce the boiler plate in these tests -- specifically, you could perhaps factor out the unwrapping and verifying that it was transformed
| // Create: cast(c1 as INT64) > INT64(10) | ||
| let column_expr = col("c1", &schema).unwrap(); | ||
| let cast_expr = Arc::new(CastExpr::new(column_expr, DataType::Int64, None)); | ||
| let literal_expr = lit(ScalarValue::Int64(Some(10))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can write this (and similar expressions) below much more sucinctly like this:
| let literal_expr = lit(ScalarValue::Int64(Some(10))); | |
| let literal_expr = lit(10i64); |
| fn test_unwrap_cast_with_literal_on_left() { | ||
| let schema = test_schema(); | ||
|
|
||
| // Create: INT64(10) < cast(c1 as INT64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please test something where the cast doesn't work? Something with a column that is uint8 for example, and the literal can not fit into the column: `Int64(-5) < CAST(c4 as Int64)
Though I realize in this case the predicate would always be true then 🤔
|
@alamb thank you for the review. I think I've addressed all of the feedback 🙏🏻 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reran all the tests again and gave this a final once over and I think it looks great. Thanks again @adriangb
|
Thanks @alamb! |
Sets us up to address #16004.
I've ported part of the code to a new shared module and reproduced the non-shared functionality for PhysicalExpr, also setting up building blocks to be able to add more simplifications / optimizations (the only one I can think of though is const evaluation and that doesn't seem like a big deal).