@adriangb
Contributor

@adriangb adriangb commented Jun 24, 2025

Sets us up to address #16004.

I've ported part of the code to a new shared module and reproduced the non-shared functionality for PhysicalExpr, also setting up building blocks to be able to add more simplifications / optimizations (the only one I can think of though is const evaluation and that doesn't seem like a big deal).

@alamb
Contributor

alamb commented Jun 24, 2025

I will try to find time to review this tomorrow. Thank you @adriangb -- I can't keep up!

@adriangb adriangb force-pushed the physical-optimizer branch from 5c7d5c8 to 1032b5d Compare June 26, 2025 14:05
@github-actions github-actions bot added the datasource Changes to the datasource crate label Jun 26, 2025
@adriangb
Contributor Author

1032b5d 👨🏻‍🍳

@adriangb
Contributor Author

One unfortunate thing: I can't add the optimizer to datafusion-pruning because datafusion-physical-optimizer depends on datafusion-pruning so it creates a cycle. I want to apply the simplifier inside of PruningPredicate to address #16004 (comment)

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jun 26, 2025
@adriangb
Contributor Author

> One unfortunate thing: I can't add the optimizer to datafusion-pruning because datafusion-physical-optimizer depends on datafusion-pruning so it creates a cycle. I want to apply the simplifier inside of PruningPredicate to address #16004 (comment)

Fixed by moving this to physical-expr/src/simplifier/

@adriangb adriangb force-pushed the physical-optimizer branch from 3dd706f to 6b2398a Compare June 27, 2025 20:45
@alamb
Contributor

alamb commented Jun 27, 2025

I am sorry for the delay reviewing this, but there is a lot going on at the moment. We have quite a few PRs merging in main.

I am thinking maybe next week we should target slowing down the new features and start preparing for hardening / bug fixing for the 49 release...

@adriangb
Contributor Author

Sounds good to me! This can easily wait until after next week.

Contributor

@alamb alamb left a comment

Thank you @adriangb -- I think this PR is a really nice step forward.
I left some suggestions but I don't think any are necessary.

It is so cool to see this progressing towards having a good story for per-file optimized predicates. 🚀

BTW I wonder if we should use this simplifier in the FilterPushdownSimplifier that @xudong963 added a few days ago in #16362

// specific language governing permissions and limitations
// under the License.

//! Utilities for casting scalar literals to different data types
Contributor

👍

}

/// Swap comparison operators for right-side cast unwrapping
fn swap_operator(op: Operator) -> Operator {
Contributor

This could use Operator::swap, right? I don't think there is any reason to have it specially here

extract_cast_info(binary.right()),
) {
// For literal op cast(expr), we need to swap the operator
let swapped_op = swap_operator(*binary.op());
Contributor

If we can't swap the operator I think the rewrite should stop (not just unwrap the comparison)
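A minimal sketch of the behavior being asked for, using only stand-in types: `CmpOp` and `rewrite_literal_on_left` are hypothetical names for illustration, not DataFusion's `Operator` or the code in this PR. It assumes a swap method that returns `None` for operators with no mirrored form, and it leaves out the step that casts the literal to the column's type. The rewrite bails out entirely when the operator cannot be swapped.

```rust
/// Stand-in comparison operator (hypothetical; not DataFusion's `Operator`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CmpOp {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
    Like,
}

impl CmpOp {
    /// Returns the operator to use when the two operands exchange sides,
    /// or `None` when no such operator exists.
    fn swap(self) -> Option<CmpOp> {
        match self {
            CmpOp::Eq => Some(CmpOp::Eq),
            CmpOp::NotEq => Some(CmpOp::NotEq),
            CmpOp::Lt => Some(CmpOp::Gt),
            CmpOp::LtEq => Some(CmpOp::GtEq),
            CmpOp::Gt => Some(CmpOp::Lt),
            CmpOp::GtEq => Some(CmpOp::LtEq),
            // `a LIKE b` has no mirrored operator.
            CmpOp::Like => None,
        }
    }
}

/// Rewrites `literal op cast(column)` into `column swapped_op literal`.
/// Returns `None` (meaning: leave the expression untouched) when the
/// operator cannot be swapped, instead of rewriting with the wrong operator.
fn rewrite_literal_on_left(
    literal: i64,
    op: CmpOp,
    column: &str,
) -> Option<(String, CmpOp, i64)> {
    let swapped = op.swap()?; // stop the rewrite here if there is no swap
    Some((column.to_string(), swapped, literal))
}
```

With this shape, `10 < cast(c1)` becomes `c1 > 10`, while an expression using an operator such as `Like` is returned to the caller unchanged.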

use datafusion_common::ScalarValue;

/// Convert a literal value from one data type to another
pub fn try_cast_literal_to_type(
Contributor

This almost seems like it is / should be a method on ScalarValue like ScalarValue::try_cast or something -- might make it more discoverable

We could do this as a follow on PR -- I see you just moved this to a new place

Contributor Author

Makes sense! There's already a ScalarValue::cast_to which may do what we want, but I'm hesitant to change it in this PR since that might have unintended consequences if the implementations differ. I'd rather do that in its own PR. I opened #16635 to track.

Arc::new(BinaryExpr::new(cast_expr, Operator::Gt, literal_expr));

// Apply unwrap cast optimization
let result = unwrap_cast_in_comparison(binary_expr, &schema).unwrap();
Contributor

As a minor comment, it would be nice if we could reduce the boilerplate in these tests -- specifically, you could perhaps factor out the unwrapping and the check that the expression was transformed
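One possible shape for such a helper, sketched with stand-in types: `Rewritten` and `expect_transformed` are hypothetical names, and `Rewritten` only assumes that the rewrite returns the new expression together with a flag saying whether anything changed. The real tests would use the PR's own expression and result types.

```rust
use std::fmt::Debug;

/// Stand-in for a rewrite result: the (possibly new) expression plus a
/// flag recording whether the rewrite changed anything. Hypothetical type.
#[derive(Debug)]
struct Rewritten<T> {
    data: T,
    transformed: bool,
}

/// Test helper: unwraps the rewrite result, asserts that the expression
/// was actually transformed, and hands back the new expression so each
/// test only has to check the part it cares about.
fn expect_transformed<T, E: Debug>(result: Result<Rewritten<T>, E>) -> T {
    let rewritten = result.expect("rewrite returned an error");
    assert!(
        rewritten.transformed,
        "expected the expression to be rewritten"
    );
    rewritten.data
}
```

Each test body then shrinks to building the input expression, calling the helper on the rewrite result, and asserting on the returned expression.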

// Create: cast(c1 as INT64) > INT64(10)
let column_expr = col("c1", &schema).unwrap();
let cast_expr = Arc::new(CastExpr::new(column_expr, DataType::Int64, None));
let literal_expr = lit(ScalarValue::Int64(Some(10)));
Contributor

I think you can write this (and similar expressions below) much more succinctly like this:

Suggested change
- let literal_expr = lit(ScalarValue::Int64(Some(10)));
+ let literal_expr = lit(10i64);

fn test_unwrap_cast_with_literal_on_left() {
let schema = test_schema();

// Create: INT64(10) < cast(c1 as INT64)
Contributor

Can you also please test a case where the cast doesn't work? For example, a column that is UInt8 and a literal that cannot fit into the column's type: `Int64(-5) < CAST(c4 as Int64)`

Though I realize in this case the predicate would always be true then 🤔
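To make the case concrete, here is a small sketch using only the Rust standard library. `try_cast_i64_to_u8`, `Rewrite`, and `unwrap_lt` are hypothetical names for illustration, not functions from this PR. It assumes the unwrap is skipped whenever the Int64 literal cannot be represented in the UInt8 column's type.

```rust
/// Hypothetical helper: express an Int64 literal in a UInt8 column's type,
/// or return `None` when the value is out of range.
fn try_cast_i64_to_u8(literal: i64) -> Option<u8> {
    u8::try_from(literal).ok()
}

/// Outcome of trying to unwrap `Int64(literal) < CAST(col AS Int64)`
/// where `col` is a UInt8 column.
#[derive(Debug, PartialEq)]
enum Rewrite {
    /// The comparison can be rewritten as `col > literal` in UInt8.
    Unwrapped(u8),
    /// The literal does not fit, so the expression is left as written.
    Unchanged,
}

fn unwrap_lt(literal: i64) -> Rewrite {
    match try_cast_i64_to_u8(literal) {
        Some(narrowed) => Rewrite::Unwrapped(narrowed),
        None => Rewrite::Unchanged,
    }
}
```

Under these assumptions `unwrap_lt(-5)` leaves the expression unchanged. The observation about the predicate also holds: every UInt8 value is at least 0, so `-5 < CAST(c4 AS Int64)` is true for every non-null row.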

@adriangb
Contributor Author

adriangb commented Jul 1, 2025

@alamb thank you for the review. I think I've addressed all of the feedback 🙏🏻

Contributor

@alamb alamb left a comment

I reran all the tests again and gave this a final once over and I think it looks great. Thanks again @adriangb

@alamb alamb merged commit 6870cc1 into apache:main Jul 2, 2025
29 checks passed
@adriangb
Contributor Author

adriangb commented Jul 2, 2025

Thanks @alamb!

Labels

datasource Changes to the datasource crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates
