Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add optimization rules for bitwise operations #5423

Merged
merged 14 commits into from
Mar 7, 2023
Merged

feat: add optimization rules for bitwise operations #5423

merged 14 commits into from
Mar 7, 2023

Conversation

izveigor
Copy link
Contributor

Closes #5422

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Feb 27, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor -- this PR looks quite nice. The code is clean and well commented 🏅 and easy to read

I had a few comments but otherwise I think this PR is close to ready

@@ -639,6 +639,31 @@ impl Expr {
binary_expr(self, Operator::Or, other)
}

/// Return `self & other`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 722 to 727
// A & 0 -> 0
Expr::BinaryExpr(BinaryExpr {
left: _,
op: BitwiseAnd,
right,
}) if is_zero(&right) => *right,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this rule also has to check that A is non nullable as null & 0 is null (not 0, which is what this code does)

postgres=# select NULL & 0 ;
 ?column?
----------

(1 row)

The same comment applies to 0 & A as well

@@ -77,6 +78,80 @@ pub fn expr_contains(expr: &Expr, needle: &Expr, search_op: Operator) -> bool {
}
}

/// This enum is needed for the function "delete_expressions_in_complex_expr" for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This enum is needed for the function "delete_expressions_in_complex_expr" for
/// This enum is needed for the function "delete_xor_in_complex_expr" for


#[test]
fn test_simplify_composed_bitwise_and() {
// ((c > 5) & (c1 < 6)) & (c > 5) --> ((c > 5) & c1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should refer to c1 and c2 (rather than c and c1)

        // ((c2 > 5) & (c1 < 6)) & (c2 > 5) --> ((c2 > 5) & c1)

It might be valuable to add a test for the other grouping as well:

        // (c > 5) & ((c1 < 6) & (c > 5)) --> c1 & (c > 5)

left,
op: BitwiseOr,
right,
}) if is_zero(&right) => *left,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here -- I think it only applies if right is not NULL

op: BitwiseAnd,
right,
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => {
Expr::Literal(ScalarValue::Int32(Some(0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should not always be ScalarValue::Int32 -- if left and right are DataType::Int64 for example, I think this should be ScalarValue::Int64(Some(0)). This applies to the other rules as well.

Perhaps we could add a method like ScalarValue::new_zero(data_type: DataType) to create such a constant

op: BitwiseOr,
right,
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => {
Expr::Literal(ScalarValue::Int32(Some(-1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise I think can't always use ScalarValue::Int32 -- it needs to use the appropriate variant of ScalarValue for the type of left/right

op: BitwiseXor,
right,
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => {
Expr::Literal(ScalarValue::Int32(Some(-1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs the right ScalarValue variant

op: BitwiseXor,
right,
}) if is_negative_of(&right, &left) && !info.nullable(&left)? => {
Expr::Literal(ScalarValue::Int32(Some(-1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs the right ScalarValue variant

@github-actions github-actions bot added the core Core DataFusion crate label Mar 2, 2023
@izveigor
Copy link
Contributor Author

izveigor commented Mar 2, 2023

Hello, @alamb!
Thanks to your response, I took into account your comments and fixed all defects. I hope this PR will be merged into main soon.

@alamb
Copy link
Contributor

alamb commented Mar 3, 2023

I plan to review this PR tomorrow

}

/// Returns a new int type based on data type of an expression
pub fn new_int_by_expr_data_type(val: i64, expr_data_type: &DataType) -> Expr {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of the function is affected by the following PR: #5476

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to be pub?

@@ -123,6 +126,18 @@ impl<'a> SimplifyInfo for SimplifyContext<'a> {
})
}

/// Returns data type of this expr needed for determining optimized int type of a value
fn get_data_type(&self, expr: &Expr) -> DataType {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Int32 be the default value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should simply return the error if the data type can not be determined -- it seems wrong to treat it as a 32 bit number 🤔 .

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This s looking very good @izveigor . Thank you. The test coverage is especially thorough. I had a few style comments/ suggestions but all I think is needed prior to merge is:

  1. address https://github.com/apache/arrow-datafusion/pull/5423/files#r1120773109
  2. Don't ignore errors in get_datatype

Nice to have:

  • use ScalarValue::zero / consolidate new_int_by_expr_data_type in ScalarValue

@@ -123,6 +126,18 @@ impl<'a> SimplifyInfo for SimplifyContext<'a> {
})
}

/// Returns data type of this expr needed for determining optimized int type of a value
fn get_data_type(&self, expr: &Expr) -> DataType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should simply return the error if the data type can not be determined -- it seems wrong to treat it as a 32 bit number 🤔 .

}

/// Returns a new int type based on data type of an expression
pub fn new_int_by_expr_data_type(val: i64, expr_data_type: &DataType) -> Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to be pub?

op: BitwiseAnd,
right,
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => {
new_int_by_expr_data_type(0, &info.get_data_type(&left))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could perhaps add a either new_negative_one or new_int(val: i64, data_type: &DataType) -> Result<ScalarValue> to handle the -1 case

@@ -77,6 +78,61 @@ pub fn expr_contains(expr: &Expr, needle: &Expr, search_op: Operator) -> bool {
}
}

/// Deletes all 'needles' or remains one 'needle' that are found in a chain of xor
/// expressions. Such as: A ^ (A ^ (B ^ A))
pub fn delete_xor_in_complex_expr(expr: &Expr, needle: &Expr, is_left: bool) -> Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor -- I think this looks very nice. Thanks for sticking with it.

@alamb alamb merged commit d7b2212 into apache:main Mar 7, 2023
@alamb alamb mentioned this pull request Mar 7, 2023
@@ -49,6 +49,13 @@ impl SimplifyInfo for MyInfo {
fn execution_props(&self) -> &ExecutionProps {
&self.execution_props
}

fn get_data_type(&self, expr: &Expr) -> Result<DataType> {
match expr.get_type(&self.schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be expressed slightly more concisely -- see #5510

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this code is redundant.

@andygrove andygrove added the enhancement New feature or request label Mar 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate enhancement New feature or request logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: add optimization rules for bitwise operations
3 participants