-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add optimization rules for bitwise operations #5423
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @izveigor -- this PR looks quite nice. The code is clean and well commented 🏅 and easy to read
I had a few comments but otherwise I think this PR is close to ready
@@ -639,6 +639,31 @@ impl Expr { | |||
binary_expr(self, Operator::Or, other) | |||
} | |||
|
|||
/// Return `self & other` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// A & 0 -> 0 | ||
Expr::BinaryExpr(BinaryExpr { | ||
left: _, | ||
op: BitwiseAnd, | ||
right, | ||
}) if is_zero(&right) => *right, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this rule also has to check that A
is non nullable as null & 0
is null
(not 0
, which is what this code does)
postgres=# select NULL & 0 ;
?column?
----------
(1 row)
The same comment applies to 0 & A
as well
@@ -77,6 +78,80 @@ pub fn expr_contains(expr: &Expr, needle: &Expr, search_op: Operator) -> bool { | |||
} | |||
} | |||
|
|||
/// This enum is needed for the function "delete_expressions_in_complex_expr" for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This enum is needed for the function "delete_expressions_in_complex_expr" for | |
/// This enum is needed for the function "delete_xor_in_complex_expr" for |
|
||
#[test] | ||
fn test_simplify_composed_bitwise_and() { | ||
// ((c > 5) & (c1 < 6)) & (c > 5) --> ((c > 5) & c1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should refer to c1 and c2 (rather than c and c1)
// ((c2 > 5) & (c1 < 6)) & (c2 > 5) --> ((c2 > 5) & c1)
It might be valuable to add a test for the other grouping as well:
// (c > 5) & ((c1 < 6) & (c > 5)) --> c1 & (c > 5)
left, | ||
op: BitwiseOr, | ||
right, | ||
}) if is_zero(&right) => *left, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing here -- I think it only applies if right
is not NULL
op: BitwiseAnd, | ||
right, | ||
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => { | ||
Expr::Literal(ScalarValue::Int32(Some(0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this should not always be ScalarValue::Int32
-- if left
and right
are DataType::Int64
for example, I think this should be ScalarValue::Int64(Some(0))
. This applies to the other rules as well.
Perhaps we could add a method like ScalarValue::new_zero(data_type: DataType)
to create such a constant
op: BitwiseOr, | ||
right, | ||
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => { | ||
Expr::Literal(ScalarValue::Int32(Some(-1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise I think can't always use ScalarValue::Int32
-- it needs to use the appropriate variant of ScalarValue
for the type of left/right
op: BitwiseXor, | ||
right, | ||
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => { | ||
Expr::Literal(ScalarValue::Int32(Some(-1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs the right ScalarValue
variant
op: BitwiseXor, | ||
right, | ||
}) if is_negative_of(&right, &left) && !info.nullable(&left)? => { | ||
Expr::Literal(ScalarValue::Int32(Some(-1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs the right ScalarValue
variant
Hello, @alamb! |
I plan to review this PR tomorrow |
} | ||
|
||
/// Returns a new int type based on data type of an expression | ||
pub fn new_int_by_expr_data_type(val: i64, expr_data_type: &DataType) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of the function is affected by the following PR: #5476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to be pub?
@@ -123,6 +126,18 @@ impl<'a> SimplifyInfo for SimplifyContext<'a> { | |||
}) | |||
} | |||
|
|||
/// Returns data type of this expr needed for determining optimized int type of a value | |||
fn get_data_type(&self, expr: &Expr) -> DataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Int32 be the default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should simply return the error if the data type can not be determined -- it seems wrong to treat it as a 32 bit number 🤔 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This s looking very good @izveigor . Thank you. The test coverage is especially thorough. I had a few style comments/ suggestions but all I think is needed prior to merge is:
- address https://github.com/apache/arrow-datafusion/pull/5423/files#r1120773109
- Don't ignore errors in
get_datatype
Nice to have:
- use ScalarValue::zero / consolidate
new_int_by_expr_data_type
in ScalarValue
@@ -123,6 +126,18 @@ impl<'a> SimplifyInfo for SimplifyContext<'a> { | |||
}) | |||
} | |||
|
|||
/// Returns data type of this expr needed for determining optimized int type of a value | |||
fn get_data_type(&self, expr: &Expr) -> DataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should simply return the error if the data type can not be determined -- it seems wrong to treat it as a 32 bit number 🤔 .
} | ||
|
||
/// Returns a new int type based on data type of an expression | ||
pub fn new_int_by_expr_data_type(val: i64, expr_data_type: &DataType) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to be pub?
op: BitwiseAnd, | ||
right, | ||
}) if is_negative_of(&left, &right) && !info.nullable(&right)? => { | ||
new_int_by_expr_data_type(0, &info.get_data_type(&left)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ScalarValue::new_zero
might be better here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could perhaps add a either new_negative_one
or new_int(val: i64, data_type: &DataType) -> Result<ScalarValue>
to handle the -1 case
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
@@ -77,6 +78,61 @@ pub fn expr_contains(expr: &Expr, needle: &Expr, search_op: Operator) -> bool { | |||
} | |||
} | |||
|
|||
/// Deletes all 'needles' or remains one 'needle' that are found in a chain of xor | |||
/// expressions. Such as: A ^ (A ^ (B ^ A)) | |||
pub fn delete_xor_in_complex_expr(expr: &Expr, needle: &Expr, is_left: bool) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you could implement this using ExprRewriter
: https://github.com/apache/arrow-datafusion/blob/e9852074bacd8c891d84eba38b3417aa16a2d18c/datafusion/expr/src/utils.rs#L519
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @izveigor -- I think this looks very nice. Thanks for sticking with it.
@@ -49,6 +49,13 @@ impl SimplifyInfo for MyInfo { | |||
fn execution_props(&self) -> &ExecutionProps { | |||
&self.execution_props | |||
} | |||
|
|||
fn get_data_type(&self, expr: &Expr) -> Result<DataType> { | |||
match expr.get_type(&self.schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be expressed slightly more concisely -- see #5510
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, this code is redundant.
Closes #5422