-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make tests for simplify
and Simplifer
consistent
#1376
Conversation
simplify
and Simplifer
consistent
22be605
to
a029d37
Compare
a029d37
to
370f8ca
Compare
simplify
and Simplifer
consistentsimplify
and Simplifer
consistent
This PR is now ready for review 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks much cleaner.
Left several nit comments.
|
||
Ok(()) | ||
fn lit_true() -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder you intended to use lit_true()
and lit(true)
in different situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is a great point. I will change to use lit(true)
and lit(false)
-- that is much better I think
// | ||
// Make sure c1 column to be used in tests is not boolean type | ||
assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8); | ||
assert_eq!(col("c1").get_type(&schema).unwrap(), DataType::Utf8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add a test for nonboolean type for logical plan? Something like
#[test]
fn test_simplity_skip_nonboolean_type() {
let table_scan = test_table_scan();
let plan = LogicalPlanBuilder::from(table_scan)
.filter(col("d").eq(lit(false)).not())
.unwrap()
.project(vec![col("a")])
.unwrap()
.build()
.unwrap();
let expected = "\
Projection: #test.a\
\n Filter: NOT #test.d = Boolean(false)\
\n TableScan: test projection=None";
assert_optimized_plan_eq(&plan, expected);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think that plan would error at runtime (because "d"
is defined to be DataTypre::UInt32
and if you try to do UInt32 != Bool
you get an error:
❯ select 1 != true;
Plan("'Int64 != Boolean' can't be evaluated because there isn't a common type to coerce the types to")
Though it does work if we have an explicit CAST
:
❯ select 1 != cast(true as int);
+------------------------------------------+
| Int64(1) != CAST(Boolean(true) AS Int32) |
+------------------------------------------+
| false |
+------------------------------------------+
1 row in set. Query took 0.003 seconds.
I guess I was thinking the plan tests are for ensuring that simplify
is being called correctly on the expressions in the plan, rather than testing the various Expr
corner cases of the simplification logic itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself
Thanks for the explanation. Makes sense to me.
col("c2").not(), | ||
); | ||
// TODO rename to simplify | ||
fn do_simplify(expr: Expr) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 and looks ready to rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually have plans to rename it in a follow on PR (b/c I want to remove simpify
) -- you can see what I have in mind here: #1401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work ❤️ @alamb. I reviewed it carefully and left some of my suggestions. The overall code is definitely cleaner and makes sense to refactor!
let expr_a = binary_expr(col("c"), Operator::Multiply, lit(1)); | ||
let expr_b = binary_expr(lit(1), Operator::Multiply, col("c")); | ||
let expected = col("c"); | ||
fn test_simplify_multiply_by_one() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can also add test_simlify_multiply_by_zero
-> 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, and it turns out that rewrite rule is not implemented LOL 👍
diff --git a/datafusion/src/optimizer/simplify_expressions.rs b/datafusion/src/optimizer/simplify_expressions.rs
index 3c4af838b..68d45c446 100644
--- a/datafusion/src/optimizer/simplify_expressions.rs
+++ b/datafusion/src/optimizer/simplify_expressions.rs
@@ -851,6 +851,16 @@ mod tests {
assert_eq!(simplify(&expr_b), expected);
}
+ #[test]
+ fn test_simplify_multiply_by_zero() {
+ let expr_a = binary_expr(col("c2"), Operator::Multiply, lit(0));
+ let expr_b = binary_expr(lit(0), Operator::Multiply, col("c2"));
+ let expected = lit(0);
+
+ assert_eq!(simplify(&expr_a), expected);
+ assert_eq!(simplify(&expr_b), expected);
+ }
+
#[test]
fn test_simplify_divide_by_one() {
let expr = binary_expr(col("c2"), Operator::Divide, lit(1));
Resulted in
---- optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero stdout ----
thread 'optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero' panicked at 'assertion failed: `(left == right)`
left: `#c2 * Int32(0)`,
right: `Int32(0)`', datafusion/src/optimizer/simplify_expressions.rs:860:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking in #1406
} | ||
|
||
#[test] | ||
fn test_simplify_negated_and() -> Result<()> { | ||
fn test_simplify_negated_and() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't be simplified to false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no theoretical reason. I think this test is to ensure a particular corner case of how the rewrite rule for A and (B and A)
is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking in #1406
fn test_simplify_and_and_false() -> Result<()> { | ||
let expr = | ||
binary_expr(lit(ScalarValue::Boolean(None)), Operator::And, lit(false)); | ||
fn test_simplify_and_and_false() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be test_simplify_null_and_false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good 👀 👍
.project(vec![col("a")])? | ||
.build()?; | ||
.filter(col("b").not_eq(lit(true))) | ||
.unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed you replaced ?
with unwrap
. Why, just curious. Both ways make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps me more easily find the source of the problem -- by using unwap()
whenever an error happens, you can set RUST_BACKTRACE=1
and know exactly at what site the problem happened. If the tests return Error
, often the Error
does not have sufficient context to know exactly where it was generated.
Co-authored-by: Carlos <wxd963996380@gmail.com>
…afusion into alamb/simplify_simplify2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Thanks for the careful review @xudong963 and @capkurmagati |
Which issue does this PR close?
Builds on #1374 and #1375Re #1160 (more sophisticated expression simplification)
Rationale for this change
There is redundant code in
simplify
andSimplifier
which makes it hard to know what is simplified and what is not. It is also hard to work with the tests in this fileWhat changes are included in this PR?
simplify
andSimplifier
to be a consistent style (so that when I combine the logic of the two it will be clear that behavior has not changed)Are there any user-facing changes?
No