Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tests for simplify and Simplifer consistent #1376

Merged
merged 6 commits into from
Dec 7, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 28, 2021

Which issue does this PR close?

Builds on #1374 and #1375

Re #1160 (more sophisticated expression simplification)

Rationale for this change

There is redundant code in simplify and Simplifier which makes it hard to know what is simplified and what is not. It is also hard to work with the tests in this file

What changes are included in this PR?

  1. Refactor the tests for simplify and Simplifier to be a consistent style (so that when I combine the logic of the two it will be clear that behavior has not changed)

Are there any user-facing changes?

No

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Nov 28, 2021
@alamb alamb changed the title (WIP) Move ConstEvaluator into simplity_expressions.rs (WIP) Move ConstEvaluator into simplify_expressions.rs Nov 28, 2021
@alamb alamb changed the title (WIP) Move ConstEvaluator into simplify_expressions.rs (WIP) Make tests for simplify and Simplifer consistent Nov 28, 2021
@alamb alamb force-pushed the alamb/simplify_simplify2 branch from 22be605 to a029d37 Compare December 2, 2021 22:45
@alamb alamb force-pushed the alamb/simplify_simplify2 branch from a029d37 to 370f8ca Compare December 2, 2021 22:52
@alamb alamb marked this pull request as ready for review December 4, 2021 17:37
@alamb alamb changed the title (WIP) Make tests for simplify and Simplifer consistent Make tests for simplify and Simplifer consistent Dec 4, 2021
@alamb alamb requested review from andygrove, houqp and jimexist December 4, 2021 17:39
@alamb
Copy link
Contributor Author

alamb commented Dec 4, 2021

This PR is now ready for review 🙏

Copy link
Contributor

@capkurmagati capkurmagati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks much cleaner.
Left several nit comments.


Ok(())
fn lit_true() -> Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder you intended to use lit_true() and lit(true) in different situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is a great point. I will change to use lit(true) and lit(false) -- that is much better I think

//
// Make sure c1 column to be used in tests is not boolean type
assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8);
assert_eq!(col("c1").get_type(&schema).unwrap(), DataType::Utf8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a test for nonboolean type for logical plan? Something like

  #[test]
    fn test_simplity_skip_nonboolean_type() {
        let table_scan = test_table_scan();
        let plan = LogicalPlanBuilder::from(table_scan)
            .filter(col("d").eq(lit(false)).not())
            .unwrap()
            .project(vec![col("a")])
            .unwrap()
            .build()
            .unwrap();

        let expected = "\
        Projection: #test.a\
        \n  Filter: NOT #test.d = Boolean(false)\
        \n    TableScan: test projection=None";

        assert_optimized_plan_eq(&plan, expected);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that plan would error at runtime (because "d" is defined to be DataTypre::UInt32 and if you try to do UInt32 != Bool you get an error:

❯ select 1 != true;
Plan("'Int64 != Boolean' can't be evaluated because there isn't a common type to coerce the types to")

Though it does work if we have an explicit CAST:

❯ select 1 != cast(true as int);
+------------------------------------------+
| Int64(1) != CAST(Boolean(true) AS Int32) |
+------------------------------------------+
| false                                    |
+------------------------------------------+
1 row in set. Query took 0.003 seconds.

I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself

Thanks for the explanation. Makes sense to me.

col("c2").not(),
);
// TODO rename to simplify
fn do_simplify(expr: Expr) -> Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 and looks ready to rename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have plans to rename it in a follow on PR (b/c I want to remove simpify) -- you can see what I have in mind here: #1401

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work ❤️ @alamb. I reviewed it carefully and left some of my suggestions. The overall code is definitely cleaner and makes sense to refactor!

datafusion/src/optimizer/simplify_expressions.rs Outdated Show resolved Hide resolved
let expr_a = binary_expr(col("c"), Operator::Multiply, lit(1));
let expr_b = binary_expr(lit(1), Operator::Multiply, col("c"));
let expected = col("c");
fn test_simplify_multiply_by_one() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can also add test_simlify_multiply_by_zero -> 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, and it turns out that rewrite rule is not implemented LOL 👍

diff --git a/datafusion/src/optimizer/simplify_expressions.rs b/datafusion/src/optimizer/simplify_expressions.rs
index 3c4af838b..68d45c446 100644
--- a/datafusion/src/optimizer/simplify_expressions.rs
+++ b/datafusion/src/optimizer/simplify_expressions.rs
@@ -851,6 +851,16 @@ mod tests {
         assert_eq!(simplify(&expr_b), expected);
     }
 
+    #[test]
+    fn test_simplify_multiply_by_zero() {
+        let expr_a = binary_expr(col("c2"), Operator::Multiply, lit(0));
+        let expr_b = binary_expr(lit(0), Operator::Multiply, col("c2"));
+        let expected = lit(0);
+
+        assert_eq!(simplify(&expr_a), expected);
+        assert_eq!(simplify(&expr_b), expected);
+    }
+
     #[test]
     fn test_simplify_divide_by_one() {
         let expr = binary_expr(col("c2"), Operator::Divide, lit(1));

Resulted in

---- optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero stdout ----
thread 'optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero' panicked at 'assertion failed: `(left == right)`
  left: `#c2 * Int32(0)`,
 right: `Int32(0)`', datafusion/src/optimizer/simplify_expressions.rs:860:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #1406

}

#[test]
fn test_simplify_negated_and() -> Result<()> {
fn test_simplify_negated_and() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't be simplified to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no theoretical reason. I think this test is to ensure a particular corner case of how the rewrite rule for A and (B and A) is implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #1406

fn test_simplify_and_and_false() -> Result<()> {
let expr =
binary_expr(lit(ScalarValue::Boolean(None)), Operator::And, lit(false));
fn test_simplify_and_and_false() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be test_simplify_null_and_false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good 👀 👍

.project(vec![col("a")])?
.build()?;
.filter(col("b").not_eq(lit(true)))
.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you replaced ? with unwrap. Why, just curious. Both ways make sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps me more easily find the source of the problem -- by using unwap() whenever an error happens, you can set RUST_BACKTRACE=1 and know exactly at what site the problem happened. If the tests return Error, often the Error does not have sufficient context to know exactly where it was generated.

Co-authored-by: Carlos <wxd963996380@gmail.com>
Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@alamb
Copy link
Contributor Author

alamb commented Dec 7, 2021

Thanks for the careful review @xudong963 and @capkurmagati

@alamb alamb merged commit 32e24d2 into apache:master Dec 7, 2021
@alamb alamb deleted the alamb/simplify_simplify2 branch December 7, 2021 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants