Make expression manipulation consistent and easier to use: `combine/split filter` `conjunction`, etc #3810

alamb · 2022-10-12T14:36:15Z

~~DRAFT as it builds on #3809~~

Which issue does this PR close?

Rationale for this change

The APIs for manipulating expressions were all over the map (some returned a Vec, some took one as a mut argument, some took owned values, etc.), and they were also inconsistently named and inconsistently tested.

In fact, I couldn't find uncombine_filter (which is similar to, but not the same as, split_conjunction)

What changes are included in this PR?

Change the APIs to be consistently named and take reasonable arguments:

- Change split_conjunction to return a Vec, thus simplifying the API
- Rename split_filter to split_conjunction_owned to make it clear how it is related to split_conjunction
- Expand test coverage
- Rename combine_filters to conjunction
- Rename combine_filter_disjunction to disjunction
- Change APIs for conjunction and disjunction to avoid clones

I will highlight the changes inline.

Are there any user-facing changes?

If anyone uses these APIs they will need to change them, but hopefully

alamb · 2022-10-12T14:43:08Z

benchmarks/src/bin/tpch.rs

@@ -766,7 +766,8 @@ mod tests {
            if !actual.is_empty() {
                actual += "\n";
            }
-            actual += &format!("{}", plan.display_indent());
+            use std::fmt::Write as _;


clippy told me to do this
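The comment does not name the lint, but the added `use std::fmt::Write as _;` import suggests it is clippy's `format_push_string`, which flags `+= &format!(..)` because it builds a temporary `String`. A minimal sketch of the pattern, assuming that is the lint in question (`join_plans` and its argument are hypothetical names, not code from this PR):

```rust
use std::fmt::Write as _; // lets `write!` target a `String`

/// Joins displayable items with newlines, mirroring the loop in the diff.
/// `join_plans` is a hypothetical helper used only for illustration.
fn join_plans(plans: &[&str]) -> String {
    let mut actual = String::new();
    for plan in plans {
        if !actual.is_empty() {
            actual += "\n";
        }
        // Before: `actual += &format!("{}", plan);` allocates a temporary String.
        // After: `write!` formats directly into the existing buffer.
        // Writing into a String does not fail, so `unwrap` is fine here.
        write!(actual, "{}", plan).unwrap();
    }
    actual
}
```

Importing the trait `as _` brings its methods into scope without adding the name `Write` to the namespace, which avoids a clash with `std::io::Write`.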

alamb · 2022-10-12T14:43:49Z

datafusion/optimizer/src/filter_push_down.rs

@@ -341,8 +341,7 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
        }
        LogicalPlan::Analyze { .. } => push_down(&state, plan),
        LogicalPlan::Filter(Filter { input, predicate }) => {
-            let mut predicates = vec![];
-            utils::split_conjunction(predicate, &mut predicates);
+            let predicates = utils::split_conjunction(predicate);


here is an example where the split_conjunction API is easier to use now
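To make the new call shape concrete, here is a minimal sketch of a `split_conjunction` that returns its result instead of filling a caller-supplied `Vec`. The `Expr` enum below is a toy stand-in (an assumption for illustration), not DataFusion's real type, and the helper bodies are a guess at the general approach rather than the PR's exact code:

```rust
/// Toy stand-in for DataFusion's `Expr`; only the variants needed here.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

// Hypothetical constructors to keep the example short.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn and(left: Expr, right: Expr) -> Expr {
    Expr::And(Box::new(left), Box::new(right))
}

/// Splits `A AND B AND C` into `[&A, &B, &C]` and returns the Vec,
/// so a call site needs one expression instead of two statements.
fn split_conjunction(expr: &Expr) -> Vec<&Expr> {
    split_conjunction_impl(expr, vec![])
}

/// Recursive helper that threads the accumulator through the tree.
fn split_conjunction_impl<'a>(expr: &'a Expr, mut exprs: Vec<&'a Expr>) -> Vec<&'a Expr> {
    match expr {
        Expr::And(left, right) => {
            let exprs = split_conjunction_impl(left, exprs);
            split_conjunction_impl(right, exprs)
        }
        // Aliases are looked through, as in the diff for `Expr::Alias`.
        Expr::Alias(inner, _) => split_conjunction_impl(inner, exprs),
        other => {
            exprs.push(other);
            exprs
        }
    }
}
```

With this shape, `let predicates = split_conjunction(&predicate);` replaces the earlier two-step "create a `mut` Vec, then pass it in" pattern.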

alamb · 2022-10-12T14:44:12Z

datafusion/optimizer/src/decorrelate_where_in.rs

@@ -175,7 +173,7 @@ fn optimize_where_in(
    // build subquery side of join - the thing the subquery was querying
    let subqry_alias = format!("__sq_{}", optimizer_config.next_id());
    let mut subqry_plan = LogicalPlanBuilder::from((*subqry_input).clone());
-    if let Some(expr) = combine_filters(&other_subqry_exprs) {
+    if let Some(expr) = conjunction(other_subqry_exprs) {


here is an example of less copying (can use other_subqry_exprs directly)
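The saving comes from the argument type: a function that borrows a slice has to clone each element to build a new tree, while one that takes ownership can move the elements in. A minimal sketch under that assumption, using a toy `Expr` (not DataFusion's) that deliberately does not derive `Clone`, so the compiler confirms no copies are made:

```rust
/// Toy stand-in for DataFusion's `Expr`. It intentionally does NOT
/// derive `Clone`, so nothing below can copy an expression.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

// Hypothetical constructor to keep the example short.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Combines owned expressions with AND, or returns `None` for empty input.
/// Taking `impl IntoIterator<Item = Expr>` lets callers hand over a Vec
/// directly, and each element is moved into the resulting tree.
fn conjunction(filters: impl IntoIterator<Item = Expr>) -> Option<Expr> {
    filters
        .into_iter()
        .reduce(|acc, expr| Expr::And(Box::new(acc), Box::new(expr)))
}
```

A caller that still needs its expressions afterwards has to clone explicitly, which makes the cost visible at the call site rather than hidden inside the helper.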

alamb · 2022-10-12T19:01:29Z

datafusion/optimizer/src/utils.rs

+/// Splits an owned conjunctive [`Expr`] such as `A AND B AND C` => `[A, B, C]`
+///
+/// See [`split_conjunction`] for more details.
+pub fn split_conjunction_owned(expr: Expr) -> Vec<Expr> {


this function used to be called uncombine_filters which was not at all consistent with the split_conjunction that took a reference 🤯
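For contrast with the borrowing variant, a rough sketch of what an owned split can look like is shown here. The `Expr` enum and the helper name `split_owned_impl` are assumptions for illustration; only the `split_conjunction_owned` signature comes from the diff:

```rust
/// Toy stand-in for DataFusion's `Expr`; only the variants needed here.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

// Hypothetical constructors to keep the example short.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn and(left: Expr, right: Expr) -> Expr {
    Expr::And(Box::new(left), Box::new(right))
}

/// Consumes `A AND B AND C` and returns the owned parts `[A, B, C]`.
fn split_conjunction_owned(expr: Expr) -> Vec<Expr> {
    split_owned_impl(expr, vec![])
}

/// Recursive helper (hypothetical name) that moves children out of their boxes.
fn split_owned_impl(expr: Expr, mut exprs: Vec<Expr>) -> Vec<Expr> {
    match expr {
        Expr::And(left, right) => {
            let exprs = split_owned_impl(*left, exprs);
            split_owned_impl(*right, exprs)
        }
        other => {
            exprs.push(other);
            exprs
        }
    }
}
```

The `_owned` suffix then signals the only difference from `split_conjunction`: the input is consumed and the parts come back by value instead of by reference.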

jackwener

Great job👍

jackwener · 2022-10-13T12:45:45Z

datafusion/optimizer/src/utils.rs

        }
-        Expr::Alias(expr, _) => {
-            split_conjunction(expr, predicates);
+        Expr::Alias(expr, _) => split_conjunction_impl(expr, exprs),


jackwener · 2022-10-13T12:57:13Z

datafusion/optimizer/src/utils.rs

-    fn combine_zero_filters() {
-        let result = combine_filters(&[]);
-        assert_eq!(result, None);
+    fn test_split_conjunction() {


IMO, we should add a test for conjunction(). At the same time, we can check the tree structure of the resulting expression by using match.

For example, if expr is (A B) C, we can use `match` to check it: match expr { And( And( A, B ), C ) }

[A, B, C, D, E] -> (((A B) C) (D E)) is different from ((((A B) C) D) E).

With such a unit test we can see the result of conjunction() and ensure that it has the expected tree structure.

This was a great idea @jackwener -- thank you. I added this test in 09f3cf4

As part of writing tests, I also found that the API for disjunction was slightly different than conjunction so I made them the same as well.
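A test along the lines discussed above might look like the following sketch. It assumes `conjunction` and `disjunction` share one signature and fold left to right; the toy `Expr` and the `show` helper are hypothetical and exist only to make the nesting visible:

```rust
/// Toy stand-in for DataFusion's `Expr`; only the variants needed here.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(&'static str),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// ANDs the inputs together left to right; `None` for empty input.
fn conjunction(filters: impl IntoIterator<Item = Expr>) -> Option<Expr> {
    filters
        .into_iter()
        .reduce(|acc, expr| Expr::And(Box::new(acc), Box::new(expr)))
}

/// Same signature as `conjunction`, but combines with OR.
fn disjunction(filters: impl IntoIterator<Item = Expr>) -> Option<Expr> {
    filters
        .into_iter()
        .reduce(|acc, expr| Expr::Or(Box::new(acc), Box::new(expr)))
}

/// Renders the tree with explicit parentheses (hypothetical test helper),
/// so a left-deep tree is distinguishable from any other grouping.
fn show(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.to_string(),
        Expr::And(left, right) => format!("({} AND {})", show(left), show(right)),
        Expr::Or(left, right) => format!("({} OR {})", show(left), show(right)),
    }
}
```

Because a left-to-right `reduce` always nests the accumulator on the left, five inputs render as `((((A AND B) AND C) AND D) AND E)`, and a test comparing against that string would fail for the `(((A B) C) (D E))` grouping.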

ursabot · 2022-10-15T12:02:11Z

Benchmark runs are scheduled for baseline = 0b90a8a and contender = fc5081d. fc5081d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 12, 2022

alamb force-pushed the alamb/better_expr_api branch 2 times, most recently from ae47468 to 6e45c74 Compare October 12, 2022 18:59

github-actions bot removed the logical-expr Logical plan and expressions label Oct 12, 2022

Improve split_conjunction API, combine split_disjunction_owned, renam…

0dd3db4

…e conjunction/disjunction and reduce clone

alamb force-pushed the alamb/better_expr_api branch from 6e45c74 to 0dd3db4 Compare October 12, 2022 19:34

alamb marked this pull request as ready for review October 12, 2022 19:34

alamb changed the title ~~Improve the ergonomics of expression manipulation: combine/split filter conjunction, etc~~ Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc Oct 12, 2022

jackwener approved these changes Oct 13, 2022

alamb added 2 commits October 15, 2022 06:37

Merge remote-tracking branch 'apache/master' into alamb/better_expr_api

34155f8

Add tests for conjunction/disjunction, make API uniform

09f3cf4

alamb merged commit fc5081d into apache:master Oct 15, 2022

alamb deleted the alamb/better_expr_api branch October 15, 2022 11:57
