Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushdown single column predicates from ON join clauses #3578

Merged
merged 6 commits into from
Oct 15, 2022

Conversation

HuSen8891
Copy link
Contributor

@HuSen8891 HuSen8891 commented Sep 21, 2022

Which issue does this PR close?

Closes #3577

Rationale for this change

optimize join's performance when join has OR clauses in filter or on clause.

examine any OR clauses of join to see if any useful clauses can be extracted and push down to join's rel to filter more rows before join.

for TPCH q19

the logical plan before optimized

    "Explain [plan_type:Utf8, plan:Utf8]",
    "  Projection: #lineitem.l_partkey [l_partkey:Int64]",
    "    Projection: #part.p_size >= Int32(1) AS #part.p_size >= Int32(1)Int32(1)#part.p_size, #lineitem.l_partkey, #lineitem.l_quantity, #part.p_brand, #part.p_size [#part.p_size >= Int32(1)Int32(1)#part.p_size:Boolean;N, l_partkey:Int64, l_quantity:Decimal128(15, 2), p_brand:Utf8, p_size:Int32]",
    "      Filter: #part.p_brand = Utf8(\"Brand#12\") AND #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= Decimal128(Some(1100),15,2) AND #part.p_size <= Int32(5) OR #part.p_brand = Utf8(\"Brand#23\") AND #lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(2000),15,2) AND #part.p_size <= Int32(10) OR #part.p_brand = Utf8(\"Brand#34\") AND #lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(3000),15,2) AND #part.p_size <= Int32(15) [l_partkey:Int64, l_quantity:Decimal128(15, 2), p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "        Inner Join: #lineitem.l_partkey = #part.p_partkey [l_partkey:Int64, l_quantity:Decimal128(15, 2), p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "          TableScan: lineitem projection=[l_partkey, l_quantity] [l_partkey:Int64, l_quantity:Decimal128(15, 2)]",
    "          Filter: #part.p_size >= Int32(1) [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "            TableScan: part projection=[p_partkey, p_brand, p_size], partial_filters=[#part.p_size >= Int32(1)] [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",

the logical plan after optimized

    "Explain [plan_type:Utf8, plan:Utf8]",
    "  Projection: #lineitem.l_partkey [l_partkey:Int64]",
    "    Projection: #part.p_size >= Int32(1) AS #part.p_size >= Int32(1)Int32(1)#part.p_size, #lineitem.l_partkey, #lineitem.l_quantity, #part.p_brand, #part.p_size [#part.p_size >= Int32(1)Int32(1)#part.p_size:Boolean;N, l_partkey:Int64, l_quantity:Decimal128(15, 2), p_brand:Utf8, p_size:Int32]",
    "      Filter: #part.p_brand = Utf8(\"Brand#12\") AND #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= Decimal128(Some(1100),15,2) AND #part.p_size <= Int32(5) OR #part.p_brand = Utf8(\"Brand#23\") AND #lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(2000),15,2) AND #part.p_size <= Int32(10) OR #part.p_brand = Utf8(\"Brand#34\") AND #lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(3000),15,2) AND #part.p_size <= Int32(15) [l_partkey:Int64, l_quantity:Decimal128(15, 2), p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "        Inner Join: #lineitem.l_partkey = #part.p_partkey [l_partkey:Int64, l_quantity:Decimal128(15, 2), p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "          Filter: #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= Decimal128(Some(1100),15,2) OR #lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(2000),15,2) OR #lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(3000),15,2) [l_partkey:Int64, l_quantity:Decimal128(15, 2)]",
    "            TableScan: lineitem projection=[l_partkey, l_quantity], partial_filters=[#lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= Decimal128(Some(1100),15,2) OR #lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(2000),15,2) OR #lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(3000),15,2)] [l_partkey:Int64, l_quantity:Decimal128(15, 2)]",
    "          Filter: #part.p_size >= Int32(1) AND #part.p_brand = Utf8(\"Brand#12\") AND #part.p_size <= Int32(5) OR #part.p_brand = Utf8(\"Brand#23\") AND #part.p_size <= Int32(10) OR #part.p_brand = Utf8(\"Brand#34\") AND #part.p_size <= Int32(15) [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
    "            TableScan: part projection=[p_partkey, p_brand, p_size], partial_filters=[#part.p_size >= Int32(1), #part.p_brand = Utf8(\"Brand#12\") AND #part.p_size <= Int32(5) OR #part.p_brand = Utf8(\"Brand#23\") AND #part.p_size <= Int32(10) OR #part.p_brand = Utf8(\"Brand#34\") AND #part.p_size <= Int32(15)] [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",

we extract new predicate and push down to join's rel, this predicate filters more rows before join , makes join more effective.

What changes are included in this PR?

add extract OR clasue in datafusion/optimizer/src/filter_push_down.rs.

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Sep 21, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #3578 (f65dd47) into master (0a2b0a7) will increase coverage by 0.00%.
The diff coverage is 98.16%.

@@           Coverage Diff           @@
##           master    #3578   +/-   ##
=======================================
  Coverage   85.92%   85.92%           
=======================================
  Files         301      300    -1     
  Lines       56249    56309   +60     
=======================================
+ Hits        48330    48383   +53     
- Misses       7919     7926    +7     
Impacted Files Coverage Δ
datafusion/core/src/execution/context.rs 79.31% <ø> (-0.03%) ⬇️
datafusion/core/tests/sql/joins.rs 99.33% <ø> (ø)
datafusion/core/tests/sql/predicates.rs 100.00% <ø> (ø)
datafusion/core/src/physical_plan/planner.rs 77.47% <90.90%> (+0.11%) ⬆️
datafusion/optimizer/src/filter_push_down.rs 97.76% <98.75%> (+0.09%) ⬆️
datafusion/core/src/physical_plan/sorts/sort.rs 93.86% <100.00%> (-0.60%) ⬇️
...e/src/physical_plan/sorts/sort_preserving_merge.rs 93.49% <0.00%> (-0.36%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.92% <0.00%> (-0.33%) ⬇️
datafusion/expr/src/binary_rule.rs 84.50% <0.00%> (-0.09%) ⬇️
... and 3 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @AssHero -- I need to think about this transformation carefully and will review the code in the next day or two

alamb
alamb previously requested changes Sep 24, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this transformation is correct, though I may be mistaken

@@ -248,6 +249,128 @@ fn get_pushable_join_predicates<'a>(
.unzip()
}

// examine OR clause to see if any useful clauses can be extracted and push down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this transformation is correct. In particular, I don't think the results will always be the same

Schematically, we have this type of predicate (that is being evaluated during the join)

(A AND B) OR (C AND D)

This transformation proposes adding another (A OR B) clause (evaluated before the join), so effectively

 ((A AND B) OR (C AND D)) AND (A OR B)

In order to do this transformation, the boolean statements must be equivalent for all inputs.

However, a counter example is

A: false, B: false, C: true, D: true

In this case, the original predicate would be true, but the rewrite would be false

Here is the program I wrote to generate the entire truth table: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=334938478775ba3cd55e7c400ea89b06

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformation should extract at least one quals from each sub-clauses of OR, else do nothing.

(A AND B) OR (C AND D)

will be transformed to

((A AND B) OR (C AND D)) AND (A OR C)

OR

((A AND B) OR (C AND D)) AND ((A AND B) OR C)

OR

do nothing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- thanks -- I checked those rewrites and https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3b41b0409c8ecf4df0027f323668e0db they do look good to me

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AssHero -- this makes more sense now. I'll try and find some time to review it more carefully over the next day or two

Comment on lines 267 to 268
// TableScan: projection=[c, d]
fn extract_or_clauses_for_join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TableScan: projection=[c, d]
fn extract_or_clauses_for_join(
// TableScan: projection=[c, d]
//
// In general, predicates of this form:
//
// (A AND B) OR (C AND D)
//
// will be transformed to
//
// ((A AND B) OR (C AND D)) AND (A OR C)
//
// OR
//
// ((A AND B) OR (C AND D)) AND ((A AND B) OR C)
//
// OR
//
// do nothing.
//
fn extract_or_clauses_for_join(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add this comments later. Thanks!

@@ -248,6 +249,128 @@ fn get_pushable_join_predicates<'a>(
.unzip()
}

// examine OR clause to see if any useful clauses can be extracted and push down.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- thanks -- I checked those rewrites and https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3b41b0409c8ecf4df0027f323668e0db they do look good to me

@HuSen8891 HuSen8891 requested a review from alamb October 5, 2022 03:17
@alamb
Copy link
Contributor

alamb commented Oct 7, 2022

I am sorry for the late review here -- it is on my list. Basically I am struggling to find time enough to sit down and convince myself that this is a correct transformation in all cases (esp with outer joins and nullability), and then also that we want to do this kind of pushdown. I just haven't had the time yet.

@HuSen8891
Copy link
Contributor Author

I am sorry for the late review here -- it is on my list. Basically I am struggling to find time enough to sit down and convince myself that this is a correct transformation in all cases (esp with outer joins and nullability), and then also that we want to do this kind of pushdown. I just haven't had the time yet.

Thanks!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR again @AssHero and I apologize for the late review

I spent a while this afternoon thinking about the transformation it proposes and I have basically convinced myself that it is correct (and quite clever, I want to point out)

I think this optimization might actually slow some plans down (if the filters that are pushed down don't actually filter many rows, they will just consume CPU). I don't have a great suggestion about this

Given the potential for very subtle wrong results with this kind of optimization I am trying to be cautious. I would like to add some additional comments and tests (will point out where inline below) prior to merging this. Maybe we can even add a config option to disable it as a way to further derisk the process.

cc @Dandandan / @andygrove / @avantgardnerio / @xudong963 / @thinkharderdev in case you have some other thoughts.

" TableScan: lineitem projection=[l_partkey, l_quantity] [l_partkey:Int64, l_quantity:Decimal128(15, 2)]",
" Filter: #part.p_size >= Int32(1) [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
" TableScan: part projection=[p_partkey, p_brand, p_size], partial_filters=[#part.p_size >= Int32(1)] [p_partkey:Int64, p_brand:Utf8, p_size:Int32]",
" Filter: #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= Decimal128(Some(1100),15,2) OR #lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(2000),15,2) OR #lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= Decimal128(Some(3000),15,2) [l_partkey:Int64, l_quantity:Decimal128(15, 2)]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this plan and I agree it seems correct (as in the pushed down filters don't filter out anything that would have passed the original filter)

@@ -1484,7 +1484,8 @@ async fn reduce_left_join_2() -> Result<()> {
" Filter: CAST(#t2.t2_int AS Int64) < Int64(10) OR CAST(#t1.t1_int AS Int64) > Int64(2) AND #t2.t2_name != Utf8(\"w\") [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" Inner Join: #t1.t1_id = #t2.t2_id [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" TableScan: t1 projection=[t1_id, t1_name, t1_int] [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N]",
" TableScan: t2 projection=[t2_id, t2_name, t2_int] [t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" Filter: CAST(#t2.t2_int AS Int64) < Int64(10) OR #t2.t2_name != Utf8(\"w\") [t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice

Comment on lines +254 to +255
// to new OR clause as predicate.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to explain the conditions under which a qual can be extracted as it may not be obvious to someone when they initially look at this.

Suggested change
// to new OR clause as predicate.
//
// to new OR clause as predicate.
//
// A qual is extracted if it it contains (only) common set of column references with the other quals.

I am not sure that is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to see the return type documented here (as in what does the (Vec<Expr>, Vec<HashSet<Column>>) represent? I think it is the extracted quals and their column references but I am not sure

(exprs, expr_columns)
}

// extract qual from OR sub-clause.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some additional comments under what conditions the OR clause is extracted? I tried to explain above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I can add more comments for this.

@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

I would be willing to help write some more tests / comments for this PR if others think it is a good idea.

@Dandandan
Copy link
Contributor

@AssHero do you have some numbers, e.g. how much does it help q19?

@alamb one thing we might consider is to apply the optimization whenever there is an expensive parent node (joins/aggregate) that benefits from more filtering? Anyway, in most cases, filtering should be cheap compared to other operations as long as the expression is not very expensive (for example, no expensive UDF).

@alamb alamb dismissed their stale review October 12, 2022 20:56

Transformation is OK after I thought about it more

@HuSen8891
Copy link
Contributor Author

@AssHero do you have some numbers, e.g. how much does it help q19?

@alamb one thing we might consider is to apply the optimization whenever there is an expensive parent node (joins/aggregate) that benefits from more filtering? Anyway, in most cases, filtering should be cheap compared to other operations as long as the expression is not very expensive (for example, no expensive UDF).

Before this commit 'Convert more cross joins to inner joins'(https://github.com/apache/arrow-datafusion/pull/3482), this optimization helps much in q19, and I intruduce this optimization to solve q19's performance problem.
Now q19 is transformed to inner join, and this helps little in q19, but I think this will help other queries with OR clauses.

@HuSen8891
Copy link
Contributor Author

I think this optimization might actually slow some plans down (if the filters that are pushed down don't actually filter many rows, they will just consume CPU). I don't have a great suggestion about this

I have made some tests on q19 with this optimization on current version, it does not slow down the queries even through it helps little(may consume more CPU).
I think more better way is combined with statistic data(CBO). With the statistic data, we can evaluate the rows filtered by this predicate, then make the decision.

@HuSen8891
Copy link
Contributor Author

add some comments about extract_or_clause.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AssHero

I think this is good enough and we can adjust / fix this this transformation if we find in the future the predicates are hurting more than helping

Another nice thing about pushing single column predicates down is that they can be applied during the scan as well (e.g. #3463 )

@alamb alamb changed the title extract OR clause for join Pushdown single column predicates from OR clause below joins Oct 15, 2022
@alamb alamb changed the title Pushdown single column predicates from OR clause below joins Pushdown single column predicates from ON join clauses Oct 15, 2022
@alamb
Copy link
Contributor

alamb commented Oct 15, 2022

I took the liberty of merging this branch from master and resolving the merge conflict in fe582a7

@alamb
Copy link
Contributor

alamb commented Oct 15, 2022

Thanks again for sticking with this one @AssHero

@alamb
Copy link
Contributor

alamb commented Oct 15, 2022

The newly added TPCH plan benchmarks needed to be updated as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @avantgardnerio and @andygrove who I know run the tpch benchmarks regularly

I think we should probably run https://github.com/apache/arrow-datafusion/tree/master/benchmarks#benchmark-derived-from-tpc-h before merging this in

(Kudos to @andygrove for adding the tpch plan benchmarks -- they are super helpful)

@@ -14,7 +14,9 @@ Sort: shipping.supp_nation ASC NULLS LAST, shipping.cust_nation ASC NULLS LAST,
TableScan: lineitem projection=[l_orderkey, l_suppkey, l_extendedprice, l_discount, l_shipdate]
TableScan: orders projection=[o_orderkey, o_custkey]
TableScan: customer projection=[c_custkey, c_nationkey]
SubqueryAlias: n1
Filter: n1.n_name = Utf8("FRANCE") OR n1.n_name = Utf8("GERMANY")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 these should be fine, though I expect the performance to be negligible as the number of rows in nation are very small

Copy link
Contributor

@Dandandan Dandandan Oct 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think so too - although if it stays on the probe side (e.g. the HashBuildProbeOrder rule doesn't apply) - the joins might still quite a bit slower, and the input and output of the joins will be bigger as well making them more expensive.
FYI @andygrove this might change the perf in Ballista too.

@@ -3,7 +3,7 @@ Projection: SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount) AS re
Projection: lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AS lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")Utf8("DELIVER IN PERSON")lineitem.l_shipinstruct, lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AS lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")])Utf8("AIR REG")Utf8("AIR")lineitem.l_shipmode, part.p_size >= Int32(1) AS part.p_size >= Int32(1)Int32(1)part.p_size, lineitem.l_quantity, lineitem.l_extendedprice, lineitem.l_discount, part.p_brand, part.p_size, part.p_container
Filter: part.p_brand = Utf8("Brand#12") AND part.p_container IN ([Utf8("SM CASE"), Utf8("SM BOX"), Utf8("SM PACK"), Utf8("SM PKG")]) AND lineitem.l_quantity >= Decimal128(Some(100),15,2) AND lineitem.l_quantity <= Decimal128(Some(1100),15,2) AND part.p_size <= Int32(5) OR part.p_brand = Utf8("Brand#23") AND part.p_container IN ([Utf8("MED BAG"), Utf8("MED BOX"), Utf8("MED PKG"), Utf8("MED PACK")]) AND lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND lineitem.l_quantity <= Decimal128(Some(2000),15,2) AND part.p_size <= Int32(10) OR part.p_brand = Utf8("Brand#34") AND part.p_container IN ([Utf8("LG CASE"), Utf8("LG BOX"), Utf8("LG PACK"), Utf8("LG PKG")]) AND lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND lineitem.l_quantity <= Decimal128(Some(3000),15,2) AND part.p_size <= Int32(15)
Inner Join: lineitem.l_partkey = part.p_partkey
Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")
Filter: lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]) AND lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AND lineitem.l_quantity >= Decimal128(Some(100),15,2) AND lineitem.l_quantity <= Decimal128(Some(1100),15,2) OR lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND lineitem.l_quantity <= Decimal128(Some(2000),15,2) OR lineitem.l_quantity >= Decimal128(Some(2000),15,2) AND lineitem.l_quantity <= Decimal128(Some(3000),15,2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are probably good filters to have added as long as they are selective, though they may be expensive to evaluate because they will be evaluated against all rows in lineitem rather than only those rows which were not filtered out by the join with part.

To make this super fast, maybe could use "sideways information passing" to push down the part filtering into the lineitem scan as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sometimes surprised when profiling how little CPU the filters consume - it is really cheap compared to parquet reads / joins / etc. so almost any filtering we can do probably helps as long as it filters out some data and a subsequent operators can benefit from it.

@Dandandan
Copy link
Contributor

@alamb running the benchmarks now for q7 and q19 posting them here when done

@Dandandan
Copy link
Contributor

Dandandan commented Oct 15, 2022

Looks like performance regressed a bit :/ we probably have a better look before merging it @alamb

Looks like roughly a 1.35x speedup on q7 and a smaller 1.16x speedup on q19 🚀

Running with (on partitioned parquet data - 16 partitions, 16 vCPU):

cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path [path] --format parquet --query [query] --batch-size 8192 -n 16

Master:

Query 7 iteration 0 took 6488.3 ms and returned 4 rows
Query 7 iteration 1 took 6235.8 ms and returned 4 rows
Query 7 iteration 2 took 6293.0 ms and returned 4 rows
Query 7 avg time: 6339.04 ms

Query 19 iteration 0 took 1180.6 ms and returned 1 rows
Query 19 iteration 1 took 1012.3 ms and returned 1 rows
Query 19 iteration 2 took 1017.0 ms and returned 1 rows
Query 19 avg time: 1069.99 ms

This branch:

Query 7 iteration 0 took 4856.0 ms and returned 4 rows
Query 7 iteration 1 took 4645.7 ms and returned 4 rows
Query 7 iteration 2 took 4661.6 ms and returned 4 rows
Query 7 avg time: 4721.10 ms

Query 19 iteration 0 took 1021.7 ms and returned 1 rows
Query 19 iteration 1 took 866.5 ms and returned 1 rows
Query 19 iteration 2 took 872.9 ms and returned 1 rows
Query 19 avg time: 920.37 ms

@Dandandan
Copy link
Contributor

Hm getting different results now - may have been something running in the background on my machine - will update the results accordingly

@Dandandan
Copy link
Contributor

Ok - updated the benchmark results - looks good to go now!

@Dandandan Dandandan merged commit e02376d into apache:master Oct 15, 2022
@Dandandan
Copy link
Contributor

Thanks @AssHero @alamb

@ursabot
Copy link

ursabot commented Oct 15, 2022

Benchmark runs are scheduled for baseline = fc5081d and contender = e02376d. e02376d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Oct 16, 2022

Nice! Thanks everyone

@HuSen8891 HuSen8891 deleted the extract_or_clause branch October 17, 2022 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extract or clause as predicate for join rels
5 participants