Optimize where exists sub-queries into aggregate and join
#2813
Closed
40 commits (all by avantgardnerio):
- 14d807a Failing tests
- 88f5d7f Add month/year arithmetic
- d2f43c9 Fix tests?
- e34705e Fix clippy?
- c37d29e Update datafusion/common/src/scalar.rs
- 874a5ed Add support for all types, fix math
- ee1c756 Fix doc
- 5ea1c28 Fix test that relied on previous flawed implementation
- 8348470 Appease clippy
- cd999c7 Failing test case for TPC-H query 20
- ccdb98f Fix name
- e7fcb2f Broken test for adding intervals to dates
- 9b51e46 Tests pass
- de8ae11 Fix rebase
- 8dd2b16 Fix query
- 34b2908 Additional tests
- 6a759ce Reduce to minimum failing (and passing) cases
- 37a73c2 Adjust so data _should_ be returned, but see none
- 1db5c8d Fixed data, decorrelated test passes
- f3ee70c Check in plans
- b08da97 Put real assertion in place
- f22c079 Add test for already working subquery optimizer
- 0e0e0c7 Add decorellator
- 308b67c Check in broken test
- 0c5ed1a Add some passing and failing tests to see scope of problem
- d11d7f9 Have almost all inputs needed for optimization, but need to catch 1 l…
- 6ab6894 Collected all inputs, now we just need to optimize
- b281c8c Successfully decorrelated query 4
- 6a08eb1 refactor
- 7e02545 Pass test 4
- ea3f219 Ready for PR?
- 50b3549 Only operate on equality expressions
- f90d95a Lint error
- 9377cdf Tests still pass because we are losing remaining predicate
- 23b0ffb Don't lose remaining expressions
- 858b284 Update test to expect remaining filter clause
- 00a661b Debugging
- 1708415 Can run query 4
- 60a6e58 Remove debugging code
- b8c0808 Clippy
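Several commits above ("Only operate on equality expressions", "Tests still pass because we are losing remaining predicate", "Don't lose remaining expressions") concern how the subquery's filter is divided during decorrelation. The sketch below is a minimal illustration of that idea, not the PR's code: it assumes a hypothetical toy `Expr` enum (not DataFusion's `Expr`) and hypothetical helpers `split_conjunction`, `as_join_key` and `split_correlated`. It flattens the filter's `AND`s, treats each equality between an outer-table column and an inner-table column as a join key, and keeps every other conjunct as a remaining predicate instead of dropping it.

```rust
// Hypothetical toy expression type for illustration only; it is NOT
// DataFusion's `Expr`, which has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference qualified by its table name.
    Col { table: &'static str, name: &'static str },
    Eq(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn col(table: &'static str, name: &'static str) -> Expr {
    Expr::Col { table, name }
}

/// Flattens nested `And` nodes into a flat list of conjuncts.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(l, out);
            split_conjunction(r, out);
        }
        other => out.push(other.clone()),
    }
}

/// Table name of a bare column reference, or `None` for anything else.
fn table_of(expr: &Expr) -> Option<&'static str> {
    match expr {
        Expr::Col { table, .. } => Some(*table),
        _ => None,
    }
}

/// If `expr` is `outer_col = inner_col` (in either order), returns
/// (outer key, inner key); any other expression yields `None`.
fn as_join_key(expr: &Expr, outer: &str) -> Option<(Expr, Expr)> {
    if let Expr::Eq(l, r) = expr {
        match (table_of(l), table_of(r)) {
            (Some(lt), Some(rt)) if lt == outer && rt != outer => {
                return Some(((**l).clone(), (**r).clone()));
            }
            (Some(lt), Some(rt)) if rt == outer && lt != outer => {
                return Some(((**r).clone(), (**l).clone()));
            }
            _ => {}
        }
    }
    None
}

/// Splits a subquery filter into correlated equality join keys and the
/// remaining predicates, which must be kept (e.g. as a filter under the join).
fn split_correlated(filter: &Expr, outer: &str) -> (Vec<(Expr, Expr)>, Vec<Expr>) {
    let mut conjuncts = Vec::new();
    split_conjunction(filter, &mut conjuncts);
    let mut join_keys = Vec::new();
    let mut remaining = Vec::new();
    for c in conjuncts {
        match as_join_key(&c, outer) {
            Some(pair) => join_keys.push(pair),
            None => remaining.push(c),
        }
    }
    (join_keys, remaining)
}

fn main() {
    // l_orderkey = o_orderkey AND l_commitdate < l_receiptdate
    let filter = Expr::And(
        Box::new(Expr::Eq(
            Box::new(col("lineitem", "l_orderkey")),
            Box::new(col("orders", "o_orderkey")),
        )),
        Box::new(Expr::Lt(
            Box::new(col("lineitem", "l_commitdate")),
            Box::new(col("lineitem", "l_receiptdate")),
        )),
    );
    let (join_keys, remaining) = split_correlated(&filter, "orders");
    println!("join keys: {:?}", join_keys);
    println!("remaining: {:?}", remaining);
}
```

For the TPC-H query 4 filter this yields one join key (`o_orderkey`, `l_orderkey`) and leaves `l_commitdate < l_receiptdate` as the remaining predicate, which matches the shape of the optimized plan expected by the test below (join on the key, `Filter` kept under the aggregate).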
```diff
@@ -108,6 +108,7 @@ mod explain;
 mod idenfifers;
 pub mod information_schema;
 mod partitioned_csv;
+mod subqueries;
 #[cfg(feature = "unicode_expressions")]
 pub mod unicode;
 
@@ -483,7 +484,37 @@ fn get_tpch_table_schema(table: &str) -> Schema {
             Field::new("n_comment", DataType::Utf8, false),
         ]),
 
-        _ => unimplemented!(),
+        "supplier" => Schema::new(vec![
+            Field::new("s_suppkey", DataType::Int64, false),
+            Field::new("s_name", DataType::Utf8, false),
+            Field::new("s_address", DataType::Utf8, false),
+            Field::new("s_nationkey", DataType::Int64, false),
+            Field::new("s_phone", DataType::Utf8, false),
+            Field::new("s_acctbal", DataType::Float64, false),
+            Field::new("s_comment", DataType::Utf8, false),
+        ]),
+
+        "partsupp" => Schema::new(vec![
+            Field::new("ps_partkey", DataType::Int64, false),
+            Field::new("ps_suppkey", DataType::Int64, false),
+            Field::new("ps_availqty", DataType::Int32, false),
+            Field::new("ps_supplycost", DataType::Float64, false),
+            Field::new("ps_comment", DataType::Utf8, false),
+        ]),
+
+        "part" => Schema::new(vec![
+            Field::new("p_partkey", DataType::Int64, false),
+            Field::new("p_name", DataType::Utf8, false),
+            Field::new("p_mfgr", DataType::Utf8, false),
+            Field::new("p_brand", DataType::Utf8, false),
+            Field::new("p_type", DataType::Utf8, false),
+            Field::new("p_size", DataType::Int32, false),
+            Field::new("p_container", DataType::Utf8, false),
+            Field::new("p_retailprice", DataType::Float64, false),
+            Field::new("p_comment", DataType::Utf8, false),
+        ]),
+
+        _ => unimplemented!("Table: {}", table),
     }
 }
 
```

Author comment: Add missing TPC-H tables to support testing those queries.
New file (`@@ -0,0 +1,67 @@`):

```rust
use super::*;
use crate::sql::execute_to_batches;
use datafusion::assert_batches_eq;
use datafusion::prelude::SessionContext;

#[tokio::test]
async fn tpch_q4_correlated() -> Result<()> {
    let ctx = SessionContext::new();
    register_tpch_csv(&ctx, "orders").await?;
    register_tpch_csv(&ctx, "lineitem").await?;

    /*
    #orders.o_orderpriority ASC NULLS LAST
      Projection: #orders.o_orderpriority, #COUNT(UInt8(1)) AS order_count
        Aggregate: groupBy=[[#orders.o_orderpriority]], aggr=[[COUNT(UInt8(1))]]
          Filter: EXISTS (                                          -- plan
            Subquery: Projection: *                                 -- proj
              Filter: #lineitem.l_orderkey = #orders.o_orderkey     -- filter
                TableScan: lineitem projection=None                 -- filter.input
          )
            TableScan: orders projection=None                       -- plan.inputs
    */
    let sql = r#"
        select o_orderpriority, count(*) as order_count
        from orders
        where exists (
            select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate)
        group by o_orderpriority
        order by o_orderpriority;
        "#;

    // assert plan
    let plan = ctx
        .create_logical_plan(sql)
        .map_err(|e| format!("{:?} at {}", e, "error"))
        .unwrap();
    let plan = ctx
        .optimize(&plan)
        .map_err(|e| format!("{:?} at {}", e, "error"))
        .unwrap();
    let actual = format!("{}", plan.display_indent());
    let expected = r#"Sort: #orders.o_orderpriority ASC NULLS LAST
  Projection: #orders.o_orderpriority, #COUNT(UInt8(1)) AS order_count
    Aggregate: groupBy=[[#orders.o_orderpriority]], aggr=[[COUNT(UInt8(1))]]
      Inner Join: #orders.o_orderkey = #lineitem.l_orderkey
        TableScan: orders projection=[o_orderkey, o_orderpriority]
        Projection: #lineitem.l_orderkey
          Aggregate: groupBy=[[#lineitem.l_orderkey]], aggr=[[]]
            Filter: #lineitem.l_commitdate < #lineitem.l_receiptdate
              TableScan: lineitem projection=[l_orderkey, l_commitdate, l_receiptdate], partial_filters=[#lineitem.l_commitdate < #lineitem.l_receiptdate]"#
        .to_string();
    assert_eq!(actual, expected);

    // assert data
    let results = execute_to_batches(&ctx, sql).await;
    let expected = vec![
        "+-----------------+-------------+",
        "| o_orderpriority | order_count |",
        "+-----------------+-------------+",
        "| 1-URGENT        | 1           |",
        "| 5-LOW           | 1           |",
        "+-----------------+-------------+",
    ];
    assert_batches_eq!(expected, &results);

    Ok(())
}
```

Author comment (on the plan in the block comment): Annotate plan with variable names from optimizer code for cross-correlation.
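The expected optimized plan in the test replaces the correlated `Filter: EXISTS (...)` with an inner join against `Aggregate: groupBy=[[#lineitem.l_orderkey]], aggr=[[]]`. An aggregate with a group key and no aggregate expressions acts as a DISTINCT on the join key; without it, an order with several qualifying line items would be joined several times and inflate `COUNT(*)`. The std-only sketch below illustrates this on made-up rows (dates reduced to plain day numbers); the structs and the functions `exists_filter`, `join_on_distinct_keys` and `naive_join` are hypothetical and are not DataFusion code.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical rows: column names follow TPC-H, values are invented.
struct Order {
    o_orderkey: i64,
    o_orderpriority: &'static str,
}

struct LineItem {
    l_orderkey: i64,
    l_commitdate: u32, // day number, simplified
    l_receiptdate: u32,
}

fn sample_orders() -> Vec<Order> {
    vec![
        Order { o_orderkey: 1, o_orderpriority: "1-URGENT" },
        Order { o_orderkey: 2, o_orderpriority: "5-LOW" },
        Order { o_orderkey: 3, o_orderpriority: "2-HIGH" },
    ]
}

fn sample_items() -> Vec<LineItem> {
    vec![
        // Order 1: two qualifying line items.
        LineItem { l_orderkey: 1, l_commitdate: 10, l_receiptdate: 20 },
        LineItem { l_orderkey: 1, l_commitdate: 5, l_receiptdate: 7 },
        // Order 2: one qualifying, one not.
        LineItem { l_orderkey: 2, l_commitdate: 1, l_receiptdate: 2 },
        LineItem { l_orderkey: 2, l_commitdate: 9, l_receiptdate: 3 },
        // Order 3: no qualifying line items.
        LineItem { l_orderkey: 3, l_commitdate: 30, l_receiptdate: 20 },
    ]
}

/// The remaining (uncorrelated) subquery predicate.
fn late(l: &LineItem) -> bool {
    l.l_commitdate < l.l_receiptdate
}

/// Correlated form: re-scan lineitem for every order (WHERE EXISTS).
fn exists_filter(orders: &[Order], items: &[LineItem]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for o in orders {
        if items.iter().any(|l| l.l_orderkey == o.o_orderkey && late(l)) {
            *counts.entry(o.o_orderpriority).or_insert(0) += 1;
        }
    }
    counts
}

/// Decorrelated form: filter lineitem once, keep distinct l_orderkey values
/// (the empty-aggregate GROUP BY), then inner join orders on that key.
fn join_on_distinct_keys(orders: &[Order], items: &[LineItem]) -> BTreeMap<&'static str, usize> {
    let keys: HashSet<i64> = items.iter().filter(|l| late(l)).map(|l| l.l_orderkey).collect();
    let mut counts = BTreeMap::new();
    for o in orders.iter().filter(|o| keys.contains(&o.o_orderkey)) {
        *counts.entry(o.o_orderpriority).or_insert(0) += 1;
    }
    counts
}

/// Join WITHOUT the distinct step: every qualifying line item emits a row.
fn naive_join(orders: &[Order], items: &[LineItem]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for o in orders {
        let n = items
            .iter()
            .filter(|l| l.l_orderkey == o.o_orderkey && late(l))
            .count();
        if n > 0 {
            *counts.entry(o.o_orderpriority).or_insert(0) += n;
        }
    }
    counts
}

fn main() {
    let (orders, items) = (sample_orders(), sample_items());
    println!("EXISTS filter:          {:?}", exists_filter(&orders, &items));
    println!("join on distinct keys:  {:?}", join_on_distinct_keys(&orders, &items));
    println!("join without distinct:  {:?}", naive_join(&orders, &items));
}
```

With these rows the first two functions agree (one order each for `1-URGENT` and `5-LOW`), while `naive_join` reports 2 for `1-URGENT` because order 1 has two qualifying line items. The hash-set lookup here stands in for whatever join algorithm the physical planner actually chooses.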
New file (`@@ -0,0 +1,2 @@`):

```csv
p_partkey,p_name,p_mfgr,p_brand,p_type,p_size,p_container,p_retailprice,p_comment
1,goldenrod lavender spring chocolate lace,Manufacturer#1,Brand#13,PROMO BURNISHED COPPER,7,JUMBO PKG,901.00,ly. slyly ironi
```
New file (`@@ -0,0 +1,2 @@`):

```csv
ps_partkey,ps_suppkey,ps_availqty,ps_supplycost,ps_comment
67310,7311,100,993.49,ven ideas. quickly even packages print. pending multipliers must have to are fluff
```
Empty file.
New file (`@@ -0,0 +1,2 @@`):

```csv
s_suppkey,s_name,s_address,s_nationkey,s_phone,s_acctbal,s_comment
1,Supplier#000000001, N kD4on9OM Ipw3,gf0JBoQDd7tgrzrddZ,17,27-918-335-1736,5755.94,each slyly above the careful
```
Sorry, this was built upon #2797. I'll turn this into a draft until that gets merged.