Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collapse sort into window expr and do sort within logical phase #571

Merged
merged 1 commit into from
Jun 24, 2021

Conversation

jimexist
Copy link
Member

@jimexist jimexist commented Jun 16, 2021

Which issue does this PR close?

Based on #564 so review that first

Closes #573
Closes #526

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@codecov-commenter
Copy link

Codecov Report

Merging #571 (dd93f0e) into master (51e5445) will increase coverage by 0.07%.
The diff coverage is 89.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
+ Coverage   76.02%   76.10%   +0.07%     
==========================================
  Files         156      156              
  Lines       27063    27145      +82     
==========================================
+ Hits        20575    20658      +83     
+ Misses       6488     6487       -1     
Impacted Files Coverage Δ
datafusion/src/physical_plan/mod.rs 80.00% <78.57%> (+3.27%) ⬆️
datafusion/src/physical_plan/windows.rs 82.59% <79.41%> (+0.01%) ⬆️
datafusion/src/physical_plan/planner.rs 80.20% <92.59%> (+0.85%) ⬆️
datafusion/src/execution/context.rs 92.13% <100.00%> (+0.09%) ⬆️
datafusion/src/logical_plan/expr.rs 84.58% <100.00%> (+0.30%) ⬆️
...afusion/src/physical_plan/expressions/nth_value.rs 79.41% <100.00%> (+0.62%) ⬆️
datafusion/src/sql/planner.rs 84.72% <100.00%> (-0.04%) ⬇️
datafusion/src/sql/utils.rs 68.85% <100.00%> (ø)
datafusion/tests/sql.rs 99.29% <100.00%> (+<0.01%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51e5445...dd93f0e. Read the comment docs.

@jimexist jimexist force-pushed the window-aggr-collapse-sort branch 3 times, most recently from 906b62d to f4341d4 Compare June 21, 2021 12:49
@jimexist jimexist changed the title WIP collapse sort into window expr and do sort within logical phase Collapse sort into window expr and do sort within logical phase Jun 21, 2021
@jimexist jimexist marked this pull request as ready for review June 21, 2021 12:50
@jimexist
Copy link
Member Author

jimexist commented Jun 21, 2021

performance comparison:

Benchmarking window empty over, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
window empty over, aggregate functions
                        time:   [1.8359 ms 1.8475 ms 1.8610 ms]
                        change: [-0.9132% +0.5180% +2.1212%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

Benchmarking window empty over, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.0s, enable flat sampling, or reduce sample count to 50.
window empty over, built-in functions
                        time:   [1.3685 ms 1.3760 ms 1.3844 ms]
                        change: [-4.4221% -2.8960% -1.5378%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

window order by, aggregate functions
                        time:   [8.1463 ms 8.2384 ms 8.3348 ms]
                        change: [-4.8125% -3.2407% -1.6147%] (p = 0.00 < 0.05)
                        Performance has improved.

window order by, built-in functions
                        time:   [7.4629 ms 7.5195 ms 7.5799 ms]
                        change: [-0.3291% +0.5924% +1.5385%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

window partition by, u64_wide, aggregate functions
                        time:   [18.755 ms 18.945 ms 19.140 ms]
                        change: [+4.0084% +5.3579% +6.5762%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

window partition by, u64_narrow, aggregate functions
                        time:   [6.3792 ms 6.4266 ms 6.4811 ms]
                        change: [-2.6467% -1.6117% -0.4429%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

window partition by, u64_wide, built-in functions
                        time:   [16.257 ms 16.400 ms 16.551 ms]
                        change: [-0.1608% +0.9364% +2.1068%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

window partition by, u64_narrow, built-in functions
                        time:   [5.9251 ms 5.9624 ms 6.0024 ms]
                        change: [+0.8259% +1.7545% +2.7146%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

window partition and order by, u64_wide, aggregate functions
                        time:   [24.977 ms 25.136 ms 25.308 ms]
                        change: [-2.8450% -1.7267% -0.6014%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition and order by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.4s, or reduce sample count to 90.
window partition and order by, u64_narrow, aggregate functions
                        time:   [54.278 ms 54.907 ms 55.598 ms]
                        change: [+0.8203% +2.1981% +3.7199%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

window partition and order by, u64_wide, built-in functions
                        time:   [23.541 ms 23.740 ms 23.949 ms]
                        change: [-1.1173% +0.2109% +1.4568%] (p = 0.74 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

window partition and order by, u64_narrow, built-in functions
                        time:   [15.821 ms 15.958 ms 16.103 ms]
                        change: [+0.3405% +1.5444% +2.6931%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

consider that the test data are generated per run, i would ignore 5%-ish variations.

@jimexist
Copy link
Member Author

with 1million records and 8k batch size

window empty over, aggregate functions
                        time:   [31.516 ms 31.699 ms 31.875 ms]
                        change: [-2.6548% -1.7125% -0.8447%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

window empty over, built-in functions
                        time:   [24.184 ms 24.441 ms 24.736 ms]
                        change: [-5.3716% -3.9746% -2.5772%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking window order by, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.9s, or reduce sample count to 20.
window order by, aggregate functions
                        time:   [167.08 ms 168.43 ms 169.84 ms]
                        change: [-1.6618% -0.5790% +0.4650%] (p = 0.31 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking window order by, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.5s, or reduce sample count to 30.
window order by, built-in functions
                        time:   [152.19 ms 153.62 ms 155.08 ms]
                        change: [-2.6093% -1.1311% +0.2267%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking window partition by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 24.8s, or reduce sample count to 20.
window partition by, u64_wide, aggregate functions
                        time:   [239.96 ms 242.61 ms 245.37 ms]
                        change: [-5.2335% -3.7045% -2.2581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.1s, or reduce sample count to 30.
window partition by, u64_narrow, aggregate functions
                        time:   [125.74 ms 127.00 ms 128.34 ms]
                        change: [-9.6106% -8.1814% -6.6467%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Benchmarking window partition by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 21.5s, or reduce sample count to 20.
window partition by, u64_wide, built-in functions
                        time:   [207.51 ms 208.79 ms 210.09 ms]
                        change: [-3.9621% -2.8513% -1.7668%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking window partition by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.2s, or reduce sample count to 40.
window partition by, u64_narrow, built-in functions
                        time:   [111.40 ms 112.44 ms 113.55 ms]
                        change: [-4.1476% -2.8162% -1.5589%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition and order by, u64_wide, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 43.5s, or reduce sample count to 10.
window partition and order by, u64_wide, aggregate functions
                        time:   [420.60 ms 423.97 ms 427.48 ms]
                        change: [-2.0448% -0.9388% +0.1643%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking window partition and order by, u64_narrow, aggregate functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 62.5s, or reduce sample count to 10.
window partition and order by, u64_narrow, aggregate functions
                        time:   [617.40 ms 623.61 ms 630.15 ms]
                        change: [+1.3574% +2.6250% +3.9925%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking window partition and order by, u64_wide, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 38.7s, or reduce sample count to 10.
window partition and order by, u64_wide, built-in functions
                        time:   [382.85 ms 386.08 ms 389.30 ms]
                        change: [-0.7637% +0.3990% +1.5670%] (p = 0.51 > 0.05)
                        No change in performance detected.

Benchmarking window partition and order by, u64_narrow, built-in functions: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 37.8s, or reduce sample count to 10.
window partition and order by, u64_narrow, built-in functions
                        time:   [374.39 ms 377.18 ms 380.05 ms]
                        change: [-3.3808% -2.2769% -1.1041%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

@jimexist jimexist force-pushed the window-aggr-collapse-sort branch 2 times, most recently from eef960a to 2f83560 Compare June 22, 2021 10:12
@jimexist
Copy link
Member Author

this will make way for #569

@alamb
Copy link
Contributor

alamb commented Jun 22, 2021

I am sorry I am behind on reviews in DataFusion -- I plan to work through the backlog tomorrow

@jimexist jimexist force-pushed the window-aggr-collapse-sort branch 2 times, most recently from 7845ac4 to e48fdd3 Compare June 23, 2021 00:47
@jimexist jimexist force-pushed the window-aggr-collapse-sort branch 2 times, most recently from ce6a559 to 7553b89 Compare June 23, 2021 07:42
@alamb alamb added the datafusion Changes in the datafusion crate label Jun 23, 2021

// at this moment we are guaranteed by the logical planner
// to have all the window_expr to have equal sort key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the design that the planner will group all WindowExpr that have the same window (ORDER BY, PARTITION BY) into the same LogicalPlan:Window node? That sounds like a good plan to me 👍

It might (eventually) be worth validating that invariant here and asserting (or returning an Error` if the sort keys are not the same for multiple window exprs (mostly to catch potential bugs faster in the future)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can add the assertion later. yes the invariant is real.

\n WindowAggr: windowExpr=[[MIN(#orders.qty)]]\
\n Sort: #orders.order_id DESC NULLS FIRST\
\n TableScan: orders projection=None";
\n WindowAggr: windowExpr=[[MAX(#orders.qty) ORDER BY [#orders.order_id ASC NULLS FIRST]]]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is cool to see two different WindowAggr nodes with different ORDER BY clauses. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust uses timsort, and it shall be quick (to reverse an already sorted vec).

ref: https://en.wikipedia.org/wiki/Timsort#Descending_runs

@@ -1452,11 +1452,18 @@ impl fmt::Debug for Expr {
}
Expr::WindowFunction {
fun,
ref args,
args,
partition_by,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the partition_by and order_by more logically belong on the LogicalPlan::Window node if they can be shared across Expr::WindowFunction

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a separate construct not yet available in the sql parser.

see https://www.postgresql.org/docs/current/tutorial-window.html

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:

SELECT sum(salary) OVER w, avg(salary) OVER w
  FROM empsalary
  WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

Copy link
Member Author

@jimexist jimexist Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the partition_by and order_by more logically belong on the LogicalPlan::Window node if they can be shared across Expr::WindowFunction

for logically planned window function node it's not necessarily the case, because sort order is defined by the concatenations of partition_by and then order_by, but consider:

select max(c1) over (partition by c2 order by c3), max(c1) over (order by c2, c3) from test;

both window functions will have the same sort key of [c2 asc nulls first, c3 asc nulls first] but they mean different things, and they might be logically planned together (meaning same pre-sorting happens) but not the same evaluation will happen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- this makes sense. Thank you for the clarification

Perhaps it is worth a comment (in a follow on PR) on the LogicalPlan window to this effect

@alamb
Copy link
Contributor

alamb commented Jun 23, 2021

Looks like a good step forward to me

@jimexist jimexist force-pushed the window-aggr-collapse-sort branch from 7553b89 to 5b44dce Compare June 23, 2021 15:34
@@ -1452,11 +1452,18 @@ impl fmt::Debug for Expr {
}
Expr::WindowFunction {
fun,
ref args,
args,
partition_by,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- this makes sense. Thank you for the clarification

Perhaps it is worth a comment (in a follow on PR) on the LogicalPlan window to this effect

@alamb
Copy link
Contributor

alamb commented Jun 23, 2021

@jimexist there appears to be a build failure now:


   Compiling lz4 v1.23.2
   Compiling zstd v0.8.3+zstd.1.5.0
   Compiling datafusion v4.0.0-SNAPSHOT (/__w/arrow-datafusion/arrow-datafusion/datafusion)
error[E0282]: type annotations needed
   --> datafusion/src/sql/planner.rs:699:17
    |
699 |             let window_exprs = exprs.into_iter().cloned().collect();
    |                 ^^^^^^^^^^^^ consider giving `window_exprs` a type

error: aborting due to previous error

unless there are any other comments I'll plan to merge this tomorrow when the build issue gets fixed

@jimexist jimexist force-pushed the window-aggr-collapse-sort branch from 5b44dce to 20c19ad Compare June 24, 2021 00:05

let sort_keys = get_sort_keys(&window_expr[0]);
if window_expr.len() > 1 {
debug_assert!(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb added this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


let sort_keys = get_sort_keys(&window_expr[0]);
if window_expr.len() > 1 {
debug_assert!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Jun 24, 2021

Thanks again @jimexist

@alamb alamb merged commit 0d10dce into apache:master Jun 24, 2021
@jimexist jimexist deleted the window-aggr-collapse-sort branch June 24, 2021 14:25
@houqp houqp added the performance Make DataFusion faster label Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate performance Make DataFusion faster
Projects
None yet
4 participants