
Push down limit to sort #3530

Merged: 2 commits into apache:master on Sep 20, 2022

Conversation

@Dandandan Dandandan commented Sep 19, 2022

Which issue does this PR close?

Closes: #3528

Rationale for this change

select l_orderkey from t order by l_orderkey limit 10;
+------------+
| l_orderkey |
+------------+
| 1          |
| 1          |
| 1          |
| 1          |
| 1          |
| 2          |
| 3          |
| 3          |
| 3          |
| 3          |
+------------+
10 rows in set. Query took 0.172 seconds.

vs after #3527 (without this change):

❯ select l_orderkey from t order by l_orderkey limit 10;
+------------+
| l_orderkey |
+------------+
| 1          |
| 1          |
| 1          |
| 1          |
| 1          |
| 2          |
| 3          |
| 3          |
| 3          |
| 3          |
+------------+
10 rows in set. Query took 0.772 seconds.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Sep 19, 2022
) -> ArrowResult<BatchWithSortArray> {
    // TODO: pushup the limit expression to sort
    let sort_columns = expr
        .iter()
        .map(|e| e.evaluate_to_sort_column(&batch))
        .collect::<Result<Vec<SortColumn>>>()?;

-   let indices = lexsort_to_indices(&sort_columns, None)?;
+   let indices = lexsort_to_indices(&sort_columns, fetch)?;
@Dandandan Dandandan Sep 19, 2022

The key optimization: this returns only n indices after the change.
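To make that concrete, here is a hedged, single-column stand-in for arrow's `lexsort_to_indices(columns, fetch)` — the function name, `i64` values, and simplification to one column are all assumptions for the sketch, not the arrow crate's actual implementation:

```rust
// Simplified, hypothetical single-column analogue of arrow's
// `lexsort_to_indices`: sort row indices by value and, when a fetch
// limit is given, return only the first `fetch` of them.
fn sort_to_indices(values: &[i64], fetch: Option<usize>) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    // Stable sort keeps the original order of equal values.
    indices.sort_by_key(|&i| values[i]);
    if let Some(n) = fetch {
        indices.truncate(n); // only n indices survive, as in the change above
    }
    indices
}

fn main() {
    let values = [3, 1, 2, 1, 5];
    // Without a limit every index comes back; with fetch = 2 only the
    // indices of the two smallest rows remain.
    assert_eq!(sort_to_indices(&values, None), vec![1, 3, 2, 0, 4]);
    assert_eq!(sort_to_indices(&values, Some(2)), vec![1, 3]);
}
```

Downstream, `take` then only has to copy `fetch` rows per batch instead of the whole batch.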

@Dandandan Dandandan force-pushed the push_down_limit_sort branch 4 times, most recently from 269e2b0 to 8c19b9b Compare September 19, 2022 15:26
@Dandandan Dandandan marked this pull request as ready for review September 19, 2022 15:27
@Dandandan Dandandan requested a review from alamb September 19, 2022 15:27
@Dandandan Dandandan force-pushed the push_down_limit_sort branch 2 times, most recently from bf01caf to 4306134 Compare September 19, 2022 15:54
codecov-commenter commented Sep 19, 2022

Codecov Report

Merging #3530 (4b1a86a) into master (3a9e0d0) will increase coverage by 0.00%.
The diff coverage is 96.87%.

@@           Coverage Diff           @@
##           master    #3530   +/-   ##
=======================================
  Coverage   85.80%   85.81%           
=======================================
  Files         300      300           
  Lines       55382    55424   +42     
=======================================
+ Hits        47520    47561   +41     
- Misses       7862     7863    +1     
Impacted Files Coverage Δ
datafusion/core/src/dataframe.rs 89.58% <ø> (ø)
datafusion/core/tests/user_defined_plan.rs 87.79% <ø> (ø)
datafusion/proto/src/logical_plan.rs 17.43% <0.00%> (-0.04%) ⬇️
...usion/core/src/physical_optimizer/parallel_sort.rs 100.00% <100.00%> (ø)
...afusion/core/src/physical_optimizer/repartition.rs 100.00% <100.00%> (ø)
datafusion/core/src/physical_plan/planner.rs 77.35% <100.00%> (ø)
datafusion/core/src/physical_plan/sorts/sort.rs 94.46% <100.00%> (+0.09%) ⬆️
...e/src/physical_plan/sorts/sort_preserving_merge.rs 93.84% <100.00%> (+0.03%) ⬆️
datafusion/core/tests/order_spill_fuzz.rs 88.88% <100.00%> (ø)
datafusion/expr/src/logical_plan/builder.rs 90.20% <100.00%> (+0.03%) ⬆️
... and 6 more


    .as_any()
// SortExec (preserve_partitioning=false, fetch=Some(n))
//   -> SortPreservingMergeExec (SortExec (preserve_partitioning=true, fetch=Some(n)))
let parallel_sort = plan_any.downcast_ref::<SortExec>().is_some()
Contributor Author:

As we now have the pushdown, we can use fetch, and support more than just a limit directly after sort.
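A rough sketch of that rewrite, using a toy plan enum in place of DataFusion's actual trait-object `ExecutionPlan` types (all names and the enum shape here are assumptions made for illustration):

```rust
// Toy plan nodes; DataFusion's real plans are trait objects, not an enum.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan,
    Sort { preserve_partitioning: bool, fetch: Option<usize>, input: Box<Plan> },
    SortPreservingMerge { input: Box<Plan> },
}

// Rewrite a single-partition sort (with or without a fetch limit) into
// per-partition sorts merged by a SortPreservingMerge, carrying `fetch`
// down into each per-partition sort.
fn parallelize_sort(plan: Plan) -> Plan {
    match plan {
        Plan::Sort { preserve_partitioning: false, fetch, input } => {
            Plan::SortPreservingMerge {
                input: Box::new(Plan::Sort { preserve_partitioning: true, fetch, input }),
            }
        }
        other => other,
    }
}

fn main() {
    let plan = Plan::Sort {
        preserve_partitioning: false,
        fetch: Some(10),
        input: Box::new(Plan::Scan),
    };
    let expected = Plan::SortPreservingMerge {
        input: Box::new(Plan::Sort {
            preserve_partitioning: true,
            fetch: Some(10),
            input: Box::new(Plan::Scan),
        }),
    };
    assert_eq!(parallelize_sort(plan), expected);
}
```

Because `fetch` travels with each per-partition sort, every partition produces at most `fetch` rows for the merge to consider.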

@Dandandan Dandandan requested a review from andygrove September 19, 2022 16:16
Support skip, fix test

Fmt

Add limit directly after sort

Update comment

Simplify parallel sort by using new pushdown

Clippy
@alamb alamb left a comment

This is a really neat idea @Dandandan -- very beautiful implementation

@@ -657,15 +660,18 @@ pub struct SortExec {
    metrics_set: CompositeMetricsSet,
    /// Preserve partitions of input plan
    preserve_partitioning: bool,
    /// Fetch highest/lowest n results
    fetch: Option<usize>,
Contributor:

I see -- this seems like it now has the information plumbed to the SortExec to implement "TopK" within the physical operator's implementation. 👍

Very cool

    let sort_columns = expr
        .iter()
        .map(|e| e.evaluate_to_sort_column(&batch))
        .collect::<Result<Vec<SortColumn>>>()?;

-   let indices = lexsort_to_indices(&sort_columns, None)?;
+   let indices = lexsort_to_indices(&sort_columns, fetch)?;
Contributor:

nice

Contributor:

I wonder if this will effectively get us much of the benefit of a special TopK operator, as we don't have to copy the entire input -- we only copy up to the fetch limit of rows, if specified.

Although I suppose SortExec still buffers all of its input, where a TopK could buffer only the top rows.

Contributor:

In fact, I wonder if you could also apply the limit here:

https://github.com/apache/arrow-datafusion/blob/3a9e0d0/datafusion/core/src/physical_plan/sorts/sort.rs#L123-L124

as part of sorting each batch -- rather than keeping the entire input batch, we only need to keep at most fetch rows from each batch.

Contributor Author:

lexsort_to_indices already returns only fetch indices per batch; this is used to take that number of indices per batch, throwing away the rest of the rows.

The remaining optimization, I think, is tweaking SortPreservingMergeStream to maintain only fetch records in the heap, instead of the top fetch records for each batch in the partition, as mentioned in #3516 (comment). After this, I think we have a full TopK implementation that only needs to keep n rows in memory (per partition).

I would like to do this in a separate PR.
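That follow-up is essentially a bounded-heap top-k. A generic sketch, unrelated to DataFusion's actual stream types (the `top_k` function, `Vec<i64>` batches, and ascending order are all assumptions for illustration):

```rust
use std::collections::BinaryHeap;

// Bounded max-heap top-k over a stream of batches: at any point the heap
// holds at most `fetch` values -- the smallest seen so far -- instead of
// buffering every row from every batch.
fn top_k(batches: &[Vec<i64>], fetch: usize) -> Vec<i64> {
    let mut heap: BinaryHeap<i64> = BinaryHeap::new(); // max-heap
    for batch in batches {
        for &v in batch {
            if heap.len() < fetch {
                heap.push(v);
            } else if heap.peek().map_or(false, |&top| v < top) {
                heap.pop(); // evict the largest of the kept values
                heap.push(v);
            }
        }
    }
    heap.into_sorted_vec() // ascending order
}

fn main() {
    let batches = vec![vec![5, 1, 9], vec![3, 2, 8]];
    assert_eq!(top_k(&batches, 3), vec![1, 2, 3]);
}
```

Memory stays at `fetch` values regardless of how many batches stream through, which is the property the comment above is after.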

Contributor:

A separate PR is a great idea 👍

> lexsort_to_indices already returns only fetch indices per batch, this is used to take that nr. of indices per batch, throwing away the rest of the rows.

Right, the point I was trying to make is that there are 2 calls to lexsort_to_indices in sort.rs. I think this PR only pushed fetch to one of them. The second is https://github.com/apache/arrow-datafusion/blob/3a9e0d0/datafusion/core/src/physical_plan/sorts/sort.rs#L826 and I think it is correct to push fetch there too

I was thinking if we applied fetch to the second call, we could get close to the same effect without changing SortPreservingMergeStream.

  • After this PR, sort buffers num_input_batches * input_batch_size rows.
  • Adding fetch to the other call to lexsort_to_indices would buffer num_input_batches * limit rows
  • Extending SortPreservingMergeStream would allow us to buffer only limit rows.

So clearly extending SortPreservingMergeStream is optimal in terms of rows buffered, but it likely requires a bit more effort.
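With illustrative numbers (all assumed: 100 input batches of 8192 rows, limit 10), the three regimes compare as follows — these are example sizes, not measured DataFusion figures:

```rust
// Rows buffered under each of the three strategies discussed above,
// for assumed example sizes.
fn buffered_rows(
    num_input_batches: usize,
    input_batch_size: usize,
    limit: usize,
) -> (usize, usize, usize) {
    let after_this_pr = num_input_batches * input_batch_size; // full batches kept
    let fetch_in_both_calls = num_input_batches * limit;      // limit rows per batch
    let merge_aware = limit;                                  // only the final top rows
    (after_this_pr, fetch_in_both_calls, merge_aware)
}

fn main() {
    // 819_200 vs 1_000 vs 10 rows buffered for the example sizes.
    assert_eq!(buffered_rows(100, 8192, 10), (819_200, 1_000, 10));
}
```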

@Dandandan Dandandan Sep 20, 2022

Ah, I didn't look too much at the rest of the implementation. I think you're right that providing fetch to the other lexsort_to_indices would be beneficial as well. I will create an issue for this and file a PR later.

@Dandandan Dandandan Sep 20, 2022

I think the current change already buffers num_input_batches * limit rows, by the way, as the limit is applied before adding batches to the buffer. As far as I can see, adding fetch to the second lexsort_to_indices call will mainly reduce the output of the individual sorts to fetch rows, which is of course beneficial too, as it reduces the time to sort and limits the input to take and the input to SortPreservingMergeExec.

@jychen7 jychen7 Apr 12, 2023

> I think you're right that providing fetch to the other lexsort_to_indices would be beneficial as well. I will create a issue for this and issue a PR later.

for other readers, this is addressed by issue #3544 and fixed by PR #3545

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@Dandandan Dandandan merged commit 81b5794 into apache:master Sep 20, 2022
ursabot commented Sep 20, 2022

Benchmark runs are scheduled for baseline = c7f3a70 and contender = 81b5794. 81b5794 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
