Partial Sort Plan Implementation #9125
Conversation
We can merge this after the review is resolved. LGTM!
This looks like a neat PR -- thank you very much @ahmetenis -- I plan to review it carefully over the next day or two
Thank you @ahmetenis -- this is a really nice first PR. Well done. Thank you also to @mustafasrepo and @metesynnada for the reviews. The code looks very nice -- both well documented and well tested.
I think we should file follow-on tickets for enabling this operator in more queries and for further optimizations, but I don't think that needs to be done prior to merging this PR.
let plan_any = plan.as_any();
if let Some(sort_plan) = plan_any.downcast_ref::<SortExec>() {
    let child = sort_plan.children()[0].clone();
    if !unbounded_output(&child) {
Can you add a comment here about why this operator is only used with unbounded output?
I think it is more generally applicable than for just unbounded cases (it would make any plan more streaming as well as require less memory)
We don't have to do it as part of this PR, but I think we should file a follow-on ticket to use this operation more generally.
I am not completely sure whether it will be safe to expand partial sort without incorporating the ExternalSorter used in SortExec. Would love to hear your thoughts on this.
Also, SortExec with unbounded input is already pipeline-breaking, so I wanted to first gate this change behind unbounded input and improve functionality without regressing the current behaviour.
Opened an issue for expanding PartialSort to other use cases here: #9153
let result = match ready!(self.input.poll_next_unpin(cx)) {
    Some(Ok(batch)) => {
        self.input_batch =
            concat_batches(&self.schema(), [&self.input_batch, &batch])?;
This will result in copying the first input batch N times (if it takes N input batches to produce the output).
You could potentially also buffer the batches in something like Vec<RecordBatch> and do the concat during Self::emit.
Thanks for the suggestion; I updated the implementation to keep batches in a Vec and concat them once.
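A minimal sketch of the buffering approach discussed above, with hypothetical names and plain `Vec<i32>` standing in for arrow's `RecordBatch`/`concat_batches`: incoming batches are pushed in O(1) and each row is copied only once, at emit time, instead of being re-copied on every poll.

```rust
// Hypothetical stand-in for the buffer-then-concat-once pattern.
// Vec<i32> plays the role of RecordBatch; emit() plays the role of
// a single concat_batches call followed by the sort.
struct PartialSortBuffer {
    batches: Vec<Vec<i32>>, // stand-in for Vec<RecordBatch>
}

impl PartialSortBuffer {
    fn new() -> Self {
        Self { batches: Vec::new() }
    }

    // Called once per input batch: no row copying here.
    fn push(&mut self, batch: Vec<i32>) {
        self.batches.push(batch);
    }

    // Called once at a sort boundary: each row is copied exactly once.
    fn emit(&mut self) -> Vec<i32> {
        let mut out: Vec<i32> = self.batches.drain(..).flatten().collect();
        out.sort();
        out
    }
}

fn main() {
    let mut buf = PartialSortBuffer::new();
    buf.push(vec![3, 1]);
    buf.push(vec![2]);
    assert_eq!(buf.emit(), vec![1, 2, 3]);
    println!("ok");
}
```

Compared with concatenating on every poll, this keeps the per-batch cost constant; the trade-off is holding all buffered batches alive until the boundary is reached.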
Great progress! I will take a look tomorrow as well.
Thanks @metesynnada for the review.
Thanks @alamb for the review, will try to address the comments by EOD tomorrow. I have to give credit to @mustafasrepo for his great guidance.
LGTM. Let's open an issue to add spilling support (it doesn't seem to support that now), but other than that I see no issues.
Thanks for the good work, @ahmetenis!
Filed #9170 to track.
Thanks again everyone!
Which issue does this PR close?
Closes #7456.
Rationale for this change
In the default configuration, the naive planner frequently employs the SortExec mechanism to ensure data is correctly ordered. This SortExec utilization poses a challenge, particularly in streaming environments. The inherent nature of SortExec demands access to the entire dataset for sorting, a requirement in conflict with the streaming model, which processes data in chunks rather than as a whole.

However, if a table has an order like [a ASC, b ASC] and a lexicographical sort of [a ASC, b ASC, d ASC] is needed, the d column might be sorted without accessing the whole dataset in many practical cases.

The primary objective is to optimize the naive planner to eliminate the necessity of materializing the entire dataset with SortExec, especially in streaming scenarios.

What changes are included in this PR?
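The prefix-ordering idea from the rationale can be sketched as follows. This is a hypothetical, standalone illustration (the real operator works on arrow RecordBatches): rows already sorted by the prefix (a, b) are split into runs of equal (a, b), and only each bounded run is sorted by d, so the whole dataset never needs to be materialized.

```rust
// Hypothetical illustration of partial sort: input rows are already
// ordered by the prefix (a, b); to obtain (a, b, d) ordering we only
// sort d inside each run of equal (a, b). Tuples stand in for rows.
fn partial_sort_by_d(rows: &mut [(i32, i32, i32)]) {
    let mut start = 0;
    while start < rows.len() {
        // Find the end of the run sharing the same (a, b) prefix.
        let mut end = start + 1;
        while end < rows.len()
            && (rows[end].0, rows[end].1) == (rows[start].0, rows[start].1)
        {
            end += 1;
        }
        // Sort only this bounded run by d; memory use is bounded by
        // the largest run, not the full input.
        rows[start..end].sort_by_key(|r| r.2);
        start = end;
    }
}

fn main() {
    // Already sorted by (a, b); d is unsorted within prefix runs.
    let mut rows = vec![(1, 1, 9), (1, 1, 3), (1, 2, 5), (2, 1, 7), (2, 1, 2)];
    partial_sort_by_d(&mut rows);
    assert_eq!(
        rows,
        vec![(1, 1, 3), (1, 1, 9), (1, 2, 5), (2, 1, 2), (2, 1, 7)]
    );
    println!("ok");
}
```

Because each run can be emitted as soon as the prefix value changes, this shape of sort is pipeline-friendly, which is exactly what makes it attractive for the streaming case described above.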
Are these changes tested?
Are there any user-facing changes?