Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fetch limit in get_sorted_iter #3545

Merged
merged 9 commits into from
Sep 21, 2022
Merged

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Sep 20, 2022

Which issue does this PR close?

Closes #3544

Rationale for this change

Provides a small speedup vs the earlier results.

We can see from the output from explain analyze select l_orderkey from t order by l_orderkey limit 10;

Before:

| SortPreservingMergeExec: [l_orderkey@0 ASC NULLS LAST], metrics=[output_rows=8192, elapsed_compute=4.21486ms, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
|   SortExec: [l_orderkey@0 ASC NULLS LAST], metrics=[output_rows=73230, elapsed_compute=267.190575ms, spill_count=0, spilled_bytes=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |

After(note, only 160 rows=10 rows * 16 partitions)

|   SortPreservingMergeExec: [l_orderkey@0 ASC NULLS LAST], metrics=[output_rows=160, elapsed_compute=74.848µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
|    SortExec: [l_orderkey@0 ASC NULLS LAST], metrics=[output_rows=160, elapsed_compute=261.594385ms, spill_count=0, spilled_bytes=0]  

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Sep 20, 2022
@Dandandan Dandandan marked this pull request as ready for review September 20, 2022 15:20
@@ -273,6 +275,7 @@ impl MemoryConsumer for ExternalSorter {
&self.expr,
self.session_config.batch_size(),
tracking_metrics,
self.fetch,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thing is that it also reduces disk spilling, as sort + limit is done before writing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although to be honest, I would hope that if there is a LIMIT on the query we could probably avoid the spilling entirely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe the spilling could see the remaining batch is so small it could add the sorted data to memory again - avoiding the spill 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future PR / optimization perhaps

.map(|i| row_indices[*i as usize])
.collect();

Ok(SortedIterator::new(row_indices, batch_size))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cleanup - we can do this immediately instead of keeping it in SortedIterator

@Dandandan Dandandan requested a review from alamb September 20, 2022 15:46
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to planner - seems a bit more simple. We don't need access to the parent anymore now.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great improvement -- nice work @Dandandan

@@ -273,6 +275,7 @@ impl MemoryConsumer for ExternalSorter {
&self.expr,
self.session_config.batch_size(),
tracking_metrics,
self.fetch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although to be honest, I would hope that if there is a LIMIT on the query we could probably avoid the spilling entirely

@@ -374,44 +379,38 @@ fn get_sorted_iter(
})
})
.collect::<Result<Vec<_>>>()?;
let indices = lexsort_to_indices(&sort_columns, None)?;
let indices = lexsort_to_indices(&sort_columns, fetch)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ 👍

/// Indexes into the input representing the correctly sorted total output
indices: UInt32Array,
/// Map each each logical input index to where it can be found in the sorted input batches
/// Sorted composite index of where to find the rows in buffered batches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@Dandandan Dandandan merged commit ff718d0 into apache:master Sep 21, 2022
@ursabot
Copy link

ursabot commented Sep 21, 2022

Benchmark runs are scheduled for baseline = 0a2b0a7 and contender = ff718d0. ff718d0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Sep 21, 2022

This PR appears to have broken the build -- PR to fix: #3576

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use fetch limit in get_sorted_iter
3 participants