Prevent over-allocations (and spills) on sorts with a fixed limit #3593

isidentical · 2022-09-22T22:02:27Z

Which issue does this PR close?

Closes #3596.

Rationale for this change

During sorting, when we receive a new record batch we try to allocate space for it. This is done on the assumption that the sorted result of that batch will stay in memory, so we have to track its size to avoid accidentally exceeding the memory limit. But after #3510, this assumption no longer holds in all cases (particularly when a fetch limit is set on the sort operation), so we might be over-allocating memory and constantly spilling for no good reason.

What changes are included in this PR?

This PR adds logic to avoid over-allocations by instructing the memory manager to shrink the tracked usage after each partial sort with a limit.
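To make the effect concrete, here is a minimal, self-contained sketch of the accounting idea. It is not the code from this PR and does not use DataFusion's memory manager: `Budget`, `try_grow`, `shrink`, and `count_spills` are hypothetical names for a toy model in which every batch has the same size and a spill frees everything buffered so far.

```rust
/// Toy memory budget (hypothetical; not DataFusion's memory manager).
struct Budget {
    limit: usize,
    used: usize,
}

impl Budget {
    fn new(limit: usize) -> Self {
        Budget { limit, used: 0 }
    }

    /// Try to reserve `bytes`; returns false when the caller would have to spill.
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes > self.limit {
            return false;
        }
        self.used += bytes;
        true
    }

    /// Give back `bytes` that are no longer held in memory.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Feed `batches` equally sized batches into a toy sorter and count spills.
/// With `shrink_after_sort`, the rows dropped by a partial sort with a
/// fetch limit are released from the budget right away.
fn count_spills(
    batches: usize,
    rows_per_batch: usize,
    row_size: usize,
    limit: usize,
    fetch: Option<usize>,
    shrink_after_sort: bool,
) -> usize {
    let mut budget = Budget::new(limit);
    let mut spills = 0;
    let batch_bytes = rows_per_batch * row_size;

    for _ in 0..batches {
        if !budget.try_grow(batch_bytes) {
            // Toy spill: everything buffered goes to disk, and only the
            // incoming batch is accounted for afterwards.
            spills += 1;
            budget.used = batch_bytes;
        }

        // A partial sort with a fetch limit keeps at most `fetch` rows.
        let kept_rows = fetch.map_or(rows_per_batch, |f| f.min(rows_per_batch));
        if shrink_after_sort {
            budget.shrink(batch_bytes - kept_rows * row_size);
        }
    }
    spills
}
```

In this toy model, ten batches of 100 rows at 8 bytes per row against a 4000-byte limit spill once when usage is never shrunk, even with a fetch of 10 rows. With shrinking enabled and the same fetch, each batch leaves only 80 bytes accounted for and no spill occurs.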

Are there any user-facing changes?

No, this should be an optimization.

datafusion/core/src/physical_plan/sorts/sort.rs

alamb · 2022-09-24T10:39:32Z

Thank you @isidentical

Dandandan · 2022-09-24T14:16:52Z

datafusion/core/src/physical_plan/sorts/sort.rs

+
+        for (fetch, expect_spillage) in test_options {
+            let config = RuntimeConfig::new()
+                .with_memory_limit(avg_batch_size * (partitions - 1), 1.0);


Dandandan

LGTM. Thanks @isidentical !

ursabot · 2022-09-24T16:41:49Z

Benchmark runs are scheduled for baseline = 8bcc965 and contender = 696a0b5. 696a0b5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the core (Core DataFusion crate) label Sep 22, 2022

isidentical force-pushed the gh-3579 branch from 4cd24fc to 12ee54a Compare September 22, 2022 22:11

Dandandan reviewed Sep 23, 2022

datafusion/core/src/physical_plan/sorts/sort.rs (outdated)

isidentical marked this pull request as ready for review September 23, 2022 07:52

isidentical force-pushed the gh-3579 branch from 12ee54a to 2881471 Compare September 23, 2022 12:23

Dandandan reviewed Sep 23, 2022

datafusion/core/src/physical_plan/sorts/sort.rs

isidentical force-pushed the gh-3579 branch from 2881471 to 88f96f0 Compare September 24, 2022 12:00

Prevent memory overflows (and spills) on sorts with a fixed limit

c8decca

isidentical force-pushed the gh-3579 branch from 88f96f0 to c8decca Compare September 24, 2022 14:13

isidentical requested a review from Dandandan September 24, 2022 14:13

Dandandan reviewed Sep 24, 2022

Dandandan approved these changes Sep 24, 2022

Dandandan merged commit 696a0b5 into apache:master Sep 24, 2022

tustvold mentioned this pull request Oct 31, 2022

Update to arrow 26, change timezones #4039

Merged
