Account for memory usage in SortPreservingMerge (#5885) #7130

alamb · 2023-07-28T16:55:48Z

Which issue does this PR close?

Closes #5885
Closes #6382 (an earlier version of this code)

Rationale for this change

Merging takes memory. This is most pronounced for dictionary-encoded columns, where the RowConverter interns the dictionary values and can therefore use an appreciable amount of memory for high-cardinality dictionaries. We have seen this reach tens of GB in certain IOx cases.

Thus it is important that streaming_merge and the operators that use it, such as Sort and SortPreservingMerge, properly account for the memory used while merging.

What changes are included in this PR?

This is based on the changes from @tustvold in #6382:

Threads a MemoryReservation through to streaming_merge
Adds two new config parameters, sort_spill_reservation_bytes and sort_in_place_threshold_bytes, that control the level of spilling
Adds new tests

There is some subtlety in reserving memory for this merge up front when doing a spilling Sort, which I describe in inline comments.
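As a rough illustration of what threading a reservation through a merge means, here is a minimal, self-contained sketch. `Pool`, `Reservation`, and `merge` are hypothetical stand-ins written for this example; they are not DataFusion's `MemoryPool`, `MemoryReservation`, or `streaming_merge`. The sketch only shows that the merge charges its working memory to a reservation and returns an error rather than exceeding the configured limit.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a memory pool with a hard byte limit.
#[derive(Debug)]
struct Pool {
    limit: usize,
    used: Cell<usize>,
}

/// Hypothetical stand-in for the reservation held by one operator.
#[derive(Debug)]
struct Reservation {
    pool: Rc<Pool>,
    size: usize,
}

impl Reservation {
    fn new(pool: &Rc<Pool>) -> Self {
        Reservation { pool: Rc::clone(pool), size: 0 }
    }

    /// Account for `bytes` more, or return an error instead of exceeding the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let used = self.pool.used.get();
        if used + bytes > self.pool.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already used",
                bytes, used, self.pool.limit
            ));
        }
        self.pool.used.set(used + bytes);
        self.size += bytes;
        Ok(())
    }
}

impl Drop for Reservation {
    /// Dropping the reservation returns its bytes to the pool.
    fn drop(&mut self) {
        self.pool.used.set(self.pool.used.get() - self.size);
    }
}

/// A merge that charges its working memory to the caller's reservation.
fn merge(sorted_runs: &[Vec<i32>], reservation: &mut Reservation) -> Result<Vec<i32>, String> {
    let bytes: usize = sorted_runs
        .iter()
        .map(|run| run.len() * std::mem::size_of::<i32>())
        .sum();
    reservation.try_grow(bytes)?;
    let mut out: Vec<i32> = sorted_runs.iter().flatten().copied().collect();
    out.sort(); // stand-in for a real k-way merge
    Ok(out)
}

fn main() {
    let pool = Rc::new(Pool { limit: 16, used: Cell::new(0) });
    let mut reservation = Reservation::new(&pool);
    // Four i32 values need 16 bytes, which just fits.
    println!("{:?}", merge(&[vec![1, 3], vec![2, 4]], &mut reservation));
    // The reservation is still held, so a further merge is refused.
    println!("{:?}", merge(&[vec![5]], &mut reservation));
}
```

The second call fails because the first merge's bytes are still reserved, which mirrors the user-facing change described in this PR: a plan errors instead of silently going over the limit.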

Are these changes tested?

Yes

Are there any user-facing changes?

If memory limits are configured, some plans will now error rather than exceed that limit.

alamb · 2023-08-02T16:56:45Z

datafusion/core/tests/memory_limit.rs

+}
+
+#[tokio::test]
+async fn sort_spill_reservation() {


This test demonstrates why sort_spill_reservation_bytes is needed -- if it is too small, spilling may fail (because the sort runs out of memory when trying to write to the spill file). If someone hits this, they can increase the amount of memory reserved for the merge.

alamb · 2023-08-02T17:00:07Z

datafusion/core/tests/memory_limit.rs

-use datafusion::physical_plan::SendableRecordBatchStream;
-use datafusion_common::assert_contains;
+use datafusion::physical_plan::{ExecutionPlan, SendableRecordBatchStream};
+use datafusion_common::{assert_contains, Result};

 use datafusion::prelude::{SessionConfig, SessionContext};
 use datafusion_execution::TaskContext;


The tests in this file can be cleaned up significantly, but I will do so in a follow-on PR to keep the size of this one down.

alamb · 2023-08-02T17:01:32Z

datafusion/core/tests/memory_limit.rs

 }

 impl TestCase {
+    // TODO remove expected errors and memory limits and query from constructor


I will do this in a follow-on PR.

alamb · 2023-08-02T18:44:36Z

datafusion/common/src/config.rs

+        /// When sorting, below what size should data be concatenated
+        /// and sorted in a single RecordBatch rather than sorted in
+        /// batches and merged.
+        pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024


This is not a behavior change. This constant was hard-coded in sort.rs -- I have just pulled it out into its own config setting so I can write tests.
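The choice this threshold controls can be sketched as follows. This is an illustrative model only: `SortStrategy`, `choose_strategy`, `merge_two`, and `sort_batches` are hypothetical names invented for the example, and plain `Vec<i32>` values stand in for RecordBatches.

```rust
/// Hypothetical names for the two code paths described in the doc comment.
#[derive(Debug, PartialEq)]
enum SortStrategy {
    ConcatAndSortInPlace,
    SortBatchesThenMerge,
}

/// Below the threshold, concatenate and sort once; otherwise sort each batch and merge.
fn choose_strategy(batch_sizes: &[usize], sort_in_place_threshold_bytes: usize) -> SortStrategy {
    let total: usize = batch_sizes.iter().sum();
    if total < sort_in_place_threshold_bytes {
        SortStrategy::ConcatAndSortInPlace
    } else {
        SortStrategy::SortBatchesThenMerge
    }
}

/// Merge two already-sorted slices into one sorted vector.
fn merge_two(a: &[i32], b: &[i32]) -> Vec<i32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Sort buffered batches using whichever strategy the threshold selects.
fn sort_batches(mut batches: Vec<Vec<i32>>, threshold_bytes: usize) -> (SortStrategy, Vec<i32>) {
    let sizes: Vec<usize> = batches
        .iter()
        .map(|b| b.len() * std::mem::size_of::<i32>())
        .collect();
    match choose_strategy(&sizes, threshold_bytes) {
        SortStrategy::ConcatAndSortInPlace => {
            let mut all: Vec<i32> = batches.into_iter().flatten().collect();
            all.sort();
            (SortStrategy::ConcatAndSortInPlace, all)
        }
        SortStrategy::SortBatchesThenMerge => {
            for b in batches.iter_mut() {
                b.sort();
            }
            let merged = batches.iter().fold(Vec::new(), |acc, b| merge_two(&acc, b));
            (SortStrategy::SortBatchesThenMerge, merged)
        }
    }
}

fn main() {
    let batches = vec![vec![3, 1], vec![2, 0]]; // 16 bytes in total
    println!("{:?}", sort_batches(batches.clone(), 1024 * 1024));
    println!("{:?}", sort_batches(batches, 8));
}
```

Both paths produce the same sorted output; exposing the threshold as a setting lets a test force either path with small inputs.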

alamb · 2023-08-03T18:13:44Z

datafusion/core/src/physical_plan/sorts/stream.rs

@@ -84,13 +85,16 @@ pub struct RowCursorStream {
    column_expressions: Vec<Arc<dyn PhysicalExpr>>,
    /// Input streams
    streams: FusedStreams,
+    /// Tracks the memory used by `converter`
+    reservation: MemoryReservation,


We observed this to be a key consumer of memory for large dictionary-encoded data.
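The pattern of resizing a reservation to match a converter's current size can be sketched like this. `Interner`, `Reservation`, and `convert_batch` are hypothetical stand-ins for this example (the size estimate is deliberately crude), not the arrow-rs `RowConverter` or DataFusion's `MemoryReservation`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a converter that interns dictionary values.
struct Interner {
    values: HashSet<String>,
}

impl Interner {
    fn new() -> Self {
        Interner { values: HashSet::new() }
    }

    /// Remember a value the first time it is seen.
    fn intern(&mut self, value: &str) {
        if !self.values.contains(value) {
            self.values.insert(value.to_string());
        }
    }

    /// Crude size estimate: bytes of all distinct values seen so far.
    fn size(&self) -> usize {
        self.values.iter().map(|v| v.len()).sum()
    }
}

/// Hypothetical reservation with a fixed budget.
struct Reservation {
    limit: usize,
    size: usize,
}

impl Reservation {
    /// Set the reservation to `new_size`, or fail if that exceeds the budget.
    fn try_resize(&mut self, new_size: usize) -> Result<(), String> {
        if new_size > self.limit {
            return Err(format!("{} bytes exceeds limit of {}", new_size, self.limit));
        }
        self.size = new_size;
        Ok(())
    }
}

/// Convert one batch, then bring the reservation in line with the interner's size.
fn convert_batch(
    interner: &mut Interner,
    reservation: &mut Reservation,
    batch: &[&str],
) -> Result<(), String> {
    for value in batch {
        interner.intern(value);
    }
    reservation.try_resize(interner.size())
}

fn main() {
    let mut interner = Interner::new();
    let mut reservation = Reservation { limit: 10, size: 0 };
    println!("{:?}", convert_batch(&mut interner, &mut reservation, &["aa", "bb", "aa"]));
    println!("{:?}", convert_batch(&mut interner, &mut reservation, &["cccccccc"]));
}
```

Repeated values cost nothing extra, while each new distinct value grows the tracked size, which is why high-cardinality dictionaries dominate.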

alamb · 2023-08-03T19:17:58Z

datafusion/core/tests/memory_limit.rs

+        // enough memory to sort if we don't try to merge it all at once
+        (partition_size * 5) / 2,
+    )
+        // use a single partiton so only a sort is needed


This test demonstrates the need for reserving memory up front for the spill -- and shows that if someone hits the error, they can increase the memory set aside for the merge and it will work.
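Why an up-front reservation helps can be seen in a deliberately simplified model. `simulate_sort` and its parameters are hypothetical, and the byte counts are made up; DataFusion's actual accounting is more involved.

```rust
/// Simplified, hypothetical model of a spilling sort (not DataFusion's code).
///
/// * `limit`: total bytes the pool allows
/// * `spill_reservation`: bytes set aside up front for the spill-time merge
/// * `merge_bytes`: bytes the in-memory sort/merge needs before writing a spill file
///
/// Returns the number of spills, or an error if a spill could not get memory.
fn simulate_sort(
    limit: usize,
    spill_reservation: usize,
    merge_bytes: usize,
    batch_bytes: &[usize],
) -> Result<usize, String> {
    if spill_reservation > limit {
        return Err("cannot pre-reserve merge memory".to_string());
    }
    // The merge memory is reserved before any input arrives.
    let mut used = spill_reservation;
    let mut spills = 0;
    for &batch in batch_bytes {
        if used + batch > limit {
            // Out of room: spill. First hand the pre-reserved bytes back
            // so the merge has something to allocate from.
            used -= spill_reservation;
            if used + merge_bytes > limit {
                return Err(format!(
                    "spill failed: merge needs {} bytes but only {} are free",
                    merge_bytes,
                    limit - used
                ));
            }
            // Merge, write the spill file, drop the buffered batches,
            // then set the merge memory aside again.
            used = spill_reservation;
            spills += 1;
        }
        if used + batch > limit {
            return Err("batch does not fit even after spilling".to_string());
        }
        used += batch;
    }
    Ok(spills)
}

fn main() {
    let batches: [usize; 3] = [40, 40, 40];
    // No memory set aside: the merge cannot allocate once the pool is full.
    println!("{:?}", simulate_sort(100, 0, 30, &batches));
    // 30 bytes set aside: the sort spills earlier, but each spill succeeds.
    println!("{:?}", simulate_sort(100, 30, 30, &batches));
}
```

The trade-off is that the reserved bytes are unavailable for buffering input, so the sort spills sooner in exchange for the spill being able to complete.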

alamb · 2023-08-04T10:16:27Z

I ran the sort benchmarks and the results are basically the same. I think the 9% slower measurement is due to high variance in the benchmark (which I should look into if/when I have time). I saw similar variations when I compared main to itself.

┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query               ┃  main_base ┃ alamb_sort-merge-accounting ┃       Change ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Qsort utf8          │ 60974.35ms │                  66454.87ms │ 1.09x slower │
│ Qsort int           │ 78390.32ms │                  82235.04ms │    no change │
│ Qsort decimal       │ 66029.57ms │                  65439.66ms │    no change │
│ Qsort integer tuple │ 84864.85ms │                  87846.28ms │    no change │
│ Qsort utf8 tuple    │ 62470.81ms │                  63878.11ms │    no change │
│ Qsort mixed tuple   │ 72625.32ms │                  74784.97ms │    no change │
└─────────────────────┴────────────┴─────────────────────────────┴──────────────┘

alamb · 2023-08-04T13:22:05Z

I have also tested this on our code upstream and it definitely helps account for some of the difference between tracked and actual memory.

yjshen · 2023-08-08T15:35:12Z

I'll review this carefully today.

yjshen

Looks great to me! Thanks @alamb!

yjshen · 2023-08-09T01:34:20Z

datafusion/core/tests/memory_limit.rs

 use datafusion::physical_optimizer::PhysicalOptimizerRule;
+use datafusion::physical_plan::common::batch_byte_size;


We could probably remove this method and use RecordBatch::get_array_memory_size throughout the repo.

yjshen · 2023-08-09T01:36:15Z

datafusion/core/src/physical_plan/sorts/sort.rs

-    /// A handle to the runtime to get Disk spill files
+    /// Reservation for the merging of in-memory batches. If the sort
+    /// might spill, `sort_spill_reservation_bytes` will be
+    /// pre-reserved to ensure there is some space for this sort/merg.


Suggested change

-    /// pre-reserved to ensure there is some space for this sort/merg.
+    /// pre-reserved to ensure there is some space for this sort/merge.

yjshen · 2023-08-09T01:38:04Z

datafusion/core/src/physical_plan/sorts/sort.rs

+        // Release the memory reserved for merge back to the pool so
+        // there is some left when `in_memo_sort_stream` requests an
+        // allocation.
+        self.merge_reservation.free();


yjshen · 2023-08-09T01:52:02Z

datafusion/common/src/config.rs

+        /// How much memory is set aside, for each spillable sort, to
+        /// ensure an in-memory merge can occur. This setting has no
+        /// if the sort can not spill (there is no `DiskManager`
+        /// configured)
+        ///
+        /// As part of spilling to disk, in memory data must be sorted
+        /// / merged before writing the file. This in-memory
+        /// sort/merge requires memory as well, so To avoid allocating
+        /// once memory is exhausted, DataFusion sets aside this
+        /// many bytes before.


Maybe:

/// Specifies the reserved memory for each spillable sort operation to
/// facilitate an in-memory merge.
///
/// When a sort operation spills to disk, the in-memory data must be
/// sorted and merged before being written to a file. This setting reserves
/// a specific amount of memory for that in-memory sort/merge process.
///
/// Note: This setting is irrelevant if the sort operation cannot spill
/// (i.e., if there's no `DiskManager` configured).

yjshen · 2023-08-09T01:52:58Z

datafusion/core/src/physical_plan/sorts/sort.rs

        }

        let streams = std::mem::take(&mut self.in_mem_batches)
            .into_iter()
            .map(|batch| {
                let metrics = self.metrics.baseline.intermediate();
-                Ok(spawn_buffered(self.sort_batch_stream(batch, metrics)?, 1))
+                let reservation = self.reservation.split(batch.get_array_memory_size());
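The idea behind splitting a reservation per batch can be sketched as below. The `Reservation` type here is a hypothetical byte counter written for this example, not DataFusion's `MemoryReservation`; it only shows that splitting moves bytes into a new reservation without changing the total.

```rust
/// Hypothetical reservation that only tracks a byte count.
#[derive(Debug, PartialEq)]
struct Reservation {
    size: usize,
}

impl Reservation {
    /// Move `bytes` out of this reservation into a new one; the total is unchanged.
    fn split(&mut self, bytes: usize) -> Reservation {
        assert!(bytes <= self.size, "cannot split more than is reserved");
        self.size -= bytes;
        Reservation { size: bytes }
    }
}

/// Hand each buffered batch its own slice of the operator's reservation.
fn split_per_batch(total: &mut Reservation, batch_bytes: &[usize]) -> Vec<Reservation> {
    batch_bytes.iter().map(|&bytes| total.split(bytes)).collect()
}

fn main() {
    let mut total = Reservation { size: 100 };
    let parts = split_per_batch(&mut total, &[30, 50]);
    println!("remaining={} parts={:?}", total.size, parts);
}
```

Giving each per-batch stream its own piece means the bytes for a batch can be released when that stream finishes, rather than only when the whole sort completes.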


crepererum

Looks good to me (modulo what @yjshen already said).

alamb

Thanks for the review @yjshen and @crepererum

alamb · 2023-08-09T13:45:24Z

datafusion/core/tests/memory_limit.rs

 use datafusion::physical_optimizer::PhysicalOptimizerRule;
+use datafusion::physical_plan::common::batch_byte_size;


That is a good idea -- I will do so in a follow on PR

Update: #7245

alamb · 2023-08-09T13:49:59Z

datafusion/common/src/config.rs

+        /// How much memory is set aside, for each spillable sort, to
+        /// ensure an in-memory merge can occur. This setting has no
+        /// if the sort can not spill (there is no `DiskManager`
+        /// configured)
+        ///
+        /// As part of spilling to disk, in memory data must be sorted
+        /// / merged before writing the file. This in-memory
+        /// sort/merge requires memory as well, so To avoid allocating
+        /// once memory is exhausted, DataFusion sets aside this
+        /// many bytes before.


That is a much better wording. Thank you @yjshen -- in f87705e

alamb · 2023-08-09T13:50:17Z

datafusion/core/src/physical_plan/sorts/sort.rs

-    /// A handle to the runtime to get Disk spill files
+    /// Reservation for the merging of in-memory batches. If the sort
+    /// might spill, `sort_spill_reservation_bytes` will be
+    /// pre-reserved to ensure there is some space for this sort/merg.


in f87705e

alamb · 2023-08-09T13:51:22Z

@gruuya I am quite confident that this PR will conflict with #7180 -- is it ok if I merge this one before #7180?

gruuya · 2023-08-09T15:34:29Z

@gruuya I am quite confident that this PR will conflict with #7180 -- is it ok if I merge this one before #7180?

Oh definitely, please go ahead. I treat #7180 more as an experiment at this point, cheers.

yjshen · 2023-08-09T16:13:25Z

Great, let's merge this! Thanks @alamb @crepererum @gruuya!

alamb · 2023-08-09T16:55:30Z

Thanks @yjshen and @gruuya

github-actions bot added the "core" (Core DataFusion crate) label Jul 28, 2023

This was referenced Jul 28, 2023

Refactor memory_limit tests to make them easier to extend #7131

Merged

Account for memory usage in SortPreservingMerge (#5885) #6382

Closed

alamb force-pushed the alamb/sort-merge-accounting branch 2 times, most recently from dc9c2b6 to f5019c9 Compare August 2, 2023 17:57

github-actions bot added the "sqllogictest" (SQL Logic Tests (.slt)) label Aug 2, 2023

alamb mentioned this pull request Aug 2, 2023

Add MemoryReservation::{split_off, take, new_empty} #7184

Merged

alamb force-pushed the alamb/sort-merge-accounting branch from f17afb0 to 9dbee0b Compare August 2, 2023 18:35

alamb mentioned this pull request Aug 3, 2023

Top-K eager batch sorting #7180

Closed

alamb force-pushed the alamb/sort-merge-accounting branch from 9dbee0b to ac0aea2 Compare August 3, 2023 17:57

Account for memory usage in SortPreservingMerge

3505dba

alamb force-pushed the alamb/sort-merge-accounting branch from ac0aea2 to 3505dba Compare August 3, 2023 18:43

alamb mentioned this pull request Aug 3, 2023

Minor: make memory_limit tests more self describing #7190

Merged

alamb marked this pull request as ready for review August 4, 2023 13:21

alamb mentioned this pull request Aug 4, 2023

RowInterner::size() much too low for high cardinality dictionary columns apache/arrow-rs#4645

Closed

Merge remote-tracking branch 'apache/main' into alamb/sort-merge-accounting

741eca3

yjshen approved these changes Aug 9, 2023

crepererum approved these changes Aug 9, 2023

alamb added 3 commits August 9, 2023 09:46

Merge remote-tracking branch 'apache/main' into alamb/sort-merge-accounting

adea2c4

Review Comments: Improve documentation and comments

f87705e

Review Comments: Improve documentation and comments

441bcbc

alamb mentioned this pull request Aug 9, 2023

Deprecate batch_byte_size #7245

Merged

yjshen merged commit 161c6d3 into apache:main Aug 9, 2023

alamb deleted the alamb/sort-merge-accounting branch August 9, 2023 16:55

alamb mentioned this pull request Aug 9, 2023

Minor: fix clippy for memory_limit test #7248

Merged

alamb mentioned this pull request Sep 13, 2023

Stateless Row Conversion apache/arrow-rs#4811

Closed
