20067: fix: race condition in SpillPool caused by buffered stream#240

Open
martin-augment wants to merge 7 commits into main from pr-20067-2026-02-12-07-13-06
Conversation

@martin-augment
Owner

20067: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a race condition within the SpillPool by removing a problematic buffered stream mechanism. The SpillManager's batch read buffering capability has been entirely removed, and the MultiLevelMergeBuilder has been updated to reflect this change by no longer requesting read-ahead buffering. A new, robust concurrent test has been added to the SpillPool to ensure the stability and correctness of concurrent writer and reader operations.

Highlights

  • Race Condition Fix: Eliminated explicit batch buffering in SpillManager to resolve a race condition that could occur with buffered streams.
  • Buffering Mechanism Removal: Adjusted MultiLevelMergeBuilder to no longer use read-ahead buffering when merging spill files, aligning with the removal of the problematic buffering mechanism.
  • Concurrency Test: Introduced a new concurrent test case in SpillPool to rigorously verify the fix for writer-reader race conditions under stress.
Changelog
  • datafusion/physical-plan/src/sorts/multi_level_merge.rs
    • Updated get_sorted_spill_files_to_merge to request memory for 1 batch per file instead of 2, removing read-ahead buffering.
    • Removed the call to with_batch_read_buffer_capacity when creating spill streams.
  • datafusion/physical-plan/src/spill/spill_manager.rs
    • Removed the spawn_buffered utility import.
    • Removed the batch_read_buffer_capacity field from the SpillManager struct.
    • Deleted the with_batch_read_buffer_capacity method and its initialization in the new function.
    • Modified read_spill_as_stream to return the raw stream directly, without buffering.
  • datafusion/physical-plan/src/spill/spill_pool.rs
    • Added a new test_concurrent_writer_reader_race_condition to stress-test concurrent writes and reads, ensuring no race conditions occur.

@coderabbitai

coderabbitai bot commented Feb 12, 2026

Walkthrough

This pull request modifies the spill file buffering strategy in DataFusion's physical execution layer. The SpillManager removes its internal batch read buffer capacity configuration, including the private field and public builder method. The read_spill_as_stream function no longer wraps streams with buffering. Additionally, the multi_level_merge path adjusts its buffer length from 2 to 1 when retrieving sorted spill files. A new concurrent stress test is added to validate writer/reader behavior under interleaved execution conditions.


@claude

claude bot commented Feb 12, 2026

PR Review: Fix Race Condition in SpillPool by Removing Buffered Streams

Summary

This PR addresses a race condition in SpillPool caused by the use of buffered streams (spawn_buffered). The fix removes the buffering layer from spill file reads, simplifying the concurrency model and eliminating the race condition.


Code Quality ✅

Strengths:

  • Clean removal: The changes properly remove all references to batch_read_buffer_capacity field and the with_batch_read_buffer_capacity method
  • Consistent updates: All call sites in multi_level_merge.rs have been updated to reflect the removal of buffering
  • Good test coverage: The new test test_concurrent_writer_reader_race_condition stress-tests the fix with 100 iterations

Minor Issues:

  1. Typo in test comment (line 1447): "concurncy in the reader and the reader" should be "concurrency in the writer and the reader"
  2. Typo in test comment (line 1447): "there is now race condtion" should be "there is no race condition"
  3. Unused variable in multi_level_merge.rs:262: The _ assignment for buffer_size could be clearer about why it's being discarded

Potential Issues ⚠️

1. Performance Impact (Medium Priority)
The removal of spawn_buffered eliminates read-ahead buffering. Previously, with buffer_size = 2, the system could pre-fetch up to 2 batches ahead of consumption. This change means:

  • Pro: Simpler concurrency model, no race conditions
  • Con: Potential throughput reduction when reading from disk since there's no read-ahead

Recommendation: Benchmark spill-heavy queries to ensure this doesn't significantly impact performance. The comment change from 2 to 1 in multi_level_merge.rs:264 suggests this was intentional, but performance validation is important.

2. Incomplete Documentation Update
The comment in spill/mod.rs:55 still references using spawn_buffered for buffering:

/// It will read one batch at a time and will not do any buffering, to buffer data use [`crate::common::spawn_buffered`]

This reference is now inconsistent with the new approach.

Recommendation: Update this comment to reflect that buffering is no longer used for spill reads to avoid race conditions.


Test Coverage ✅

Strengths:

  • The new test is well-designed with:
    • Multi-threaded runtime with single worker thread to maximize contention
    • 100 iterations to catch rare race conditions
    • Concurrent writer/reader with explicit yield points to trigger interleaving
    • Clear assertion failure messages

Suggestions:

  1. Consider adding a comment explaining why single worker thread is used (to maximize thread context switches and race condition visibility)
  2. The test uses SpawnedTask - verify this is the correct test utility for this codebase (appears to be defined elsewhere in tests)

Security Concerns ✅

No security issues identified. The changes reduce complexity and potential for concurrency bugs, which is a security improvement.


Performance Considerations ⚠️

Memory Usage: Lower memory footprint since we're not buffering batches in memory
Throughput: Potential reduction in read throughput without read-ahead
Latency: First-batch latency unchanged, but sustained throughput may be affected

Recommendation: Add performance regression tests or benchmark results to validate the trade-off.


Race Condition Analysis 🔍

The race condition likely occurred because spawn_buffered creates a separate task that polls the underlying stream:

  1. Writer writes batch and flushes
  2. Buffered task polls stream but gets Pending
  3. Writer writes another batch
  4. Buffered task might miss the wake-up notification
  5. Reader waits indefinitely despite data being available

Removing buffering eliminates this intermediate layer and ensures the reader directly polls the underlying stream, receiving wake-up notifications correctly.


Recommendations

Must Fix:

  1. ✏️ Fix typos in test comment (line 1447)

Should Consider:

  1. 📚 Update documentation in spill/mod.rs:55 to remove spawn_buffered reference
  2. 🔍 Add comment explaining why buffer_size is discarded in multi_level_merge.rs
  3. 📊 Benchmark spill-heavy workloads to validate performance impact
  4. 💭 Consider adding a comment in the test explaining the single worker thread rationale

Nice to Have:

  1. Add a comment explaining what race condition this PR fixes (reference apache/datafusion#20067, "fix: race condition in SpillPool caused by buffered stream", if available)
  2. Consider adding this race condition scenario to documentation as a lesson learned

Verdict: ✅ LGTM with minor suggestions

The core fix is sound and properly addresses the race condition. The test coverage is good. The minor issues noted above don't block merging but should be addressed for code quality and maintainability.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@datafusion/physical-plan/src/spill/spill_pool.rs`:
- Around line 1447-1448: Fix the typos in the test comment that currently reads
"stress testing the concurncy in the reader and the reader to make sure there is
now race condtion going for 100 iterations with a 5 batches per iteration" by
replacing "concurncy" with "concurrency", "now" with "no", and "condtion" with
"condition" so the comment reads correctly about stress testing concurrency and
ensuring no race condition.
🧹 Nitpick comments (2)
datafusion/physical-plan/src/sorts/multi_level_merge.rs (1)

262-269: Change looks correct; the recursive buffer-reduction path in get_sorted_spill_files_to_merge is now dead code.

With buffer_len hardcoded to 1, the if buffer_len > 1 branch (line 382) can never be reached. The function still works correctly—it'll return Err immediately if it can't reserve memory for the minimum streams—but the recursive retry logic is effectively dead. Consider removing it or adding a comment noting it's retained for future flexibility.

datafusion/physical-plan/src/spill/spill_manager.rs (1)

168-193: read_spill_as_stream and read_spill_as_stream_unbuffered are now identical.

After removing spawn_buffered, both methods produce the exact same Box::pin(cooperative(SpillReaderStream::new(...))). The docstring on line 182 ("Same as read_spill_as_stream, but without buffering") is now misleading since neither method buffers. Consider removing read_spill_as_stream_unbuffered and updating callers to use read_spill_as_stream, or updating the doc to clarify they are the same.

#!/bin/bash
# Find all callers of read_spill_as_stream_unbuffered to assess impact of removal
rg -n 'read_spill_as_stream_unbuffered' --type rust

Comment on lines +1447 to +1448
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration


⚠️ Potential issue | 🟡 Minor

Typos in test comment.

Line 1447: "concurncy" → "concurrency", "now" → "no", "condtion" → "condition".

Proposed fix
-        // stress testing the concurncy in the reader and the reader to make sure there is now race condtion
-        // going for 100 iterations with a 5 batches per iteration 
+        // Stress testing the concurrency between the writer and the reader to make sure there is no race condition.
+        // Going for 100 iterations with 5 batches per iteration.

Owner Author


value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a race condition related to buffered streams when reading spilled files. The core change is the removal of read-ahead buffering in SpillManager::read_spill_as_stream, which makes it an unbuffered stream. This change is propagated to MultiLevelMergeBuilder which was using the buffered stream. A new stress test is also added to spill_pool.rs to verify the fix for concurrent writer/reader scenarios and prevent future regressions.

The changes look good and effectively remove the problematic buffering. I have a couple of suggestions for improvement: one is to refactor the now-redundant read_spill_as_stream_unbuffered function, and the other is to fix a comment in the new test.

)));

- Ok(spawn_buffered(stream, self.batch_read_buffer_capacity))
+ Ok(stream)


medium

Now that read_spill_as_stream is unbuffered, it has become functionally identical to read_spill_as_stream_unbuffered. To reduce code duplication and improve clarity, consider removing read_spill_as_stream_unbuffered and updating its call sites to use this function instead.

Owner Author


value:useful; category:bug; feedback: The Gemini AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.

Comment on lines +1447 to +1448
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration


medium

The comment contains a few typos and could be clarified. It seems to be testing concurrency between a writer and a reader, not two readers.

Suggested change
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration
// Stress test for concurrency between the writer and reader to ensure there is no race condition.
// It runs for 100 iterations, with 5 batches per iteration.

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.

@augmentcode

augmentcode bot commented Feb 12, 2026

🤖 Augment PR Summary

Summary: This PR addresses a race condition in spill reading/merging by removing read-ahead buffering from spill streams.

Changes:

  • Stops using buffered spill streams in multi-level merge by reserving only 1 batch per spill file and reading spills directly as streams
  • Simplifies SpillManager::read_spill_as_stream to return the cooperative SpillReaderStream without wrapping it in a buffered task
  • Removes the configurable batch read buffer capacity from SpillManager
  • Adds a stress test that repeatedly runs concurrent spill writer/reader to catch the prior race

Technical Notes: The change reduces in-memory read-ahead (previously via buffered spawning) to avoid concurrency issues between spill file lifecycle and buffered reads.



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


Ok(stream)
}

/// Same as `read_spill_as_stream`, but without buffering.


After removing spawn_buffered, read_spill_as_stream is also unbuffered, so the doc comment on read_spill_as_stream_unbuffered is now misleading (there’s no behavioral difference). This could confuse callers/tests that expect a distinction between the two APIs.

Severity: low


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.


let reader_handle = SpawnedTask::spawn(async move {
let mut batches_read = 0;
while let Some(result) = reader.next().await {


This while let Some(...) loop relies on the writer being dropped to terminate; if the stream fails to close for any reason, the test can hang indefinitely and stall CI. That makes failures show up as timeouts rather than a clear assertion failure.

Severity: medium


Owner Author


value:annoying; category:bug; feedback: The Augment AI reviewer is not correct! Depending on the writer being dropped is a standard way to stop waiting on a read operation for all kinds of channels in Rust. Receiving a None message is the first/main signal, and the last writer being dropped is the second.

@martin-augment
Owner Author

168-193: read_spill_as_stream and read_spill_as_stream_unbuffered are now identical.

After removing spawn_buffered, both methods produce the exact same Box::pin(cooperative(SpillReaderStream::new(...))). The docstring on line 182 ("Same as read_spill_as_stream, but without buffering") is now misleading since neither method buffers. Consider removing read_spill_as_stream_unbuffered and updating callers to use read_spill_as_stream, or updating the doc to clarify they are the same.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.

@martin-augment
Owner Author

  • Typo in test comment (line 1447): "concurncy in the reader and the reader" should be "concurrency in the writer and the reader"
  • Typo in test comment (line 1447): "there is now race condtion" should be "there is no race condition"

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.
