20067: fix: race condition in SpillPool caused by buffered stream#240

Open
martin-augment wants to merge 7 commits into main from pr-20067-2026-02-12-07-13-06
Conversation

@martin-augment
Owner

20067: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a race condition within the SpillPool by removing a problematic buffered stream mechanism. The SpillManager's batch read buffering capability has been entirely removed, and the MultiLevelMergeBuilder has been updated to reflect this change by no longer requesting read-ahead buffering. A new, robust concurrent test has been added to the SpillPool to ensure the stability and correctness of concurrent writer and reader operations.

Highlights

  • Race Condition Fix: Eliminated explicit batch buffering in SpillManager to resolve a race condition that could occur with buffered streams.
  • Buffering Mechanism Removal: Adjusted MultiLevelMergeBuilder to no longer use read-ahead buffering when merging spill files, aligning with the removal of the problematic buffering mechanism.
  • Concurrency Test: Introduced a new concurrent test case in SpillPool to rigorously verify the fix for writer-reader race conditions under stress.
Changelog
  • datafusion/physical-plan/src/sorts/multi_level_merge.rs
    • Updated get_sorted_spill_files_to_merge to request memory for 1 batch per file instead of 2, removing read-ahead buffering.
    • Removed the call to with_batch_read_buffer_capacity when creating spill streams.
  • datafusion/physical-plan/src/spill/spill_manager.rs
    • Removed the spawn_buffered utility import.
    • Removed the batch_read_buffer_capacity field from the SpillManager struct.
    • Deleted the with_batch_read_buffer_capacity method and its initialization in the new function.
    • Modified read_spill_as_stream to return the raw stream directly, without buffering.
  • datafusion/physical-plan/src/spill/spill_pool.rs
    • Added a new test_concurrent_writer_reader_race_condition to stress-test concurrent writes and reads, ensuring no race conditions occur.

@coderabbitai

coderabbitai bot commented Feb 12, 2026

Walkthrough

This pull request modifies the spill file buffering strategy in DataFusion's physical execution layer. The SpillManager removes its internal batch read buffer capacity configuration, including the private field and public builder method. The read_spill_as_stream function no longer wraps streams with buffering. Additionally, the multi_level_merge path adjusts its buffer length from 2 to 1 when retrieving sorted spill files. A new concurrent stress test is added to validate writer/reader behavior under interleaved execution conditions.


@claude

claude bot commented Feb 12, 2026

PR Review: Fix Race Condition in SpillPool by Removing Buffered Streams

Summary

This PR addresses a race condition in SpillPool caused by the use of buffered streams (spawn_buffered). The fix removes the buffering layer from spill file reads, simplifying the concurrency model and eliminating the race condition.


Code Quality ✅

Strengths:

  • Clean removal: The changes properly remove all references to batch_read_buffer_capacity field and the with_batch_read_buffer_capacity method
  • Consistent updates: All call sites in multi_level_merge.rs have been updated to reflect the removal of buffering
  • Good test coverage: The new test test_concurrent_writer_reader_race_condition stress-tests the fix with 100 iterations

Minor Issues:

  1. Typo in test comment (line 1447): "concurncy in the reader and the reader" should be "concurrency in the writer and the reader"
  2. Typo in test comment (line 1447): "there is now race condtion" should be "there is no race condition"
  3. Unused variable in multi_level_merge.rs:262: The _ assignment for buffer_size could be clearer about why it's being discarded

Potential Issues ⚠️

1. Performance Impact (Medium Priority)
The removal of spawn_buffered eliminates read-ahead buffering. Previously, with buffer_size = 2, the system could pre-fetch up to 2 batches ahead of consumption. This change means:

  • Pro: Simpler concurrency model, no race conditions
  • Con: Potential throughput reduction when reading from disk since there's no read-ahead

Recommendation: Benchmark spill-heavy queries to ensure this doesn't significantly impact performance. The comment change from 2 to 1 in multi_level_merge.rs:264 suggests this was intentional, but performance validation is important.

2. Incomplete Documentation Update
The comment in spill/mod.rs:55 still references using spawn_buffered for buffering:

/// It will read one batch at a time and will not do any buffering, to buffer data use [`crate::common::spawn_buffered`]

This reference is now inconsistent with the new approach.

Recommendation: Update this comment to reflect that buffering is no longer used for spill reads to avoid race conditions.


Test Coverage ✅

Strengths:

  • The new test is well-designed with:
    • Multi-threaded runtime with single worker thread to maximize contention
    • 100 iterations to catch rare race conditions
    • Concurrent writer/reader with explicit yield points to trigger interleaving
    • Clear assertion failure messages

Suggestions:

  1. Consider adding a comment explaining why single worker thread is used (to maximize thread context switches and race condition visibility)
  2. The test uses SpawnedTask - verify this is the correct test utility for this codebase (appears to be defined elsewhere in tests)

Security Concerns ✅

No security issues identified. The changes reduce complexity and potential for concurrency bugs, which is a security improvement.


Performance Considerations ⚠️

Memory Usage: Lower memory footprint since we're not buffering batches in memory
Throughput: Potential reduction in read throughput without read-ahead
Latency: First-batch latency unchanged, but sustained throughput may be affected

Recommendation: Add performance regression tests or benchmark results to validate the trade-off.


Race Condition Analysis 🔍

The race condition likely occurred because spawn_buffered creates a separate task that polls the underlying stream:

  1. Writer writes batch and flushes
  2. Buffered task polls stream but gets Pending
  3. Writer writes another batch
  4. Buffered task might miss the wake-up notification
  5. Reader waits indefinitely despite data being available

Removing buffering eliminates this intermediate layer and ensures the reader directly polls the underlying stream, receiving wake-up notifications correctly.


Recommendations

Must Fix:

  1. ✏️ Fix typos in test comment (line 1447)

Should Consider:

  1. 📚 Update documentation in spill/mod.rs:55 to remove spawn_buffered reference
  2. 🔍 Add comment explaining why buffer_size is discarded in multi_level_merge.rs
  3. 📊 Benchmark spill-heavy workloads to validate performance impact
  4. 💭 Consider adding a comment in the test explaining the single worker thread rationale

Nice to Have:

  1. Add a comment explaining what race condition this PR fixes (reference apache/datafusion#20067, "fix: race condition in SpillPool caused by buffered stream", if available)
  2. Consider adding this race condition scenario to documentation as a lesson learned

Verdict: ✅ LGTM with minor suggestions

The core fix is sound and properly addresses the race condition. The test coverage is good. The minor issues noted above don't block merging but should be addressed for code quality and maintainability.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@datafusion/physical-plan/src/spill/spill_pool.rs`:
- Around line 1447-1448: Fix the typos in the test comment that currently reads
"stress testing the concurncy in the reader and the reader to make sure there is
now race condtion going for 100 iterations with a 5 batches per iteration" by
replacing "concurncy" with "concurrency", "now" with "no", and "condtion" with
"condition" so the comment reads correctly about stress testing concurrency and
ensuring no race condition.
🧹 Nitpick comments (2)
datafusion/physical-plan/src/sorts/multi_level_merge.rs (1)

262-269: Change looks correct; the recursive buffer-reduction path in get_sorted_spill_files_to_merge is now dead code.

With buffer_len hardcoded to 1, the if buffer_len > 1 branch (line 382) can never be reached. The function still works correctly—it'll return Err immediately if it can't reserve memory for the minimum streams—but the recursive retry logic is effectively dead. Consider removing it or adding a comment noting it's retained for future flexibility.

datafusion/physical-plan/src/spill/spill_manager.rs (1)

168-193: read_spill_as_stream and read_spill_as_stream_unbuffered are now identical.

After removing spawn_buffered, both methods produce the exact same Box::pin(cooperative(SpillReaderStream::new(...))). The docstring on line 182 ("Same as read_spill_as_stream, but without buffering") is now misleading since neither method buffers. Consider removing read_spill_as_stream_unbuffered and updating callers to use read_spill_as_stream, or updating the doc to clarify they are the same.

#!/bin/bash
# Find all callers of read_spill_as_stream_unbuffered to assess impact of removal
rg -n 'read_spill_as_stream_unbuffered' --type rust

Comment on lines +1447 to +1448
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration


⚠️ Potential issue | 🟡 Minor

Typos in test comment.

Line 1447: "concurncy" → "concurrency", "now" → "no", "condtion" → "condition".

Proposed fix
-        // stress testing the concurncy in the reader and the reader to make sure there is now race condtion
-        // going for 100 iterations with a 5 batches per iteration 
+        // Stress testing the concurrency between the writer and the reader to make sure there is no race condition.
+        // Going for 100 iterations with 5 batches per iteration.

Owner Author


value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a race condition related to buffered streams when reading spilled files. The core change is the removal of read-ahead buffering in SpillManager::read_spill_as_stream, which makes it an unbuffered stream. This change is propagated to MultiLevelMergeBuilder which was using the buffered stream. A new stress test is also added to spill_pool.rs to verify the fix for concurrent writer/reader scenarios and prevent future regressions.

The changes look good and effectively remove the problematic buffering. I have a couple of suggestions for improvement: one is to refactor the now-redundant read_spill_as_stream_unbuffered function, and the other is to fix a comment in the new test.

)));

- Ok(spawn_buffered(stream, self.batch_read_buffer_capacity))
+ Ok(stream)


medium

Now that read_spill_as_stream is unbuffered, it has become functionally identical to read_spill_as_stream_unbuffered. To reduce code duplication and improve clarity, consider removing read_spill_as_stream_unbuffered and updating its call sites to use this function instead.

Owner Author


value:useful; category:bug; feedback: The Gemini AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.

Comment on lines +1447 to +1448
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration


medium

The comment contains a few typos and could be clarified. It seems to be testing concurrency between a writer and a reader, not two readers.

Suggested change
// stress testing the concurncy in the reader and the reader to make sure there is now race condtion
// going for 100 iterations with a 5 batches per iteration
// Stress test for concurrency between the writer and reader to ensure there is no race condition.
// It runs for 100 iterations, with 5 batches per iteration.

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.

@augmentcode

augmentcode bot commented Feb 12, 2026

🤖 Augment PR Summary

Summary: This PR addresses a race condition in spill reading/merging by removing read-ahead buffering from spill streams.

Changes:

  • Stops using buffered spill streams in multi-level merge by reserving only 1 batch per spill file and reading spills directly as streams
  • Simplifies SpillManager::read_spill_as_stream to return the cooperative SpillReaderStream without wrapping it in a buffered task
  • Removes the configurable batch read buffer capacity from SpillManager
  • Adds a stress test that repeatedly runs concurrent spill writer/reader to catch the prior race

Technical Notes: The change reduces in-memory read-ahead (previously via buffered spawning) to avoid concurrency issues between spill file lifecycle and buffered reads.



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


Ok(stream)
}

/// Same as `read_spill_as_stream`, but without buffering.


After removing spawn_buffered, read_spill_as_stream is also unbuffered, so the doc comment on read_spill_as_stream_unbuffered is now misleading (there’s no behavioral difference). This could confuse callers/tests that expect a distinction between the two APIs.

Severity: low


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.


let reader_handle = SpawnedTask::spawn(async move {
let mut batches_read = 0;
while let Some(result) = reader.next().await {


This while let Some(...) loop relies on the writer being dropped to terminate; if the stream fails to close for any reason, the test can hang indefinitely and stall CI. That makes failures show up as timeouts rather than a clear assertion failure.

Severity: medium


Owner Author


value:annoying; category:bug; feedback: The Augment AI reviewer is not correct! Depending on the writer being dropped is a standard way to stop waiting on a read operation for all kinds of channels in Rust. Receiving a None message is the first/main signal, and the last writer being dropped is the second.

@martin-augment
Owner Author

168-193: read_spill_as_stream and read_spill_as_stream_unbuffered are now identical.

After removing spawn_buffered, both methods produce the exact same Box::pin(cooperative(SpillReaderStream::new(...))). The docstring on line 182 ("Same as read_spill_as_stream, but without buffering") is now misleading since neither method buffers. Consider removing read_spill_as_stream_unbuffered and updating callers to use read_spill_as_stream, or updating the doc to clarify they are the same.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! After removing the buffering from read_spill_as_stream() it is now exactly the same as read_spill_as_stream_unbuffered(). If the fix of the initial issue is to drop the pre-fetching, then one of these methods should be removed and all its callers should use the other method. But maybe this is the wrong way to fix it. Maybe a better way is to find the reason why a reader gets an invalid EOF/None message and does not wait for more messages.

@martin-augment
Owner Author

  • Typo in test comment (line 1447): "concurncy in the reader and the reader" should be "concurrency in the writer and the reader"
  • Typo in test comment (line 1447): "there is now race condtion" should be "there is no race condition"

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! There are a few minor typos in the comments which need to be fixed.
