Conversation

@srinathk10
Contributor

@srinathk10 srinathk10 commented Nov 18, 2025

Description

[Data] Add disable Block Shaping option to BlockOutputBuffer

BlockOutputBuffer already shapes output blocks by block size and by number of rows. This PR adds an option to skip block shaping altogether.
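To illustrate the intent, here is a toy sketch of what skipping block shaping could mean. It is not Ray's implementation: `ToySizeOption` and `shape_blocks` are hypothetical names, blocks are modeled as plain lists of rows, and only the row-count limit is modeled. The pass-through behavior when shaping is disabled is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySizeOption:
    # Hypothetical stand-in for OutputBlockSizeOption; only the row limit is modeled.
    target_num_rows_per_block: Optional[int] = None
    disable_block_shaping: bool = False


def shape_blocks(blocks: List[List[int]], option: ToySizeOption) -> List[List[int]]:
    """Re-chunk row lists to the target row count, or pass them through untouched."""
    if option.disable_block_shaping or option.target_num_rows_per_block is None:
        # Assumed behavior: no shaping, so each input block is emitted as it arrived.
        return [list(block) for block in blocks]

    limit = option.target_num_rows_per_block
    # Flatten all rows, then cut them into chunks of at most `limit` rows.
    rows = [row for block in blocks for row in block]
    return [rows[i : i + limit] for i in range(0, len(rows), limit)]


incoming = [[1, 2, 3], [4]]
shaped = shape_blocks(incoming, ToySizeOption(target_num_rows_per_block=2))
unshaped = shape_blocks(
    incoming,
    ToySizeOption(target_num_rows_per_block=2, disable_block_shaping=True),
)
```

With shaping enabled the toy regroups rows to the target count; with the flag set, the input block boundaries are preserved even though a row target is also configured.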

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
@srinathk10 srinathk10 requested a review from a team as a code owner November 18, 2025 22:11
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a disable_block_shaping option to BlockOutputBuffer, allowing users to bypass the automatic block sizing logic. This is a useful feature for scenarios where manual control over block structure is desired. The implementation is clean, involving updates to OutputBlockSizeOption and BlockOutputBuffer to respect the new flag. The related refactoring in map_transformer.py to remove the now-obsolete _can_skip_block_sizing method is a good simplification. The accompanying tests are thorough, covering various configurations and ensuring the new functionality works as expected. I've suggested a couple of minor refactorings to improve code conciseness by removing some redundant checks. Overall, this is a solid contribution.

Comment on lines 108 to 115
    def _exceeded_buffer_row_limit(self) -> bool:
        if self._output_block_size_option.disable_block_shaping:
            return False

        return (
            self._max_num_rows_per_block() is not None
            and self._buffer.num_rows() > self._max_num_rows_per_block()
        )
Contributor

medium

The if self._output_block_size_option.disable_block_shaping: check is redundant. The _max_num_rows_per_block() method already returns None when disable_block_shaping is True, which causes the expression self._max_num_rows_per_block() is not None to evaluate to False. This makes the entire return statement False, achieving the same result as the explicit if block. Removing the explicit check simplifies the code.

    def _exceeded_buffer_row_limit(self) -> bool:
        return (
            self._max_num_rows_per_block() is not None
            and self._buffer.num_rows() > self._max_num_rows_per_block()
        )
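The redundancy claim can be checked with a small stand-alone sketch. `ToyBuffer` is a hypothetical stand-in, not the real BlockOutputBuffer; it assumes, as the comment above states, that `_max_num_rows_per_block()` returns None whenever `disable_block_shaping` is True. Under that assumption the guarded and unguarded versions agree on every input tried here.

```python
from itertools import product
from typing import Optional


class ToyBuffer:
    """Hypothetical stand-in for BlockOutputBuffer; models only the row-limit check."""

    def __init__(
        self,
        num_rows: int,
        target_num_rows: Optional[int],
        disable_block_shaping: bool,
    ) -> None:
        self.num_rows = num_rows
        self.target_num_rows = target_num_rows
        self.disable_block_shaping = disable_block_shaping

    def _max_num_rows_per_block(self) -> Optional[int]:
        # Assumption from the review comment: None when shaping is disabled.
        if self.disable_block_shaping:
            return None
        return self.target_num_rows

    def exceeded_with_guard(self) -> bool:
        # Version in the PR: explicit early return on the flag.
        if self.disable_block_shaping:
            return False
        limit = self._max_num_rows_per_block()
        return limit is not None and self.num_rows > limit

    def exceeded_without_guard(self) -> bool:
        # Suggested version: rely on the None limit alone.
        limit = self._max_num_rows_per_block()
        return limit is not None and self.num_rows > limit


def guard_is_redundant() -> bool:
    """Compare both versions over a small grid of inputs."""
    cases = product([0, 5, 10, 11], [None, 10], [False, True])
    return all(
        ToyBuffer(n, t, d).exceeded_with_guard()
        == ToyBuffer(n, t, d).exceeded_without_guard()
        for n, t, d in cases
    )
```

The equivalence only holds while `_max_num_rows_per_block()` keeps returning None for the disabled case, so dropping the guard couples the two methods more tightly.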

Comment on lines 117 to 124
    def _exceeded_buffer_size_limit(self) -> bool:
        if self._output_block_size_option.disable_block_shaping:
            return False

        return (
            self._max_bytes_per_block() is not None
            and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block()
        )
Contributor

medium

Similar to _exceeded_buffer_row_limit, the if self._output_block_size_option.disable_block_shaping: check here is redundant. The _max_bytes_per_block() method returns None when disable_block_shaping is True, which will cause this method to correctly return False. Removing the explicit check will make the code more concise.

    def _exceeded_buffer_size_limit(self) -> bool:
        return (
            self._max_bytes_per_block() is not None
            and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block()
        )

@srinathk10 srinathk10 added the go add ONLY when ready to merge, run all tests label Nov 18, 2025
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 19, 2025
@bveeramani bveeramani merged commit 641d16f into master Nov 19, 2025
7 checks passed
@bveeramani bveeramani deleted the srinathk10/output_buffer_disablng branch November 19, 2025 20:16
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
…ject#58757)

## Description

In addition to Block shaping by Block Size and Num Rows, add an option
to skip Block Shaping altogether in BlockOutputBuffer.

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ject#58757)

Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ject#58757)
