-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] Add disable Block Shaping option to BlockOutputBuffer #58757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a disable_block_shaping option to BlockOutputBuffer, allowing users to bypass the automatic block sizing logic. This is a useful feature for scenarios where manual control over block structure is desired. The implementation is clean, involving updates to OutputBlockSizeOption and BlockOutputBuffer to respect the new flag. The related refactoring in map_transformer.py to remove the now-obsolete _can_skip_block_sizing method is a good simplification. The accompanying tests are thorough, covering various configurations and ensuring the new functionality works as expected. I've suggested a couple of minor refactorings to improve code conciseness by removing some redundant checks. Overall, this is a solid contribution.
| def _exceeded_buffer_row_limit(self) -> bool: | ||
| if self._output_block_size_option.disable_block_shaping: | ||
| return False | ||
|
|
||
| return ( | ||
| self._max_num_rows_per_block() is not None | ||
| and self._buffer.num_rows() > self._max_num_rows_per_block() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if self._output_block_size_option.disable_block_shaping: check is redundant. The _max_num_rows_per_block() method already returns None when disable_block_shaping is True, which causes the expression self._max_num_rows_per_block() is not None to evaluate to False. This makes the entire return statement False, achieving the same result as the explicit if block. Removing the explicit check simplifies the code.
def _exceeded_buffer_row_limit(self) -> bool:
return (
self._max_num_rows_per_block() is not None
and self._buffer.num_rows() > self._max_num_rows_per_block()
)| def _exceeded_buffer_size_limit(self) -> bool: | ||
| if self._output_block_size_option.disable_block_shaping: | ||
| return False | ||
|
|
||
| return ( | ||
| self._max_bytes_per_block() is not None | ||
| and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to _exceeded_buffer_row_limit, the if self._output_block_size_option.disable_block_shaping: check here is redundant. The _max_bytes_per_block() method returns None when disable_block_shaping is True, which will cause this method to correctly return False. Removing the explicit check will make the code more concise.
def _exceeded_buffer_size_limit(self) -> bool:
return (
self._max_bytes_per_block() is not None
and self._buffer.get_estimated_memory_usage() > self._max_bytes_per_block()
)…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com> Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
…ject#58757) ## Description In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer. Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
Description
[Data] Add disable Block Shaping option to BlockOutputBuffer
In addition to Block shaping by Block Size and Num Rows, add an option to skip Block Shaping altogether in BlockOutputBuffer.
Related issues
Additional information