[data] HashShuffleAggregator break down block on finalize #58603
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Code Review
This pull request correctly identifies and aims to solve an important issue with HashShuffleAggregator handling very large blocks, which can lead to out-of-memory errors. The approach of using BlockOutputBuffer to break down large blocks is sound. However, the current implementation of the finalize method introduces several critical issues, including a risk of deadlocks, potential data loss, and incorrect metrics reporting. My review provides a detailed comment with a suggested replacement for the finalize method that addresses these problems while preserving the original intent of the change.
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Force-pushed the …/aggregator-yield-block-size branch from 26404f8 to ec0e610.
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
```python
if partition_id in self._finalizing_tasks:
    self._finalizing_tasks.pop(partition_id)

# Update Finalize Metrics on task completion
```
drive-by
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Who is reviewing this PR?
```python
# so we do not break the block down further.
if target_max_block_size is not None:
    # Creating a block output buffer per partition finalize task because:
    # 1. Need to keep track of which tasks have already been finalized
```
I couldn't understand what (1) means. Could you elaborate/revise?
Updated. I don't think (1) makes sense either, lol. My intent was that I could keep track of re-finalizing tasks, but that would mean additional stats plus maybe additional locks, i.e. more complexity, so I kept it simple.
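A compact sketch of the trade-off described here, with hypothetical names: a shared registry that can detect re-finalization needs a lock plus start/finish bookkeeping, while a buffer created inside each finalize call needs neither.

```python
import threading
from typing import Dict, List

class FinalizeRegistry:
    """Hypothetical shared registry: can reject re-finalization, but
    every task must lock, register, and deregister itself."""

    def __init__(self) -> None:
        self._in_flight: Dict[int, List[bytes]] = {}
        self._lock = threading.Lock()

    def start(self, partition_id: int) -> List[bytes]:
        with self._lock:
            if partition_id in self._in_flight:
                raise RuntimeError(f"partition {partition_id} is already finalizing")
            self._in_flight[partition_id] = []
            return self._in_flight[partition_id]

    def finish(self, partition_id: int) -> None:
        with self._lock:
            self._in_flight.pop(partition_id, None)

def finalize_simple(partition_id: int) -> List[bytes]:
    """The simpler route described above: a buffer that lives only inside
    one finalize call, so there is no shared state to guard."""
    local_buffer: List[bytes] = []  # created per call; no locks, no registry
    return local_buffer
```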
```python
def finalize(
    self, partition_id: int
) -> AsyncGenerator[Union[Block, "BlockMetadataWithSchema"], None]:
```
Nit: Out-of-scope for this PR, but I think this is a regular generator, not async
Suggested change:
```diff
-) -> AsyncGenerator[Union[Block, "BlockMetadataWithSchema"], None]:
+) -> Generator[Union[Block, "BlockMetadataWithSchema"], None, None]:
```
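For reference, a minimal illustration of the typing difference behind this nit: `typing.Generator` takes three type parameters (yield, send, return), while `typing.AsyncGenerator` takes two (yield, send), so the synchronous counterpart of `AsyncGenerator[T, None]` is `Generator[T, None, None]`.

```python
from typing import AsyncGenerator, Generator

def count_up(n: int) -> Generator[int, None, None]:
    # Regular generator: yields ints, accepts no sends, returns nothing.
    for i in range(n):
        yield i

async def count_up_async(n: int) -> AsyncGenerator[int, None]:
    # Async generator: same yield/send parameters, no return type parameter.
    for i in range(n):
        yield i
```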
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Description

`HashShuffleAggregator` currently doesn't break big blocks into smaller blocks (or combine smaller blocks into bigger ones). For large blocks this can be very problematic, because the block being returned will spill to disk.

Why this is better:

- `streaming_gen` backpressure can avoid materializing the entire object.
- The alternative is `StreamingRepartition`, but that is more work for the user.

This PR addresses this by using `BlockOutputBuffer` to reshape the blocks back to `data_context.target_max_block_size`.

Related issues

None

Additional information

Encountered this personally with a 180 GiB block, which would OOD (run out of disk).
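As a rough usage demo, reusing the illustrative `finalize` sketch from earlier in this thread (toy code, not Ray's API): a single ~100 KB partition streams out as one hundred ~1 KB blocks instead of one monolithic block.

```python
partition = [b"x" * 100 for _ in range(1000)]  # 1000 rows x 100 B, ~100 KB total
metrics: dict = {}
chunks = list(finalize(partition, target_max_block_size=1000, metrics=metrics))
print(len(chunks))  # 100 size-capped chunks rather than 1 giant block
print(metrics)      # {'blocks_yielded': 100, 'finalized_partitions': 1}
```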