-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[data] Combine repartitions #59145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] Combine repartitions #59145
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new logical optimization rule, CombineRepartitions, which effectively merges consecutive Repartition or StreamingRepartition operators into a single one. This is a valuable optimization that can simplify the logical plan and potentially improve performance by reducing overhead from redundant repartitioning steps. The logic correctly handles the shuffle parameter for Repartition and target_num_rows_per_block for StreamingRepartition when fusing operators. The integration into the _LOGICAL_RULESET is also correctly done.
| if transformed_dag is original_dag: | ||
| return plan | ||
|
|
||
| # TODO replace w/ Plan.copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment here indicates a known area for improvement. Implementing a Plan.copy method would provide a more robust and idiomatic way to create a new LogicalPlan instance after transformations, ensuring all relevant attributes are correctly propagated and maintaining the immutability pattern consistently across the plan hierarchy. This is a good future enhancement.
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Description
A Logical Optimizer rule to combine consecutive repartitions. If 2 repartitions are consecutive, take the last one. If one of them performs an all gather shuffle, then the fused one should also shuffle too.
Related issues
None
Additional information
None