-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] Fixing handling of renames in projection pushdown #58033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant improvement to projection pushdown by enabling column renames to be pushed into the read operations. The changes are well-structured, introducing a new utility for collapsing transitive rename maps and updating the logical operators and datasources accordingly. The core logic in the ProjectionPushdown rule correctly distinguishes between simple projections that can be fully pushed down and complex ones that require keeping the Project operator. I've identified a few areas for improvement, including removing leftover debug statements, adding a missing type hint for better maintainability, and minor code refinements for clarity and robustness.
|
|
||
| # Follow the chain until we reach a key that's not in the mapping | ||
| while cur in d: | ||
| next = d[cur] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _combine_rename_map( | ||
| prev_column_rename_map: Optional[Dict[str, str]], | ||
| new_column_rename_map: Optional[Dict[str, str]], | ||
| ): | ||
| if not prev_column_rename_map: | ||
| combined = new_column_rename_map | ||
| elif not new_column_rename_map: | ||
| combined = prev_column_rename_map | ||
| else: | ||
| combined = prev_column_rename_map | new_column_rename_map | ||
|
|
||
| return collapse_transitive_map(combined) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is missing a return type hint. Adding it would improve type safety and code clarity. Additionally, the logic for combining the dictionaries can be made more concise and robust against None values by using or {}.
def _combine_rename_map(
prev_column_rename_map: Optional[Dict[str, str]],
new_column_rename_map: Optional[Dict[str, str]],
) -> Dict[str, str]:
combined = (prev_column_rename_map or {}) | (new_column_rename_map or {})
return collapse_transitive_map(combined)Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
srinathk10
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
79cad0e to
8070405
Compare
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
## Description This change properly handles of pushing of the renaming projections into read ops (that support projections, like parquet reads). ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…8037) ## Description Cherry-pick of #58033 ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- > Thank you for contributing to Ray! 🚀 > Please review the [Ray Contribution Guide](https://docs.ray.io/en/master/ray-contribute/getting-involved.html) before opening a pull request. >⚠️ Remove these instructions before submitting your PR. > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…#58033) ## Description This change properly handles of pushing of the renaming projections into read ops (that support projections, like parquet reads). ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
…#58033) ## Description This change properly handles of pushing of the renaming projections into read ops (that support projections, like parquet reads). ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…#58033) ## Description This change properly handles of pushing of the renaming projections into read ops (that support projections, like parquet reads). ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Description
This change properly handles of pushing of the renaming projections into read ops (that support projections, like parquet reads).
Related issues
Additional information