@alexeykudinkin
Description

Cherry-pick of #58033

## Description

This change properly handles pushing renaming projections down into
read ops that support projections (such as Parquet reads).

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner October 23, 2025 07:12
@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Oct 23, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the projection pushdown logic to correctly handle column renames, which is a great improvement. The core idea is to distinguish simple projections (selects/renames) from complex ones and push down the rename map to the data source for simple cases, avoiding an extra MapBatches operator. The changes are well-structured, introducing a collapse_transitive_map utility for chained renames and updating the ParquetDatasource and logical operators accordingly. The logic seems sound, but I have one suggestion to improve the robustness and clarity of a new helper function.
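To make the chained-rename case concrete, here is a minimal sketch of what collapsing a transitive rename map could look like. The function name `collapse_chained_renames` is hypothetical and this is not Ray's `collapse_transitive_map`; the only behavior taken from this thread is that empty or `None` input yields `{}`. The cycle check is an added assumption.

```python
from typing import Dict, Optional


def collapse_chained_renames(renames: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Collapse chains such as a -> b, b -> c into a single a -> c entry.

    Illustrative stand-in only; not the helper introduced in this PR.
    """
    if not renames:
        return {}

    # Names that appear as a rename target are links inside a chain,
    # not original column names, so they get no entry of their own.
    intermediates = set(renames.values())

    collapsed: Dict[str, str] = {}
    for source in renames:
        if source in intermediates:
            continue
        current = source
        seen = {current}
        # Follow the chain until we reach a name that is not renamed again.
        while current in renames:
            current = renames[current]
            if current in seen:
                raise ValueError(f"Cycle detected in rename map: {renames}")
            seen.add(current)
        collapsed[source] = current
    return collapsed


print(collapse_chained_renames({"a": "b", "b": "c"}))  # {'a': 'c'}
```

With a collapsed map, a read op that supports projections only needs the original column names and their final names, which is what makes pushing the rename into the read feasible without a separate mapping step.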

Comment on lines +876 to +887
def _combine_rename_map(
    prev_column_rename_map: Optional[Dict[str, str]],
    new_column_rename_map: Optional[Dict[str, str]],
):
    if not prev_column_rename_map:
        combined = new_column_rename_map
    elif not new_column_rename_map:
        combined = prev_column_rename_map
    else:
        combined = prev_column_rename_map | new_column_rename_map

    return collapse_transitive_map(combined)
medium

This function is missing a return type hint. Based on its usage, it should be -> Dict[str, str].

Additionally, the combined variable can be None if one of the input rename maps is None. While collapse_transitive_map currently handles None input by returning {}, relying on this implicit behavior can be brittle. It's safer and clearer to ensure a dictionary is always passed.

I suggest adding the type hint and making the None handling explicit by using combined or {}.

Suggested change
 def _combine_rename_map(
     prev_column_rename_map: Optional[Dict[str, str]],
     new_column_rename_map: Optional[Dict[str, str]],
-):
+) -> Dict[str, str]:
     if not prev_column_rename_map:
         combined = new_column_rename_map
     elif not new_column_rename_map:
         combined = prev_column_rename_map
     else:
         combined = prev_column_rename_map | new_column_rename_map
-    return collapse_transitive_map(combined)
+    return collapse_transitive_map(combined or {})
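The merge step on its own can be exercised with the sketch below, which shows why the explicit `or {}` fallback matters when both inputs are `None`. The name `merge_rename_maps` is hypothetical, and the collapsing step is deliberately left out, so this is only an approximation of the suggested function. The `|` dict-union operator requires Python 3.9 or later.

```python
from typing import Dict, Optional


def merge_rename_maps(
    prev: Optional[Dict[str, str]],
    new: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Merge two optional rename maps, always returning a dict.

    Hypothetical helper mirroring the branches in the suggestion above;
    the transitive collapse is omitted.
    """
    if not prev:
        combined = new
    elif not new:
        combined = prev
    else:
        # Dict union (Python 3.9+): entries from `new` win on key conflicts.
        combined = prev | new
    # `combined` is None when both inputs are None, so fall back to {}.
    return dict(combined or {})


print(merge_rename_maps(None, None))                # {}
print(merge_rename_maps({"a": "b"}, None))          # {'a': 'b'}
print(merge_rename_maps({"a": "b"}, {"b": "c"}))    # {'a': 'b', 'b': 'c'}
print(merge_rename_maps({"a": "b"}, {"a": "z"}))    # {'a': 'z'}
```

The third call shows the chained state (`a -> b`, `b -> c`) that a subsequent collapse step would have to reduce to a single `a -> c` entry.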

@aslonnie aslonnie enabled auto-merge (squash) October 23, 2025 11:31
@aslonnie aslonnie disabled auto-merge October 23, 2025 11:32
@aslonnie aslonnie merged commit 0e6b21a into releases/2.51.0 Oct 23, 2025
7 checks passed
@aslonnie aslonnie deleted the ak/prj-pdwn-fix-cp branch October 23, 2025 11:33