@alexeykudinkin alexeykudinkin commented Oct 23, 2025

Description

This change properly handles pushing renaming projections into read ops (those that support projections, such as Parquet reads).

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested review from a team as code owners October 23, 2025 04:14
@alexeykudinkin alexeykudinkin added the "go" label (add ONLY when ready to merge, run all tests) Oct 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant improvement to projection pushdown by enabling column renames to be pushed into the read operations. The changes are well-structured, introducing a new utility for collapsing transitive rename maps and updating the logical operators and datasources accordingly. The core logic in the ProjectionPushdown rule correctly distinguishes between simple projections that can be fully pushed down and complex ones that require keeping the Project operator. I've identified a few areas for improvement, including removing leftover debug statements, adding a missing type hint for better maintainability, and minor code refinements for clarity and robustness.


# Follow the chain until we reach a key that's not in the mapping
while cur in d:
    next = d[cur]

medium

The variable next shadows the built-in function next(). It's a good practice to avoid shadowing built-ins to prevent potential confusion and bugs. Consider renaming it to something more specific like next_key or next_val.

Suggested change
- next = d[cur]
+ next_val = d[cur]
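
For context, the loop under review belongs to the utility that collapses transitive rename maps. The full implementation is not shown in this thread, so the following is only a minimal sketch of the idea, assuming that chains such as a -> b -> c should collapse to a -> c. The name `collapse_chain_sketch` and the cycle check are illustrative assumptions, not the merged code.

```python
from typing import Dict, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)


def collapse_chain_sketch(d: Dict[K, K]) -> Dict[K, K]:
    """Hypothetical helper: collapse chained renames, e.g. {a: b, b: c} -> {a: c}."""
    collapsed: Dict[K, K] = {}
    # Keys that also appear as values are intermediate links, not chain starts.
    intermediate = set(d.values())

    for start in d:
        if start in intermediate:
            continue

        cur = start
        visited = {cur}
        # Follow the chain until we reach a key that's not in the mapping.
        while cur in d:
            next_val = d[cur]  # `next_val` avoids shadowing the built-in next()
            if next_val in visited:
                # Assumption: a cycle reachable from a chain start is an error.
                raise ValueError(f"Cycle detected in mapping: {d}")
            visited.add(next_val)
            cur = next_val

        collapsed[start] = cur

    return collapsed
```

With this sketch, `collapse_chain_sketch({"a": "b", "b": "c"})` keeps only the chain start and its final target. A mapping that consists solely of a cycle has no chain start, so this version returns an empty dict for it rather than raising.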

Comment on lines +876 to +887
def _combine_rename_map(
    prev_column_rename_map: Optional[Dict[str, str]],
    new_column_rename_map: Optional[Dict[str, str]],
):
    if not prev_column_rename_map:
        combined = new_column_rename_map
    elif not new_column_rename_map:
        combined = prev_column_rename_map
    else:
        combined = prev_column_rename_map | new_column_rename_map

    return collapse_transitive_map(combined)

medium

This function is missing a return type hint. Adding it would improve type safety and code clarity. Additionally, the logic for combining the dictionaries can be made more concise and robust against None values by using or {}.

def _combine_rename_map(
    prev_column_rename_map: Optional[Dict[str, str]],
    new_column_rename_map: Optional[Dict[str, str]],
) -> Dict[str, str]:
    combined = (prev_column_rename_map or {}) | (new_column_rename_map or {})
    return collapse_transitive_map(combined)
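
The effect of the `or {}` pattern can be checked in isolation. The sketch below covers only the merge step and leaves out the call to `collapse_transitive_map`; `merge_rename_maps` is a hypothetical name used for illustration. Note that in the original version, `combined` stays `None` when both inputs are `None`, whereas this form always produces a dict.

```python
from typing import Dict, Optional


def merge_rename_maps(
    prev: Optional[Dict[str, str]],
    new: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Hypothetical helper: merge two optional rename maps into one dict."""
    # `or {}` replaces None (or an empty dict) with a fresh empty dict.
    # Dict union with `|` requires Python 3.9+; on a key conflict the
    # right-hand operand wins.
    return (prev or {}) | (new or {})


both_missing = merge_rename_maps(None, None)
only_prev = merge_rename_maps({"a": "b"}, None)
chained = merge_rename_maps({"a": "b"}, {"b": "c"})
conflict = merge_rename_maps({"a": "b"}, {"a": "z"})
```

The merged result for chained inputs still contains both links (`a -> b` and `b -> c`), which is why the suggestion passes it through the collapsing utility afterwards.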

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin changed the title from "[Data] Fixing projection pushdown" to "[Data] Fixing handling of renames in projection pushdown" Oct 23, 2025
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@srinathk10 srinathk10 left a comment

LGTM

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 23, 2025 06:21
@github-actions github-actions bot disabled auto-merge October 23, 2025 06:21
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 23, 2025 06:22
@alexeykudinkin alexeykudinkin merged commit 4130e4d into master Oct 23, 2025
8 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/prj-pdwn-fix branch October 23, 2025 06:57
alexeykudinkin added a commit that referenced this pull request Oct 23, 2025
aslonnie pushed a commit that referenced this pull request Oct 23, 2025
…8037)

## Description

Cherry-pick of #58033

xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
…#58033)
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…#58033)
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…#58033)
Signed-off-by: Aydin Abiar <aydin@anyscale.com>