@alexeykudinkin (Contributor) commented Oct 23, 2025

Description

This change fixes an issue where renaming a column did not remove the original (source) column from the projection's output.


Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner October 23, 2025 09:12
@alexeykudinkin alexeykudinkin added the go label (add ONLY when ready to merge, run all tests) Oct 23, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors and fixes projection pushdown logic, specifically how renamed columns are handled during the fusion of `Project` operators. The changes introduce a distinction between alias and rename operations, with rename now correctly signaling that the source column should be dropped. The logic for fusing consecutive projections has been significantly simplified and corrected, especially for composition cases involving `star()` expressions. The `_ColumnRewriter` has been replaced with a much simpler `_ColumnRefRebindingVisitor`, improving maintainability. Overall, these are excellent changes that make the projection fusion logic more robust and easier to understand. I've found one issue in a new test case where an assertion could be unreliable.
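To make the alias/rename distinction concrete, here is a minimal pure-Python sketch of how a projection's output columns could be derived. The `Col`, `Alias`, `Rename`, and `Star` classes and the `output_columns` helper are hypothetical stand-ins, not Ray Data's expression API; the sketch only assumes the semantics stated in the review: an alias adds a column and keeps its source, while a rename signals that its source column is dropped, so the source must not reappear when `star()` is expanded.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Alias:
    # Hypothetical: emits `target` as a copy; the source column is kept.
    source: Col
    target: str


@dataclass(frozen=True)
class Rename:
    # Hypothetical: emits `target` and signals the source column is dropped.
    source: Col
    target: str


@dataclass(frozen=True)
class Star:
    """Hypothetical stand-in for star(): all input columns."""


Expr = Union[Col, Alias, Rename, Star]


def output_columns(input_cols: List[str], exprs: List[Expr]) -> List[str]:
    """Compute the output column names of a projection over `input_cols`."""
    # Sources consumed by a rename must not survive star() expansion.
    renamed_away = {e.source.name for e in exprs if isinstance(e, Rename)}
    out: List[str] = []
    for e in exprs:
        if isinstance(e, Star):
            out.extend(c for c in input_cols if c not in renamed_away)
        elif isinstance(e, Col):
            out.append(e.name)
        else:  # Alias or Rename: both emit their target name
            out.append(e.target)
    return out


cols = ["a", "b"]
print(output_columns(cols, [Star(), Alias(Col("a"), "x")]))   # ['a', 'b', 'x']
print(output_columns(cols, [Star(), Rename(Col("a"), "x")]))  # ['b', 'x']
```

Keeping the renamed-away sources as a separate set lets one projection list express both "copy" (alias) and "move" (rename) semantics under `star()`.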

Comment on lines +1378 to +1382
```python
assert select_op.exprs == [
    # TODO fix (renaming doesn't remove prev columns)
    col("sepal.length").alias("length"),
    col("petal.width").alias("width"),
]
```

Priority: high

The assertion `select_op.exprs == [...]` is unreliable for comparing lists of expression objects. The `__eq__` method on `Expr` is overloaded to build new expressions (e.g., `col('a') == 1` creates a filter expression), not to check for equality. This means the assertion doesn't check for structural equality and might pass incorrectly or fail unexpectedly.

To ensure the test is robust, you should explicitly use `structurally_equals` for each expression in the list.

Suggested change

```diff
-assert select_op.exprs == [
-    # TODO fix (renaming doesn't remove prev columns)
-    col("sepal.length").alias("length"),
-    col("petal.width").alias("width"),
-]
+expected_exprs = [
+    col("sepal.length").alias("length"),
+    col("petal.width").alias("width"),
+]
+assert len(select_op.exprs) == len(expected_exprs)
+assert select_op.exprs[0].structurally_equals(expected_exprs[0])
+assert select_op.exprs[1].structurally_equals(expected_exprs[1])
```
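The failure mode the review describes can be reproduced with a tiny stand-in. Python's list equality compares elements with `==` and only checks the truthiness of each result, so an `__eq__` that returns an expression object makes any two equal-length lists compare equal. The `Expr` class and its `structurally_equals` method below are hypothetical, assuming an `__eq__` that builds a new (truthy) expression as the review states; Ray's actual `Expr` may behave differently, for example if it defines `__bool__`.

```python
class Expr:
    """Hypothetical expression node whose `==` builds a new expression,
    mirroring the operator overloading described in the review."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        # Returns a new (truthy) expression object instead of a bool.
        return Expr(f"({self.name} == {getattr(other, 'name', other)})")

    def structurally_equals(self, other) -> bool:
        # Hypothetical structural comparison: same type and same name.
        return isinstance(other, Expr) and self.name == other.name


actual = [Expr("a")]
expected = [Expr("b")]

# list equality calls Expr.__eq__ and only checks the result's truthiness;
# the returned Expr is truthy, so this "passes" even though a != b.
print(actual == expected)  # True

# Element-wise structural comparison catches the mismatch.
print(
    len(actual) == len(expected)
    and all(a.structurally_equals(e) for a, e in zip(actual, expected))
)  # False
```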

cursor bot commented Oct 23, 2025

Bug: Projection Fusion Fails on Renamed Columns

The `_get_col_refs_removed_by_renaming` function incorrectly marks source columns as "removed" for any rename operation, even if the column is still present in the projection's output. This can lead to "column not found" errors during projection fusion.
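A minimal sketch of the invariant the bot is pointing at, assuming a simplified model where each projection entry records its source column, its output name, and whether it is a rename. `OutputExpr` and `col_refs_removed_by_renaming` are hypothetical and are not the code in this PR: a renamed source should only be reported as removed when the projection no longer emits a column with that name.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class OutputExpr:
    """Hypothetical projection entry.

    `source` is the input column it reads (None for computed values),
    `target` is the output column name, and `is_rename` marks a rename
    (as opposed to a plain reference or an alias).
    """

    source: Optional[str]
    target: str
    is_rename: bool = False


def col_refs_removed_by_renaming(exprs: List[OutputExpr]) -> Set[str]:
    # Every column name the projection still emits.
    output_names = {e.target for e in exprs}
    renamed_sources = {
        e.source for e in exprs if e.is_rename and e.source is not None
    }
    # A renamed source only counts as removed if nothing else in the same
    # projection still outputs it under its original name.
    return renamed_sources - output_names


# Rename a -> b while also keeping a explicitly: a is NOT removed.
keep_both = [OutputExpr("a", "b", is_rename=True), OutputExpr("a", "a")]
print(col_refs_removed_by_renaming(keep_both))  # set()

# Plain rename a -> b: a is removed.
print(col_refs_removed_by_renaming([OutputExpr("a", "b", is_rename=True)]))  # {'a'}
```

In this model the set difference also makes a self-rename (`a -> a`) report nothing as removed, since `a` is still among the output names.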

cursor bot commented Oct 23, 2025

Bug: Self-Rename Bug in Column Projection

The `_extract_simple_rename` function incorrectly marks columns as removed during self-renames (e.g., `col("a").rename("a")`). This occurs because the `source_name != target_name` check was removed, causing projection fusion to incorrectly report missing columns.
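A minimal sketch of the guard the bot says went missing. The `ColRef` and `RenameExpr` classes and the `extract_simple_rename` helper are hypothetical, loosely modeled on the function name in the comment rather than taken from the PR: a self-rename is treated as a no-op and yields no (source, target) pair, so it never marks the column as removed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColRef:
    name: str


@dataclass(frozen=True)
class RenameExpr:
    # Hypothetical stand-in for col(source).rename(target).
    expr: ColRef
    target: str


def extract_simple_rename(expr) -> Optional[Tuple[str, str]]:
    """Return (source, target) if `expr` is a rename of a plain column
    reference that actually changes its name, else None."""
    if isinstance(expr, RenameExpr) and isinstance(expr.expr, ColRef):
        source, target = expr.expr.name, expr.target
        # Guard against self-renames such as col("a").rename("a"):
        # they are no-ops and must not mark "a" as removed.
        if source != target:
            return source, target
    return None


print(extract_simple_rename(RenameExpr(ColRef("a"), "b")))  # ('a', 'b')
print(extract_simple_rename(RenameExpr(ColRef("a"), "a")))  # None
```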

@ray-gardener ray-gardener bot added the data label (Ray Data-related issues) Oct 23, 2025
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
… them

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Added ample commentary

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Typo

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner October 23, 2025 22:18
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

```python
# Upstream output column refs inside downstream expressions need to be bound
# to upstream output column definitions to satisfy invariant #1 (common for both
# composition/projection cases)
v = _ColumnRefRebindingVisitor(upstream_column_defs)
```
Contributor

nit: `_ColumnRefSubstitutionVisitor` would be a slightly clearer name.

Contributor Author

Agreed. Will defer to a follow-up, though.
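The rebinding step from the snippet in this thread (which the reviewer suggests calling substitution) can be sketched as a recursive substitution over a toy expression tree. `Col`, `Lit`, `BinOp`, and `rebind_column_refs` are hypothetical stand-ins for illustration, not the PR's `_ColumnRefRebindingVisitor`: downstream references to columns that the upstream projection defines are replaced by those definitions, which is what allows two consecutive projections to be fused into one.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, BinOp]


def rebind_column_refs(expr: Expr, upstream_defs: Dict[str, Expr]) -> Expr:
    """Replace column refs in a downstream expression with the upstream
    expressions that define those columns."""
    if isinstance(expr, Col):
        # Refs to columns the upstream projection defines are substituted;
        # any other ref is left unchanged.
        return upstream_defs.get(expr.name, expr)
    if isinstance(expr, BinOp):
        return BinOp(
            expr.op,
            rebind_column_refs(expr.left, upstream_defs),
            rebind_column_refs(expr.right, upstream_defs),
        )
    return expr  # literals contain no column refs


# Upstream: b = a + 1; downstream: c = b * 2  =>  fused: c = (a + 1) * 2
upstream = {"b": BinOp("+", Col("a"), Lit(1))}
downstream = BinOp("*", Col("b"), Lit(2))
print(rebind_column_refs(downstream, upstream))
```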

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>

```python
@PublicAPI(stability="beta")
# TODO remove
@DeveloperAPI(stability="alpha")
```
Contributor

Wait, this should be a public API.

@alexeykudinkin alexeykudinkin merged commit c60e6c1 into master Oct 24, 2025
6 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/prj-pdwn-fix-2 branch October 24, 2025 00:58
alexeykudinkin added a commit that referenced this pull request Oct 24, 2025
…#58040)

## Description

This change fixes an issue where renaming a column did not remove the
original (source) column from the projection's output.

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
aslonnie pushed a commit that referenced this pull request Oct 24, 2025
…#58040) (#58071)

## Description

Cherry-pick of #58040

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
…ray-project#58040)

Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ray-project#58040)

Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ray-project#58040)

Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…ray-project#58040)

Signed-off-by: Future-Outlier <eric901201@gmail.com>
weiquanlee pushed a commit to antgroup/ant-ray that referenced this pull request Dec 11, 2025
…ray-project#58040) (ray-project#58071)

Blaze-DSP pushed a commit to Blaze-DSP/ray that referenced this pull request Dec 18, 2025
…ray-project#58040)


Labels

data (Ray Data-related issues); go (add ONLY when ready to merge, run all tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

3 participants