-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[Data] Fix renamed columns to be appropriately dropped from the ouput #58040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors and fixes projection pushdown logic, specifically how renamed columns are handled during the fusion of Project operators. The changes introduce a distinction between alias and rename operations, with rename now correctly signaling that the source column should be dropped. The logic for fusing consecutive projections has been significantly simplified and corrected, especially for composition cases involving star() expressions. The _ColumnRewriter has been replaced with a much simpler _ColumnRefRebindingVisitor, improving maintainability. Overall, these are excellent changes that make the projection fusion logic more robust and easier to understand. I've found one issue in a new test case where an assertion could be unreliable.
| assert select_op.exprs == [ | ||
| # TODO fix (renaming doesn't remove prev columns) | ||
| col("sepal.length").alias("length"), | ||
| col("petal.width").alias("width"), | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion select_op.exprs == [...] is unreliable for comparing lists of expression objects. The __eq__ method on Expr is overloaded to build new expressions (e.g., col('a') == 1 creates a filter expression), not to check for equality. This means the assertion doesn't check for structural equality and might pass incorrectly or fail unexpectedly.
To ensure the test is robust, you should explicitly use structurally_equals for each expression in the list.
| assert select_op.exprs == [ | |
| # TODO fix (renaming doesn't remove prev columns) | |
| col("sepal.length").alias("length"), | |
| col("petal.width").alias("width"), | |
| ] | |
| expected_exprs = [ | |
| col("sepal.length").alias("length"), | |
| col("petal.width").alias("width"), | |
| ] | |
| assert len(select_op.exprs) == len(expected_exprs) | |
| assert select_op.exprs[0].structurally_equals(expected_exprs[0]) | |
| assert select_op.exprs[1].structurally_equals(expected_exprs[1]) |
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
… them Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Added ample commentary Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Typo Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
5e93af6 to
bea4aae
Compare
| # Upstream output column refs inside downstream expressions need to be bound | ||
| # to upstream output column definitions to satisfy invariant #1 (common for both | ||
| # composition/projection cases) | ||
| v = _ColumnRefRebindingVisitor(upstream_column_defs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: _ColumnRefSubstitutionVisitor is a little clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Will defer to follow-up tho
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
|
|
||
| @PublicAPI(stability="beta") | ||
| # TODO remove | ||
| @DeveloperAPI(stability="alpha") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait this should be a public api.
…#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…#58040) (#58071) ## Description Cherry-pick of #58040 ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- > Thank you for contributing to Ray! 🚀 > Please review the [Ray Contribution Guide](https://docs.ray.io/en/master/ray-contribute/getting-involved.html) before opening a pull request. >⚠️ Remove these instructions before submitting your PR. > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. ## Related issues > Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…ray-project#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
…ray-project#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…ray-project#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…ray-project#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
…ray-project#58040) (ray-project#58071) ## Description Cherry-pick of ray-project#58040 ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- > Thank you for contributing to Ray! 🚀 > Please review the [Ray Contribution Guide](https://docs.ray.io/en/master/ray-contribute/getting-involved.html) before opening a pull request. >⚠️ Remove these instructions before submitting your PR. > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…ray-project#58040) ## Description This change addresses the issues that currently upon column renaming we're not removing original columns. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Description
This change addresses the issues that currently upon column renaming we're not removing original columns.
Related issues
Additional information