@goutamvenkat-anyscale
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale commented Oct 17, 2025

Description

  1. Add visitors that collect the column names referenced by an expression and that rename columns throughout the expression tree.
  2. Use expressions for rename_columns, with_column, and select_columns, and remove cols and cols_rename from Project.
  3. Modify Projection Pushdown so that it works correctly with combinations of the above operators.
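
For readers skimming the description, here is a minimal sketch of the idea behind item 1. It is an illustration only: `Col`, `Lit`, `BinOp`, `collect_columns`, and `rename_in_expr` are hypothetical names made up for this example, not Ray Data's actual expression classes or visitors, and it uses plain recursive traversal with `isinstance` dispatch rather than a formal visitor class.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


# Hypothetical toy expression nodes; NOT Ray Data's real classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, BinOp]


def collect_columns(expr: Expr) -> Set[str]:
    """Return the names of every column referenced anywhere in the tree."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, BinOp):
        return collect_columns(expr.left) | collect_columns(expr.right)
    return set()  # literals reference no columns


def rename_in_expr(expr: Expr, mapping: Dict[str, str]) -> Expr:
    """Return a new tree with column references renamed per `mapping`."""
    if isinstance(expr, Col):
        return Col(mapping.get(expr.name, expr.name))
    if isinstance(expr, BinOp):
        return BinOp(
            expr.op,
            rename_in_expr(expr.left, mapping),
            rename_in_expr(expr.right, mapping),
        )
    return expr  # literals are returned unchanged


# a + (b * 2), then rename column "a" to "x"
expr = BinOp("+", Col("a"), BinOp("*", Col("b"), Lit(2)))
renamed = rename_in_expr(expr, {"a": "x"})
```

In a sketch like this, an optimizer rule could use the collected names to decide which input columns a projection actually needs, and the rename pass to rewrite references when a rename is pushed past another operator. The nodes are immutable, so renaming builds a new tree and leaves the original untouched.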

Related issues

Closes #56878, #57700

@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner October 17, 2025 19:39
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and positive refactoring that updates the Project operator to use a more general and powerful expression-based API. This simplifies the logical plan and enables more robust optimizations, as shown by the rewrite of the projection pushdown rule. The addition of comprehensive tests for the new expression logic is also a great improvement. I've found one critical issue with the count() implementation and a couple of medium-severity suggestions for performance and code style.

@goutamvenkat-anyscale goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Oct 17, 2025
@goutamvenkat-anyscale goutamvenkat-anyscale force-pushed the goutam/expr_project_2_n branch 2 times, most recently from 2b6da29 to d9ef6e1 Compare October 17, 2025 22:13
@goutamvenkat-anyscale goutamvenkat-anyscale force-pushed the goutam/expr_project_2_n branch 3 times, most recently from 1d995c9 to 4f9a2cc Compare October 21, 2025 00:35
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 22, 2025 05:00
Signed-off-by: Goutam <goutam@anyscale.com>
@github-actions github-actions bot disabled auto-merge October 22, 2025 05:50
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 22, 2025 06:10
@alexeykudinkin alexeykudinkin merged commit 27a6994 into ray-project:master Oct 22, 2025
7 checks passed
Vito-Yang added a commit to Vito-Yang/ray that referenced this pull request Oct 22, 2025
[Data] [2/n] - Update Project operator to use Expressions (ray-project#57855)
@goutamvenkat-anyscale goutamvenkat-anyscale deleted the goutam/expr_project_2_n branch October 23, 2025 17:50
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…t#57855)
Signed-off-by: Goutam <goutam@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…t#57855)
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Labels

data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

Development

Successfully merging this pull request may close these issues.

[Ray Data] Add a force overwrite option to the rename_columns()

2 participants