Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns #7919

Closed
wants to merge 1 commit into from

Conversation

jorgecarleitao
Copy link
Member

This PR makes the projection optimizer remove any projection or aggregation that is not used down the plan, thus improving speed and convenience.

This is worked on top of #7879 and only the last commit is specific to this PR.

@github-actions
Copy link

github-actions bot commented Aug 9, 2020

@jorgecarleitao
Copy link
Member Author

jorgecarleitao commented Aug 14, 2020

FYI @andygrove and @alamb

Again, this is useful mostly to the table API. SQL statements are parsed into an optimized plan AFAIK.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR carefully and I think the logic looks good -- I actually tried running the new test cases without the code changes which was instructive.

I had only minor suggestions.

This commit makes the projection optimizer remove any projection
or aggregation that is not used down the plan.
@jorgecarleitao
Copy link
Member Author

@andygrove , is there anything we need to work this further?

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get a chance to review this one yet because it was taking me a while to understand the new logic, but I'm happy to approve based on @alamb 's review.

@andygrove
Copy link
Member

I do have a nagging concern that the logic may not work if the query plan contains aliases that rename columns, but we can address that as a follow up if/when that becomes an issue.

@andygrove andygrove closed this in 5e7be07 Aug 22, 2020
@jorgecarleitao jorgecarleitao deleted the projection branch August 22, 2020 16:47
@jorgecarleitao
Copy link
Member Author

Thank you, @andygrove ! I encapsulated that thought on ARROW-9830, with issue type "Test". :)

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
…e unused columns

This PR makes the projection optimizer remove any projection or aggregation that is not used down the plan, thus improving speed and convenience.

This is worked on top of apache#7879 and only the last commit is specific to this PR.

Closes apache#7919 from jorgecarleitao/projection

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…e unused columns

This PR makes the projection optimizer remove any projection or aggregation that is not used down the plan, thus improving speed and convenience.

This is worked on top of apache#7879 and only the last commit is specific to this PR.

Closes apache#7919 from jorgecarleitao/projection

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants