rewrite approx_median to approx_percentile_cont while planning phase #2262

Merged
merged 2 commits into from
Apr 28, 2022

korowa
Contributor

@korowa korowa commented Apr 18, 2022

Which issue does this PR close?

Closes #2221.

Rationale for this change

At the moment, the optimizer rule that replaces approx_median with approx_percentile_cont slightly breaks the logical plan: the expressions inside the aggregate step no longer match its output schema. This works fine with a single optimizer pass, but on a second optimizer pass the projection_push_down rule removes the approx_percentile_cont expression.

What changes are included in this PR?

The suggestion is to move the function replacement to the planning phase. This seems more appropriate (we don't actually need the whole execution plan for it, because it only rewrites a single expression), and it removes the need to adjust all aliases, projections, and schemas during the optimization phase after the replacement.
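
As a rough illustration of why this is a single-expression rewrite, here is a minimal sketch in Rust. It uses a toy `Expr` enum and a hypothetical `rewrite_approx_median` helper, neither of which is DataFusion's real API, and it assumes that approx_median(x) is equivalent to approx_percentile_cont(x, 0.5). Because the rewrite happens before the plan is built, every schema derived afterwards already sees the rewritten expression.

```rust
// Toy expression tree for illustration only; NOT DataFusion's real `Expr` type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    AggregateCall { name: String, args: Vec<Expr> },
}

/// Hypothetical helper: rewrites `approx_median(x)` into
/// `approx_percentile_cont(x, 0.5)`, recursing into call arguments
/// so that nested calls are rewritten as well.
fn rewrite_approx_median(expr: Expr) -> Expr {
    match expr {
        Expr::AggregateCall { name, args } => {
            // Rewrite the arguments first.
            let mut args: Vec<Expr> =
                args.into_iter().map(rewrite_approx_median).collect();
            if name == "approx_median" {
                // Assumption: the median is the 0.5 percentile.
                args.push(Expr::Literal(0.5));
                Expr::AggregateCall {
                    name: "approx_percentile_cont".to_string(),
                    args,
                }
            } else {
                Expr::AggregateCall { name, args }
            }
        }
        // Columns and literals are left untouched.
        other => other,
    }
}
```

Applying the helper to every expression as it is planned means the aggregate, window, and filter nodes are constructed from the already rewritten expression, so no later pass has to patch aliases or schemas.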

Are there any user-facing changes?

Fixes execution of queries that use approx_median as an aggregate expression, as a window function, or in a HAVING filter.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 18, 2022
Member

@andygrove andygrove left a comment

Thanks @korowa. I have failing tests in #2369 which are fixed by this PR so LGTM.

@andygrove
Member

@realno @yahoNanJing fyi since you have both worked on this code. I plan to merge this soon if there are no objections.

@andygrove andygrove merged commit 7b61d52 into apache:master Apr 28, 2022
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jul 5, 2022
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Sep 1, 2022
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Sep 2, 2022
Labels
datafusion Changes in the datafusion crate
Development

Successfully merging this pull request may close these issues.

Aggregate func Approx_median not work with Parquet format
2 participants