[SPARK-42108][SQL] Make Analyzer transform Count(*) into Count(1)
#39636
@zhengruifeng can you fill in the PR description and remove WIP?
84b367b to 90088c7
@cloud-fan I found that removing the |
All tests passed, mind taking another look? @cloud-fan @HyukjinKwon
dongjoon-hyun
left a comment
+1, LGTM.
cc @huaxingao , @sunchao , too
thanks, merging to master!
thank you @dongjoon-hyun @cloud-fan @HyukjinKwon
…(col(*))`
### What changes were proposed in this pull request?
1, add `UnresolvedStar` to `expressions.py`;
2, fix `count(*)` and `count(col(*))` so that they return `Column(UnresolvedStar(None))` instead of `Column(UnresolvedAttribute("*"))`, see: https://github.com/apache/spark/blob/68531ada34db72d352c39396f85458a8370af812/sql/core/src/main/scala/org/apache/spark/sql/Column.scala#L144-L150
3, remove the `count(*) -> count(1)` transformation in `group.py`, since it's no longer needed.
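Item 2 above can be illustrated with a minimal, self-contained sketch. It is a toy model, not the code from `expressions.py`: the two dataclasses and the helper `to_expression` are hypothetical names introduced only to show the idea that the name `*` should map to a star node rather than to an attribute literally named `*`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnresolvedAttribute:
    """Toy stand-in for a reference to a named column."""
    name: str


@dataclass(frozen=True)
class UnresolvedStar:
    """Toy stand-in for `*`; `target` is None for an unqualified star."""
    target: Optional[str] = None


def to_expression(name: str):
    """Hypothetical helper: map a column name to an expression node."""
    if name == "*":
        # Special-case the star so later stages can recognize it as such.
        return UnresolvedStar(None)
    # Any other name is treated as an ordinary column reference.
    return UnresolvedAttribute(name)
```

With this distinction in place, a later stage can tell `count(*)` apart from a count over a column that merely happens to be called `*`.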
### Why are the changes needed?
#39636 fixed the `count(*)` issue on the server side, so `count(expr(*))` works after that PR.
This PR makes the corresponding changes on the Python client side to support `count(*)` and `count(col(*))`.
### Does this PR introduce _any_ user-facing change?
yes
### How was this patch tested?
enabled UT and added UT
Closes #39622 from zhengruifeng/connect_count_star.
Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
### What changes were proposed in this pull request?
Make the Analyzer transform `Count(*)` into `Count(1)`.
### Why are the changes needed?
The existing `Count(*) -> Count(1)` transformation happens in `AstBuilder.visitFunctionCall`.
The Analyzer requires that `Count(*)` has already been converted to `Count(1)` in the Parser. Given a `Count(*)` expression, the Analyzer itself cannot handle it correctly, which causes a correctness issue in Spark Connect (see https://issues.apache.org/jira/browse/SPARK-41845).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UT, manually tested with Spark Connect.
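The `Count(*)` to `Count(1)` rewrite discussed in this PR can be modeled with a small sketch. This is a toy expression tree in Python, not Spark's Scala Analyzer code: the classes `Star`, `Literal`, `Column`, `Count` and the function `rewrite_count_star` are hypothetical names, and the sketch assumes the rewrite applies only when the star is the sole argument.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Star:
    """Toy stand-in for an unqualified, unresolved `*`."""


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Count:
    children: Tuple[object, ...]


def rewrite_count_star(expr):
    """Hypothetical rule: rewrite Count(*) to Count(1), recursing into Count nodes."""
    if isinstance(expr, Count):
        if expr.children == (Star(),):
            # The star is the only argument, so counting rows equals Count(1).
            return Count((Literal(1),))
        # Otherwise keep the node and rewrite any nested Count children.
        return Count(tuple(rewrite_count_star(c) for c in expr.children))
    # Non-Count expressions are left untouched in this sketch.
    return expr
```

Running the rule at analysis time rather than at parse time means that a plan built without going through the SQL parser, such as one sent by a Spark Connect client, still gets the same rewrite.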