Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Jan 18, 2023

What changes were proposed in this pull request?

Make Analyzer transform Count(*) into Count(1)

Why are the changes needed?

Existing Count(*) -> Count(1) transformation happens in AstBuilder.visitFunctionCall.

The Analyzer requires the Count(*) had already been converted to Count(1) in Parser, and for a given Count(*) expression, the Analyzer itself can not correctly handle it and cause correctness issue in Spark Connect (see https://issues.apache.org/jira/browse/SPARK-41845)

Does this PR introduce any user-facing change?

No

How was this patch tested?

added UT, manually test with Spark Connect

@github-actions github-actions bot added the SQL label Jan 18, 2023
@zhengruifeng zhengruifeng changed the title [WIP][SQL] Move conversion COUNT(*) -> COUNT(1) to Analyzer [WIP][SQL] Move COUNT(*) -> COUNT(1) to Analyzer Jan 18, 2023
@cloud-fan
Copy link
Contributor

@zhengruifeng can you fill the PR description and remove WIP?

@zhengruifeng zhengruifeng changed the title [WIP][SQL] Move COUNT(*) -> COUNT(1) to Analyzer [SPARK-42108][SQL] Make Analyzer transform Count(*) into Count(1) Jan 18, 2023
@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented Jan 18, 2023

@cloud-fan I found that removing the count(*) -> count(1) in Parser will cause a lot of test failure (see https://github.com/zhengruifeng/spark/actions/runs/3945133167/jobs/6751752280), so add it back, and only add the support of count(*) -> count(1) in Analyzer

@zhengruifeng zhengruifeng marked this pull request as ready for review January 19, 2023 01:03
@zhengruifeng
Copy link
Contributor Author

all tests passed, minding taking another look? @cloud-fan @HyukjinKwon

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

cc @huaxingao , @sunchao , too

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 53c99a4 Jan 19, 2023
@zhengruifeng zhengruifeng deleted the sql_move_count_star branch January 20, 2023 00:45
@zhengruifeng
Copy link
Contributor Author

thank you @dongjoon-hyun @cloud-fan @HyukjinKwon

zhengruifeng added a commit that referenced this pull request Jan 21, 2023
…(col(*))`

### What changes were proposed in this pull request?
1, add `UnresolvedStar` to `expressions.py`;
2, Fix `count(*)` and `count(col(*))`, should return `Column(UnresolvedStar(None))` instead of `Column(UnresolvedAttribute("*"))`, see: https://github.com/apache/spark/blob/68531ada34db72d352c39396f85458a8370af812/sql/core/src/main/scala/org/apache/spark/sql/Column.scala#L144-L150
3, remove the `count(*) -> count(1)` transformation in `group.py`, since it's no longer needed.

### Why are the changes needed?

#39636 fixed the `count(*)` issue in the server side, and then `count(expr(*))` works after that PR.

This PR makes the corresponding changes in the Python Client side, in order to support `count(*)`, and `count(col(*))`

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
enabled UT and added UT

Closes #39622 from zhengruifeng/connect_count_star.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants