[SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*) and count(col(*))
#39622
I think we should fix this on the Scala side, but I didn't find the right place.
I checked that #39636 can resolve the
HyukjinKwon
left a comment
Looks fine
dongjoon-hyun
left a comment
CI seems to be failing with a compilation error. Could you double-check and re-trigger, @zhengruifeng?
Error: SparkConnectPlannerSuite.scala:555:
value addTarget is not a member of
org.apache.spark.connect.proto.Expression.UnresolvedStar.Builder
dongjoon-hyun
left a comment
+1, LGTM.
Merged into master, thank you @cloud-fan @HyukjinKwon @dongjoon-hyun!
What changes were proposed in this pull request?
1. Add UnresolvedStar to expressions.py.
2. Fix count(*) and count(col(*)): they should return Column(UnresolvedStar(None)) instead of Column(UnresolvedAttribute("*")); see spark/sql/core/src/main/scala/org/apache/spark/sql/Column.scala, lines 144 to 150 in 68531ad.
3. Remove the count(*) -> count(1) transformation in group.py, since it is no longer needed.
Why are the changes needed?
#39636 fixed the count(*) issue on the server side, and count(expr(*)) works after that PR.
This PR makes the corresponding changes on the Python client side in order to support count(*) and count(col(*)).
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Enabled UT and added UT.
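For readers following along, here is a minimal sketch of the behavior change described in item 2 of the proposed changes. The classes and the col helper below are simplified stand-ins written for illustration only, not the actual code in expressions.py; in particular, the field name target is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class UnresolvedAttribute:
    """Hypothetical stand-in: a reference to a column by name."""
    name: str


@dataclass(frozen=True)
class UnresolvedStar:
    """Hypothetical stand-in: the star expression.

    The field name `target` is assumed for this sketch; None means
    "all columns".
    """
    target: Optional[str] = None


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in: a thin wrapper around an expression."""
    expr: Union[UnresolvedAttribute, UnresolvedStar]


def col(name: str) -> Column:
    # Special case described in the PR: "*" maps to a star expression
    # rather than to an attribute that happens to be named "*".
    if name == "*":
        return Column(UnresolvedStar(None))
    return Column(UnresolvedAttribute(name))


star = col("*")
regular = col("id")
```

With this special case in place, an aggregate such as count receives a star expression directly, which is consistent with item 3 dropping the count(*) -> count(1) rewrite on the client.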