-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count #22728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
this is indeed the behavior I'd expect. Good to add tests to enforce the behavior. Did you check other RDBMs apart from Postgres? |
|
I'm going to try Hive and Presto, but my local environment has some problems and I need to fix it first. Will work on it tomorrow. |
|
BTW MySQL doesn't support |
|
Yea, it is definitely good to add document and test for current behavior. |
|
Test build #97401 has finished for PR 22728 at commit
|
|
Test build #97400 has finished for PR 22728 at commit
|
| FROM testData; | ||
|
|
||
| -- count with multiple expressions | ||
| SELECT count(a, b), count(b, a), count(testData.*) FROM testData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also include count(*)
| SELECT count(a, b), count(b, a), count(testData.*) FROM testData; | ||
|
|
||
| -- distinct count with multiple expressions | ||
| SELECT count(DISTINCT a, b), count(DISTINCT b, a), count(DISTINCT testData.*) FROM testData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also include count(DISTINCT *)
|
Let us add one more case in the test suite. SELECT count(1), count(NULL) FROM testData; |
|
LGTM except the above comments. |
|
LGTM |
|
Test build #97420 has finished for PR 22728 at commit
|
|
Merged to master and branch-2.4. |
…lumn count ## What changes were proposed in this pull request? AFAIK multi-column count is not widely supported by the mainstream databases(postgres doesn't support), and the SQL standard doesn't define it clearly, as near as I can tell. Since Spark supports it, we should clearly document the current behavior and add tests to verify it. ## How was this patch tested? N/A Closes #22728 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org> (cherry picked from commit e028fd3) Signed-off-by: hyukjinkwon <gurwls223@apache.org>
|
FYI, I tried both hive and presto, neither of them supports multi-column count. |
|
thanks for your work @cloud-fan ! |
|
(From #22773 (comment)) @gatorsmile and @cloud-fan, let's say this will break |
…lumn count ## What changes were proposed in this pull request? AFAIK multi-column count is not widely supported by the mainstream databases(postgres doesn't support), and the SQL standard doesn't define it clearly, as near as I can tell. Since Spark supports it, we should clearly document the current behavior and add tests to verify it. ## How was this patch tested? N/A Closes apache#22728 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>
What changes were proposed in this pull request?
AFAIK multi-column count is not widely supported by the mainstream databases(postgres doesn't support), and the SQL standard doesn't define it clearly, as near as I can tell.
Since Spark supports it, we should clearly document the current behavior and add tests to verify it.
How was this patch tested?
N/A