[SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count #22728

cloud-fan · 2018-10-15T15:38:12Z

What changes were proposed in this pull request?

AFAIK, multi-column count is not widely supported by mainstream databases (Postgres doesn't support it), and as far as I can tell the SQL standard doesn't define it clearly.

Since Spark supports it, we should clearly document the current behavior and add tests to verify it.
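For illustration, the behavior in question can be emulated in plain Python. This is only a sketch of the semantics as Spark's function documentation describes them (`count(expr[, expr...])` counts the rows for which all supplied expressions are non-null, and the `DISTINCT` form counts the unique such rows), not Spark's implementation. The helper names `multi_count` and `multi_count_distinct` and the sample rows are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# A row holds the values of the expressions passed to count, e.g. (a, b).
Row = Tuple[Optional[int], ...]


def multi_count(rows: Sequence[Row]) -> int:
    """Emulate count(a, b): count rows in which every value is non-null."""
    return sum(1 for row in rows if all(v is not None for v in row))


def multi_count_distinct(rows: Sequence[Row]) -> int:
    """Emulate count(DISTINCT a, b): count unique rows that contain no null."""
    return len({row for row in rows if all(v is not None for v in row)})


# Hypothetical sample data standing in for columns (a, b); None plays NULL.
sample = [(1, 1), (1, 2), (1, 2), (None, 1), (2, None), (None, None)]

print(multi_count(sample))           # rows with no null in either column
print(multi_count_distinct(sample))  # unique rows among those
```

Under these semantics the argument order does not change the result, which is why the tests compare `count(a, b)` with `count(b, a)`.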

How was this patch tested?

N/A

cloud-fan · 2018-10-15T15:39:00Z

cc @gatorsmile @mgaido91 @viirya

mgaido91 · 2018-10-15T15:49:16Z

This is indeed the behavior I'd expect. Good to add tests to enforce the behavior. Did you check other RDBMSs apart from Postgres?

cloud-fan · 2018-10-15T15:52:32Z

I'm going to try Hive and Presto, but my local environment has some problems and I need to fix them first. Will work on it tomorrow.

cloud-fan · 2018-10-15T15:58:38Z

BTW, MySQL doesn't support count(a, b) but does support count(distinct a, b), and the result is the same as Spark's.

viirya · 2018-10-15T16:04:32Z

Yeah, it is definitely good to add documentation and tests for the current behavior.

SparkQA · 2018-10-15T19:21:30Z

Test build #97401 has finished for PR 22728 at commit 708d7fd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-15T19:28:26Z

Test build #97400 has finished for PR 22728 at commit 62b4b84.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-15T20:05:56Z

sql/core/src/test/resources/sql-tests/inputs/count.sql

+FROM testData;
+
+-- count with multiple expressions
+SELECT count(a, b), count(b, a), count(testData.*) FROM testData;


Please also include count(*)

gatorsmile · 2018-10-15T20:07:32Z

sql/core/src/test/resources/sql-tests/inputs/count.sql

+SELECT count(a, b), count(b, a), count(testData.*) FROM testData;
+
+-- distinct count with multiple expressions
+SELECT count(DISTINCT a, b), count(DISTINCT b, a), count(DISTINCT testData.*) FROM testData;


Also include count(DISTINCT *)

gatorsmile · 2018-10-15T20:10:29Z

Let us add one more case in the test suite.

SELECT count(1), count(NULL) FROM testData;
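As a rough illustration of why this case is worth pinning down: a single-argument count skips NULL inputs, so `count(1)` sees a non-null constant on every row while `count(NULL)` sees none. A minimal plain-Python sketch of that rule follows. It is not Spark code, and the helper `count_expr` and the sample rows are hypothetical.

```python
from typing import Iterable, Optional


def count_expr(values: Iterable[Optional[int]]) -> int:
    """Emulate count(expr): count the inputs that are not null (None)."""
    return sum(1 for v in values if v is not None)


# Hypothetical rows; only the number of rows matters for constant arguments.
rows = [(1, 1), (None, 2), (None, None)]

count_one = count_expr(1 for _ in rows)      # count(1): constant 1 per row
count_null = count_expr(None for _ in rows)  # count(NULL): constant NULL per row

print(count_one, count_null)
```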

gatorsmile · 2018-10-15T20:10:49Z

LGTM except the above comments.

viirya · 2018-10-16T03:54:32Z

LGTM

SparkQA · 2018-10-16T05:23:48Z

Test build #97420 has finished for PR 22728 at commit e3aaa90.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-10-16T07:13:33Z

Merged to master and branch-2.4.

…lumn count ## What changes were proposed in this pull request? AFAIK multi-column count is not widely supported by the mainstream databases(postgres doesn't support), and the SQL standard doesn't define it clearly, as near as I can tell. Since Spark supports it, we should clearly document the current behavior and add tests to verify it. ## How was this patch tested? N/A Closes #22728 from cloud-fan/doc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org> (cherry picked from commit e028fd3) Signed-off-by: hyukjinkwon <gurwls223@apache.org>

cloud-fan · 2018-10-16T12:21:32Z

FYI, I tried both Hive and Presto; neither of them supports multi-column count.

mgaido91 · 2018-10-16T12:25:56Z

Thanks for your work, @cloud-fan!

HyukjinKwon · 2018-10-25T05:28:18Z

(From #22773 (comment)) @gatorsmile and @cloud-fan, let's say this will break DESCRIBE FUNCTION EXTENDED. Should we update the migration guide as well?

add tests to verify the behavior of count for corner cases

708d7fd

cloud-fan force-pushed the doc branch from 62b4b84 to 708d7fd Compare October 15, 2018 15:43

gatorsmile reviewed Oct 15, 2018

address comments

e3aaa90

HyukjinKwon approved these changes Oct 16, 2018

asfgit closed this in e028fd3 Oct 16, 2018
