[SPARK-42468][CONNECT] Implement agg by (String, String)* #40057

amaliujia · 2023-02-16T21:35:43Z

What changes were proposed in this pull request?

This PR starts adding basic aggregation support to the Scala client. The first step is to support aggregation specified by strings, i.e. (column name, aggregate function name) pairs.
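For illustration only, here is a minimal sketch of the `(String, String)*` call shape named in the title. It is a toy model and not the code in this PR: `AggByStringsSketch`, `Grouped`, and `AggCall` are hypothetical names, and the pairs are assumed to be read as column name to aggregate function name.

```scala
// Toy model only: these types are hypothetical and are not the Spark Connect classes.
object AggByStringsSketch {
  // One aggregate call, e.g. max(salary).
  final case class AggCall(function: String, column: String) {
    def render: String = s"$function($column)"
  }

  // Hypothetical stand-in for a grouped dataset.
  final class Grouped(val groupingColumns: Seq[String]) {
    // A required first pair plus varargs means at least one aggregate
    // must be supplied; each pair is read as (column name, function name).
    def agg(aggExpr: (String, String), aggExprs: (String, String)*): Seq[AggCall] =
      (aggExpr +: aggExprs).map { case (column, function) => AggCall(function, column) }
  }

  def main(args: Array[String]): Unit = {
    val grouped = new Grouped(Seq("department"))
    val calls = grouped.agg("salary" -> "max", "age" -> "avg")
    println(calls.map(_.render).mkString(", "))
  }
}
```

Splitting the parameter list into one mandatory pair and a varargs tail rules out an empty aggregation at compile time.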

Why are the changes needed?

API coverage

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

amaliujia · 2023-02-16T21:35:56Z

@hvanhovell

hvanhovell · 2023-02-16T23:36:30Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala

As soon as we merge #40050 we should just use the functions in there.

hvanhovell · 2023-02-16T23:37:40Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala

Nit: you don't have to convert to DataFrame here (as soon as we introduce encoders it might actually be a bit faster if we don't). You could also pass in the columns as is.

This is about whether we want to keep the same class signature for RelationalGroupedDataset as it has in SQL. If a protected/private class like this does not need to match the SQL one, then it is OK to pass in classes that are a closer fit.

The constructor does not need to have the same signature, since an end user is not supposed to instantiate this thing. BTW, you are already breaking the signature because we use proto.Expression instead of catalyst.Expression.

Yeah, I guess the main reason is that this is not a public API.

How about I follow up in future PRs on the final class signature for RelationalGroupedDataset? There are a lot more APIs to add to this class.
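The point in this thread, that a constructor end users cannot call is free to change, can be sketched as follows. This is a hedged toy example and not the actual class: `PrivateConstructorSketch`, `ProtoExpr`, `Grouped`, and `fromColumns` are hypothetical names chosen for illustration.

```scala
// Toy model only: none of these names come from Spark.
object PrivateConstructorSketch {
  // Hypothetical stand-in for a client-side expression message.
  final case class ProtoExpr(text: String)

  // The constructor is private, so callers outside the companion cannot
  // instantiate the class directly. Its parameter list is therefore an
  // internal detail that can change without breaking users.
  final class Grouped private (val groupingExprs: Seq[ProtoExpr])

  object Grouped {
    // The only supported way to obtain an instance in this sketch.
    def fromColumns(names: String*): Grouped =
      new Grouped(names.map(n => ProtoExpr(n)))
  }
}
```

Only the factory method is part of the visible surface here, so the constructor arguments can track whatever internal representation is convenient.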

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala

connector/connect/common/src/test/resources/query-tests/explain-results/groupby_agg.explain

hvanhovell

A couple of nits, looks pretty good!

hvanhovell

LGTM

zhengruifeng · 2023-02-17T01:35:31Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala

+          .setIsDistinct(false)
+      // Also special handle count because we need to take care count(*).
+      case "count" | "size" =>
+        // Turn count(*) into count(1)


Do we still need to take care of count(*)? We don't need it in the Python client: #39622 (comment)

This is to match the existing Scala-side DataFrame implementation. @cloud-fan do you know if we need the count(*) to count(1) conversion? If not, we can change it both here and in the existing DataFrame implementation.

This is to match existing scala side Dataframe impl.

LGTM, we can update them later

Yeah, we don't need it. We can address it when we replace this stuff with the functions API.
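For readers following the thread, the special case under discussion can be sketched roughly as below. This is a toy model written for illustration, not the code in the diff: `CountStarSketch`, `Expr`, `Star`, `Col`, `Lit`, and `Call` are hypothetical stand-ins, and only the `"count" | "size"` branch mirrors the quoted snippet. Per the comments above, the rewrite is expected to become unnecessary.

```scala
import java.util.Locale

// Toy model only: Expr and its cases are hypothetical stand-ins, not Spark classes.
object CountStarSketch {
  sealed trait Expr
  case object Star extends Expr                       // models "*"
  final case class Col(name: String) extends Expr     // models a column reference
  final case class Lit(value: Int) extends Expr       // models an integer literal
  final case class Call(function: String, arg: Expr) extends Expr

  // Map a user-supplied function name and its input to an aggregate call.
  def strToExpr(function: String, input: Expr): Expr =
    function.toLowerCase(Locale.ROOT) match {
      // "size" is treated as an alias of "count", as in the quoted diff.
      case "count" | "size" =>
        input match {
          case Star  => Call("count", Lit(1)) // turn count(*) into count(1)
          case other => Call("count", other)
        }
      // Any other name is passed through (lower-cased) unchanged.
      case name => Call(name, input)
    }
}
```

Dropping the rewrite would amount to deleting the inner match and always emitting `Call("count", input)`.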

### What changes were proposed in this pull request?
Starting to support basic aggregation in Scala client. The first step is to support aggregation by strings.

### Why are the changes needed?
API coverage

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #40057 from amaliujia/rw-agg.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>

(cherry picked from commit cc471a5)
Signed-off-by: Herman van Hovell <herman@databricks.com>

github-actions bot added CONNECT SQL labels Feb 16, 2023

hvanhovell reviewed Feb 16, 2023

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (outdated)

hvanhovell reviewed Feb 16, 2023

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (outdated)

hvanhovell reviewed Feb 16, 2023

connector/connect/common/src/test/resources/query-tests/explain-results/groupby_agg.explain (outdated)

hvanhovell reviewed Feb 16, 2023

amaliujia added 2 commits February 16, 2023 16:39

[SPARK-42468][CONNECT] Implement agg by (String, String)*.

337b827

update

ffc97f1

amaliujia force-pushed the rw-agg branch from 7bb3b28 to ffc97f1 on February 17, 2023 00:42

amaliujia added 2 commits February 16, 2023 16:44

update

34136e6

update

5c116c2

hvanhovell approved these changes Feb 17, 2023

zhengruifeng reviewed Feb 17, 2023

hvanhovell closed this in cc471a5 Feb 17, 2023

amaliujia mentioned this pull request Feb 22, 2023

[SPARK-42468][CONNECT][FOLLOW-UP] Add .agg variants in Dataset #40125

Closed
