Skip to content

Conversation

@edrevo
Copy link
Contributor

@edrevo edrevo commented Jun 25, 2014

Description This patch enables using the .select() function in SchemaRDD with functions such as Sum, Count and other.
Testing Unit tests added.

@concretevitamin
Copy link
Contributor

Jenkins, test this please.

@concretevitamin
Copy link
Contributor

Hi @edrevo -- thanks, the implementation looks really concise! I haven't thought about this, but what do people think about instead of having syntax like AVG(key), we support something like 'key.avg? One example in the current code base is asc (the sort order). Also +@marmbrus.

@edrevo
Copy link
Contributor Author

edrevo commented Jun 25, 2014

Yup, that's definitely doable. I'll work on adding that DSL support and add it to this pull request.

@rxin
Copy link
Contributor

rxin commented Jun 25, 2014

I thought about this more ('key.avg vs avg('key)), and IMHO the latter (avg('key)) is more natural. It matches nicely with what users write in SQL already. It also delivers the subtle message that 'key here is a cell (more precisely, the key attribute in each row) rather than a whole column.

@edrevo
Copy link
Contributor Author

edrevo commented Jun 26, 2014

I have added DSL support (avg(), count(distinct())...) following @rxin's suggestion. I'm happy to change the DSL style if you end up favoring the other approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about sumDistinct(e: Expression*)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You version is closer to SQL I guess, but I'm wary of too much implicit magic (there is already a lot!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no implicitness going on in here, since the user needs to explicitly call both sum and disctinct. I have no problem changing it, though.

Fixed.

@edrevo
Copy link
Contributor Author

edrevo commented Jul 2, 2014

All feedback has been addressed

@concretevitamin
Copy link
Contributor

Hi Ximo,

Sorry for the delay. Many of the committers are busy running the Spark
Summit these few days so a lot of PRs are backlog'd. They will probably be
addressed later this week / early next week.

On Wed, Jul 2, 2014 at 3:06 AM, Ximo Guanter notifications@github.com
wrote:

All feedback has been addressed


Reply to this email directly or view it on GitHub
#1211 (comment).

@edrevo
Copy link
Contributor Author

edrevo commented Jul 2, 2014

No problem. Thanks for the update!

asfgit pushed a commit that referenced this pull request Jul 2, 2014
… and AVG

**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other.
**Testing** Unit tests added.

Author: Ximo Guanter Gonzalbez <ximo@tid.es>

Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits:

fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG

(cherry picked from commit 5c6ec94)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in 5c6ec94 Jul 2, 2014
asfgit pushed a commit that referenced this pull request Jul 2, 2014
… and AVG

**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other.
**Testing** Unit tests added.

Author: Ximo Guanter Gonzalbez <ximo@tid.es>

Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits:

fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG

(cherry picked from commit 5c6ec94)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus
Copy link
Contributor

marmbrus commented Jul 2, 2014

Thanks! I've merged this into master and the 1.0 branches.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
… and AVG

**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other.
**Testing** Unit tests added.

Author: Ximo Guanter Gonzalbez <ximo@tid.es>

Closes apache#1211 from edrevo/add-expression-support-in-select and squashes the following commits:

fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
@SparkQA
Copy link

SparkQA commented Oct 2, 2014

QA tests have started for PR 1211 at commit fe4a1e1.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 2, 2014

QA tests have finished for PR 1211 at commit fe4a1e1.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@edrevo edrevo deleted the add-expression-support-in-select branch September 26, 2019 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants