SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG #1211
Conversation
|
Jenkins, test this please. |
|
Yup, that's definitely doable. I'll work on adding that DSL support and add it to this pull request. |
|
I thought about this more ('key.avg vs avg('key)), and IMHO the latter (avg('key)) is more natural. It matches nicely with what users write in SQL already. It also delivers the subtle message that 'key here is a cell (more precisely, the key attribute in each row) rather than a whole column. |
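To make the two spellings concrete, here is a minimal, self-contained sketch of how each could be wired up. It is a toy model, not Spark's code: `Expression`, `Attribute`, `Average`, `Sum`, `ToyDsl` and `SymbolAggregates` are all hypothetical names defined only for this illustration, and the sketch writes `Symbol("key")` instead of `'key` so that it also compiles on newer Scala versions where symbol literals are deprecated or removed.

```scala
import scala.language.implicitConversions

// Toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expression
final case class Attribute(name: String) extends Expression
final case class Average(child: Expression) extends Expression
final case class Sum(child: Expression) extends Expression

object ToyDsl {
  // Lets a Symbol stand in wherever an Expression is expected.
  implicit def symbolToAttribute(s: Symbol): Attribute = Attribute(s.name)

  // Function style: avg('key), sum('key)
  def avg(e: Expression): Expression = Average(e)
  def sum(e: Expression): Expression = Sum(e)

  // Method style: 'key.avg, 'key.sum
  implicit class SymbolAggregates(private val s: Symbol) extends AnyVal {
    def avg: Expression = Average(Attribute(s.name))
    def sum: Expression = Sum(Attribute(s.name))
  }
}

object Example {
  import ToyDsl._

  // Written 'key with Scala 2 symbol-literal syntax.
  val key: Symbol = Symbol("key")

  val functionStyle: Expression = avg(key) // reads like SQL: AVG(key)
  val methodStyle: Expression = key.avg    // reads like a column method
}
```

Both spellings build the same expression tree in this sketch, so the choice between them is about readability rather than capability.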
|
I have added DSL support ( |
What about sumDistinct(e: Expression*)?
Your version is closer to SQL, I guess, but I'm wary of too much implicit magic (there is already a lot!).
There's no implicitness going on here, since the user needs to explicitly call both sum and distinct. I have no problem changing it, though.
Fixed.
|
All feedback has been addressed |
|
Hi Ximo, Sorry for the delay. Many of the committers are busy running the Spark On Wed, Jul 2, 2014 at 3:06 AM, Ximo Guanter notifications@github.com
|
|
No problem. Thanks for the update! |
… and AVG **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other. **Testing** Unit tests added. Author: Ximo Guanter Gonzalbez <ximo@tid.es> Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits: fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (cherry picked from commit 5c6ec94) Signed-off-by: Michael Armbrust <michael@databricks.com>
|
Thanks! I've merged this into master and the 1.0 branches. |
… and AVG **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other. **Testing** Unit tests added. Author: Ximo Guanter Gonzalbez <ximo@tid.es> Closes apache#1211 from edrevo/add-expression-support-in-select and squashes the following commits: fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
|
QA tests have started for PR 1211 at commit
|
|
QA tests have finished for PR 1211 at commit
|
**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and others. **Testing** Unit tests added.