Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GROUPING support to Presto #4805

Closed
raghavsethi opened this issue Mar 16, 2016 · 13 comments
Closed

Add GROUPING support to Presto #4805

raghavsethi opened this issue Mar 16, 2016 · 13 comments
Assignees

Comments

@raghavsethi
Copy link
Contributor

No description provided.

@raghavsethi raghavsethi self-assigned this Mar 16, 2016
@cpcloud
Copy link

cpcloud commented Mar 16, 2016

FWIW, the standard specifies a GROUPING(<column reference>) that looks like it's supposed to do this. It's a bit vague.

Postgres implements this as GROUPING(<col1>, <col2>, ..., <colN>) (0 if the <colN> is contained and 1 if it is not contained in the aggregation level, which is the inverse of what Hive does)

http://www.postgresql.org/docs/9.5/static/functions-aggregate.html

@cpcloud
Copy link

cpcloud commented Mar 16, 2016

The standard section is 4.16 in Part 2 of SQL:2011

@raghavsethi raghavsethi changed the title Add GROUPING_ID support to Presto Add GROUPING support to Presto Mar 16, 2016
@raghavsethi
Copy link
Contributor Author

@cpcloud We will almost certainly implement this as per the spec.

@martint
Copy link
Contributor

martint commented Mar 17, 2016

It's specified in more detail in section 4.9:

  1. If a <grouping operation> is specified, then:
    a) Let T be the aggregation query of <set function specification> that contains <grouping operation>. Each <column reference> shall reference a grouping column of T.
    b) The declared type of the result is exact numeric with an implementation-defined precision and a scale of 0 (zero).
    c) If more than one <column reference> is specified, then let N be the number of <column reference>s and let CR_i, 1 (one) ≤ i ≤ N, be the i-th <column reference>.
    GROUPING ( CR_1, ..., CR_N-1, CR_N )
    is equivalent to:
    CAST ( ( 2 * GROUPING ( CR_1, ..., CR_N-1 ) + GROUPING ( CR_N ) ) AS IDT )
    where IDT is the implementation-defined declared type of the result.

@martint
Copy link
Contributor

martint commented Mar 17, 2016

For completeness, from section 4.16.2:

The grouping operation is of the form GROUPING(<column reference>). The result of such an invocation is 1 (one) in the case of a row whose values are the results of aggregation over that <column reference> during the execution of a grouped query containing CUBE, ROLLUP, or GROUPING SETS, and 0 (zero) otherwise.

@cpcloud
Copy link

cpcloud commented Mar 17, 2016

Nice, thanks for posting that.

@petroav
Copy link
Contributor

petroav commented Apr 14, 2016

I'd like to start work on this if no one else is looking into it.

@martint martint assigned petroav and unassigned raghavsethi Apr 14, 2016
@petroav
Copy link
Contributor

petroav commented Jun 17, 2016

I spent some time trying to understand the 4.16.2 section of the spec quoted by Martin above and I'd like to share my findings for the benefit of others who might want to verify that my implementation of GROUPING() conforms to the spec.

The phrase "aggregation over that <column reference>" means that that column is NOT included in the grouping columns and is instead part of the "grouping window", i.e. it is "aggregated over". On first reading I though that "aggregation over" meant that the <column reference> had to be part of the grouping columns.

Both Teradata and Postgres implement the GROUPING() function in this way so I think this is the right interpretation.

@parekhparth
Copy link

@petroav, any updates here?

@petroav
Copy link
Contributor

petroav commented Nov 6, 2016

@parekhparth yes I'm going to put up a Teradata internal review tomorrow. The patch was re-written to use a synthetic type so that the arguments are not serialized/de-serialized at every invocation.

@BradRuderman
Copy link

@petroav How did the review go?

@martint
Copy link
Contributor

martint commented May 4, 2017

This is close, but it's still being worked on: #7712

@petroav
Copy link
Contributor

petroav commented May 22, 2017

Closing this since the feature got merged #8043.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants