Skip to content

Conversation

@hvanhovell
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes an issue with aggregates that have an empty input, and use a literals as their grouping keys. These aggregates are currently interpreted as aggregates without grouping keys, this triggers the ungrouped code path (which aways returns a single row).

This PR fixes the RemoveLiteralFromGroupExpressions optimizer rule, which changes the semantics of the Aggregate by eliminating all literal grouping keys.

How was this patch tested?

Added tests to SQLQueryTestSuite.

@hvanhovell
Copy link
Contributor Author

cc @davies @cloud-fan

* Removes literals from group expressions in [[Aggregate]], as they have no effect to the result
* but only makes the grouping key bigger.
*/
object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an existing unit test suite for this? might be good to add a test case there too.

@davies
Copy link
Contributor

davies commented Sep 14, 2016

LGTM

@SparkQA
Copy link

SparkQA commented Sep 14, 2016

Test build #65398 has finished for PR 15101 at commit 25eee30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 15, 2016

Test build #65406 has finished for PR 15101 at commit 42e3698.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Do not rewrite the aggregate if we drop all grouping expressions, because this can
// change the return semantics when the input of the Aggregate is empty. See SPARK-17114
// for more information.
a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a.copy(groupingExpressions = Seq(grouping.head))? I think we can still remove some literal grouping if we keep one of them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it might be even better to replace it with something that is trivial to hash.

@SparkQA
Copy link

SparkQA commented Sep 15, 2016

Test build #65434 has finished for PR 15101 at commit e6c4b9c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, pending jenkins.

@hvanhovell
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 15, 2016

Test build #65435 has finished for PR 15101 at commit e6c4b9c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Sep 15, 2016

(My bad on MiMa issue -- should be fixed in master, retesting ...)

@SparkQA
Copy link

SparkQA commented Sep 15, 2016

Test build #3270 has finished for PR 15101 at commit e6c4b9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

Merging to master/2.0. Thanks for the reviews.

asfgit pushed a commit that referenced this pull request Sep 15, 2016
## What changes were proposed in this pull request?
This PR fixes an issue with aggregates that have an empty input, and use a literals as their grouping keys. These aggregates are currently interpreted as aggregates **without** grouping keys, this triggers the ungrouped code path (which aways returns a single row).

This PR fixes the `RemoveLiteralFromGroupExpressions` optimizer rule, which changes the semantics of the Aggregate by eliminating all literal grouping keys.

## How was this patch tested?
Added tests to `SQLQueryTestSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #15101 from hvanhovell/SPARK-17114-3.

(cherry picked from commit d403562)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@asfgit asfgit closed this in d403562 Sep 15, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?
This PR fixes an issue with aggregates that have an empty input, and use a literals as their grouping keys. These aggregates are currently interpreted as aggregates **without** grouping keys, this triggers the ungrouped code path (which aways returns a single row).

This PR fixes the `RemoveLiteralFromGroupExpressions` optimizer rule, which changes the semantics of the Aggregate by eliminating all literal grouping keys.

## How was this patch tested?
Added tests to `SQLQueryTestSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#15101 from hvanhovell/SPARK-17114-3.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants