Skip to content

Conversation

@hvanhovell
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes an issue with aggregates that have an empty input, and use a literals as their grouping keys. These aggregates are currently interpreted as aggregates without grouping keys, this triggers the ungrouped code path (which aways returns a single row).

This PR fixes this by adding a isGrouped flag to Aggregate. This flag is inferred when the Aggregate is created, but is kept when we change the Aggregates grouping expressions. I have tried to implement this by wrapping the groupingExpressions in an Option but this is much more invasive and leads to much harder to use code.

How was this patch tested?

Added tests to SQLQueryTestSuite.

@hvanhovell
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 13, 2016

Test build #65309 has finished for PR 15076 at commit f8f1ebb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 13, 2016

Test build #65312 has finished for PR 15076 at commit 87c19a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 13, 2016

Test build #65313 has finished for PR 15076 at commit fb63ecf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

!expressions.exists(!_.resolved) &&
childrenResolved &&
!hasWindowExpressions &&
(isGrouped || groupingExpressions.isEmpty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have this condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter if isGrouped has any grouping expressions (literal grouping expressions are eliminated during optimization). It is however problematic when a not-grouped Aggregate has grouping expressions; this means that we have not derived the isGrouped flag correctly.

@davies
Copy link
Contributor

davies commented Sep 14, 2016

@hvanhovell What's the reason that we can not just use the literal use grouping key?

@hvanhovell
Copy link
Contributor Author

@davies literal grouping keys get eliminated during optimization. We could modify the optimization rule, but I do feel this is more robust and it is more clear on what we intent to do. It probably also saves a few cycles during query execution.

@davies
Copy link
Contributor

davies commented Sep 14, 2016

@hvanhovell Since the optimization rule change the result (or sematics), I'd like to fix it, not to introduce more complexcity in physical layer. The saved few cycles may not worth, because people usually not use that in practice.

@hvanhovell
Copy link
Contributor Author

closing this in favor of #15101.

@hvanhovell hvanhovell closed this Sep 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants