@allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR fixes an issue in ResolveAggregateFunctions where non-aggregated nested fields in ORDER BY and HAVING are not resolved correctly. This is because nested fields are resolved as aliases that fail to be semantically equal to any grouping/aggregate expressions.
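To make the failure mode concrete, here is a minimal sketch in plain Scala. It is only a toy model: `Expr`, `Attr`, `GetField`, `Alias`, and `unwrap` are hypothetical stand-ins invented for this illustration, not Spark's Catalyst classes, and structural equality stands in for semantic equality. It assumes a query that groups by a nested field such as `c.x` and refers to the same field again in ORDER BY or HAVING.

```scala
// Toy expression tree. These classes are illustrative stand-ins,
// not Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class GetField(child: Expr, field: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

object NestedFieldDemo {
  // Hypothetical helper: drop a top-level alias so that only the
  // underlying expression takes part in the comparison.
  def unwrap(e: Expr): Expr = e match {
    case Alias(child, _) => child
    case other => other
  }

  // Grouping expression for something like GROUP BY c.x.
  val grouping: Seq[Expr] = Seq(GetField(Attr("c"), "x"))

  // The same nested field as it might look once resolved in ORDER BY
  // or HAVING: the field extraction wrapped in an alias.
  val resolved: Expr = Alias(GetField(Attr("c"), "x"), "x")

  def main(args: Array[String]): Unit = {
    // The aliased form does not equal the grouping expression.
    println(grouping.contains(resolved))
    // Comparing the alias's child instead finds the match.
    println(grouping.contains(unwrap(resolved)))
  }
}
```

In this model the first comparison fails and the second succeeds, which mirrors the description above: the alias wrapper, not the nested field itself, is what prevents the match. Whether the patch unwraps the alias in exactly this way is not shown in this conversation.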

Why are the changes needed?

To fix an analyzer issue.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

@github-actions github-actions bot added the SQL label Jul 23, 2021
@SparkQA

SparkQA commented Jul 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46088/

@SparkQA

SparkQA commented Jul 23, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46088/

@SparkQA

SparkQA commented Jul 23, 2021

Test build #141569 has finished for PR 33498 at commit ded1a42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@allisonwang-db
Contributor Author

cc @cloud-fan

Contributor

Shall we update TempResolvedColumn to only take Attribute?

Contributor Author

Actually, we shouldn't transform attributes here, because for nested fields u.nameParts does not correspond to the attribute. Updated to match Alias specifically.

Contributor

@cloud-fan cloud-fan left a comment

good catch!

@SparkQA

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46174/

@SparkQA

SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46174/

@SparkQA

SparkQA commented Jul 27, 2021

Test build #141658 has finished for PR 33498 at commit 1889fe4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@allisonwang-db allisonwang-db force-pushed the spark-36275-resolve-agg-func branch from 1889fe4 to fb3931c Compare July 27, 2021 18:14
@SparkQA

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46241/

@SparkQA

SparkQA commented Jul 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46241/

@SparkQA

SparkQA commented Jul 27, 2021

Test build #141728 has finished for PR 33498 at commit fb3931c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.2!

cloud-fan pushed a commit that referenced this pull request Jul 28, 2021
… fields

### What changes were proposed in this pull request?
This PR fixes an issue in `ResolveAggregateFunctions` where non-aggregated nested fields in ORDER BY and HAVING are not resolved correctly. This is because nested fields are resolved as aliases that fail to be semantically equal to any grouping/aggregate expressions.

### Why are the changes needed?
To fix an analyzer issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests.

Closes #33498 from allisonwang-db/spark-36275-resolve-agg-func.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 23a6ffa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this in 23a6ffa Jul 28, 2021
// should undo it later and fail with "Column c2 not found".
agg.child.resolve(u.nameParts, resolver).map(TempResolvedColumn(_, u.nameParts))
.getOrElse(u)
agg.child.resolve(u.nameParts, resolver).map({
Member

Let's use a single brace next time for map in this case: https://github.com/databricks/scala-style-guide#anonymous-methods
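For readers unfamiliar with the convention, the following sketch contrasts the two forms. It is a generic illustration rather than the code from this patch: `BraceStyleDemo` and `describe` are hypothetical names, and an `Option[Int]` stands in for the result of resolving a column.

```scala
object BraceStyleDemo {
  // Hypothetical helper used only for this illustration.
  def describe(resolved: Option[Int]): (String, String) = {
    // Parentheses wrapping a brace block: compiles, but is the form
    // the reviewer asks to avoid.
    val wrapped = resolved.map({
      case n if n < 0 => "negative"
      case _ => "non-negative"
    }).getOrElse("unresolved")

    // Single brace: same behavior, less punctuation.
    val single = resolved.map {
      case n if n < 0 => "negative"
      case _ => "non-negative"
    }.getOrElse("unresolved")

    (wrapped, single)
  }

  def main(args: Array[String]): Unit = {
    println(describe(Some(-1)))
    println(describe(None))
  }
}
```

Both forms evaluate to the same result for any input; the difference is purely stylistic.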
