[SPARK-34609][SQL] Unify resolveExpressionBottomUp and resolveExpressionTopDown #31728

cloud-fan · 2021-03-03T16:05:13Z

What changes were proposed in this pull request?

It's a bit confusing to see resolveExpressionBottomUp and resolveExpressionTopDown, which provide similar functionality but differ in tree traversal order. It turns out that the real difference between these two methods is which attributes the columns should be resolved to: resolveExpressionTopDown resolves columns using the output attributes of the plan's children, while resolveExpressionBottomUp resolves columns using the output attributes of the plan itself.

This PR unifies resolveExpressionBottomUp and resolveExpressionTopDown by putting the common logic in a new method that both of them simply call. It also renames resolveExpressionBottomUp and resolveExpressionTopDown to make the difference clear.
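
For readers who want the shape of the refactoring, here is a minimal sketch. It is a toy model and not the Catalyst code: Attr, Plan, the Expr hierarchy, and ToyResolver are hypothetical stand-ins, and only the method names resolveExpression, resolveExpressionByPlanOutput, and resolveExpressionByPlanChildren come from this conversation. It assumes the shared method can take the column lookup as a parameter, so the two public methods differ only in which attributes they pass in.

```scala
// Toy model, not Spark's Catalyst classes: every type here is hypothetical.
final case class Attr(name: String)

sealed trait Expr
final case class UnresolvedCol(name: String) extends Expr
final case class ResolvedCol(attr: Attr) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

final case class Plan(output: Seq[Attr], children: Seq[Plan] = Nil)

object ToyResolver {
  // Shared logic: walk the expression and resolve each column through
  // `resolveColumn`. A column that cannot be resolved is returned unchanged.
  private def resolveExpression(
      expr: Expr,
      resolveColumn: String => Option[Attr]): Expr = expr match {
    case u @ UnresolvedCol(name) =>
      resolveColumn(name).map(ResolvedCol(_)).getOrElse(u)
    case Add(l, r) =>
      Add(resolveExpression(l, resolveColumn), resolveExpression(r, resolveColumn))
    case other => other
  }

  // Resolve columns against the output attributes of the plan itself.
  def resolveExpressionByPlanOutput(expr: Expr, plan: Plan): Expr =
    resolveExpression(expr, name => plan.output.find(_.name == name))

  // Resolve columns against the output attributes of the plan's children.
  def resolveExpressionByPlanChildren(expr: Expr, plan: Plan): Expr =
    resolveExpression(
      expr,
      name => plan.children.flatMap(_.output).find(_.name == name))
}
```

With a parent plan that outputs b over a child that outputs a, the children-based method resolves a while the output-based method leaves it unresolved, which is the distinction the new names are meant to convey.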

Why are the changes needed?

code cleanup

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

cloud-fan · 2021-03-03T16:07:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

The doc is mostly from the old resolveExpressionBottomUp

cloud-fan · 2021-03-03T16:07:55Z

cc @viirya @maropu @HyukjinKwon

SparkQA · 2021-03-03T16:58:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40299/

SparkQA · 2021-03-03T17:03:38Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40299/

SparkQA · 2021-03-03T17:49:08Z

Test build #135717 has finished for PR 31728 at commit fef599b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2021-03-04T07:26:45Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Could we move the orElse part into resolveExpression? It seems to be the same in resolveExpressionByPlanOutput and resolveExpressionByPlanChildren.

viirya · 2021-03-04T08:26:02Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+   * Example :
+   * SELECT a.b FROM t ORDER BY b[0].d
+   *
+   * In the above example, in b needs to be resolved before d can be resolved. Given we are


in b -> b?

This is copied verbatim from the old code. It's actually not accurate, as b[0].d won't fail if b can't be resolved; it just returns the unresolved expression.

Let me rewrite this doc.

viirya · 2021-03-04T08:32:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+      resolveColumnByOrdinal = ordinal => {
+        val candidates = q.children.flatMap(_.output)
+        assert(ordinal >= 0 && ordinal < candidates.length)
+        candidates.apply(ordinal)
+      },


Did resolveExpressionTopDown have this logic originally? I can't seem to find it.

No, it didn't, but I can't find a reason why we shouldn't have it. It doesn't hurt anyway.

The only difference should be where the columns are resolved from: the plan's output vs. the output of the plan's children. That makes it easy for developers to decide which one to call.

I ask because it looks like we flatten all the children's outputs here and let GetColumnByOrdinal resolve to an ordinal in the flattened output. That isn't like any other GetColumnByOrdinal usage. It works only if the expression or query plan that uses GetColumnByOrdinal accounts for the ordinal correctly.

This is a good point. Since this logic is not really needed anywhere, maybe we can use a stricter definition first: it only works if the plan has exactly one child, and we look up the attribute from the output attributes of that child.

I'll update this in a follow-up PR (I have some other related code cleanup in mind) to avoid waiting for the QA job.
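
To make the two options in this thread concrete, here is a rough sketch under stated assumptions. Column, Node, and OrdinalLookup are hypothetical stand-ins and not Catalyst classes; byFlattenedChildren mirrors the lambda in the diff above, while bySingleChild is only one possible reading of the "stricter definition" and is not taken from the follow-up PR.

```scala
// Hypothetical stand-ins for an attribute and a logical plan node.
final case class Column(name: String)
final case class Node(output: Seq[Column], children: Seq[Node] = Nil)

object OrdinalLookup {
  // Variant from the diff: index into the concatenated outputs of all children.
  def byFlattenedChildren(plan: Node, ordinal: Int): Column = {
    val candidates = plan.children.flatMap(_.output)
    assert(ordinal >= 0 && ordinal < candidates.length)
    candidates(ordinal)
  }

  // Stricter variant (an assumption about what "stricter" could mean):
  // only defined when the plan has exactly one child, and the ordinal
  // indexes into that child's output. Anything else yields None.
  def bySingleChild(plan: Node, ordinal: Int): Option[Column] =
    plan.children match {
      case Seq(child) => child.output.lift(ordinal)
      case _          => None
    }
}
```

For a two-child node such as a join, the flattened variant silently maps ordinal 2 to the first column of the second child, whereas the stricter variant declines to resolve it. Returning an Option instead of asserting is a design choice of this sketch only.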

SparkQA · 2021-03-04T10:07:22Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40327/

SparkQA · 2021-03-04T10:45:39Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40327/

SparkQA · 2021-03-04T11:09:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40331/

SparkQA · 2021-03-04T11:37:28Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40331/

SparkQA · 2021-03-04T12:20:27Z

Test build #135743 has finished for PR 31728 at commit 8c1697c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-04T13:41:51Z

Test build #135745 has finished for PR 31728 at commit 8332a43.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-03-04T14:52:47Z

Test build #135749 has finished for PR 31728 at commit 7c2de54.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-03-05T05:53:45Z

thanks for the review, merging to master!

HyukjinKwon

LGTM2

…ionTopDown

### What changes were proposed in this pull request?

It's a bit confusing to see `resolveExpressionBottomUp` and `resolveExpressionTopDown`, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to? `resolveExpressionTopDown` resolves columns using output attributes of the plan children, `resolveExpressionBottomUp` resolves columns using output attributes of the plan itself.

This PR unifies `resolveExpressionBottomUp` and `resolveExpressionTopDown` and put the common logic in a new method, and let `resolveExpressionBottomUp` and `resolveExpressionTopDown` just call the new method. This PR also renames `resolveExpressionBottomUp` and `resolveExpressionTopDown` to make the difference clear.

### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes apache#31728 from cloud-fan/resolve.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Cherry-picked from dc78f33 by @kbendickson.

(cherry picked from commit d93c561)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

github-actions bot added the SQL label Mar 3, 2021

cloud-fan force-pushed the resolve branch from fef599b to 8c1697c Compare March 4, 2021 07:12

maropu reviewed Mar 4, 2021

unify resolveExpressionBottomUp and resolveExpressionTopDown

8332a43

cloud-fan force-pushed the resolve branch from 8c1697c to 8332a43 Compare March 4, 2021 08:20

viirya reviewed Mar 4, 2021

update doc

7c2de54

maropu approved these changes Mar 4, 2021

viirya approved these changes Mar 4, 2021

cloud-fan closed this in dc78f33 Mar 5, 2021

cloud-fan mentioned this pull request Mar 5, 2021

[SPARK-34639][SQL] Always remove unnecessary Alias in Analyzer.resolveExpression #31758

Closed

HyukjinKwon reviewed Mar 7, 2021
