-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34609][SQL] Unify resolveExpressionBottomUp and resolveExpressionTopDown #31728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is mostly from the old resolveExpressionBottomUp
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #135717 has finished for PR 31728 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the orElse part into resolveExpression ? It seems it is the same between resolveExpressionByPlanOutput and resolveExpressionByPlanChildren .
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
| * Example : | ||
| * SELECT a.b FROM t ORDER BY b[0].d | ||
| * | ||
| * In the above example, in b needs to be resolved before d can be resolved. Given we are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in b -> b?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from the old code verbatimly. It's actually not accurate as b[0].d won't fail if b can't be resolved. It just returns the unresolved expression.
Let me rewrite this doc.
| resolveColumnByOrdinal = ordinal => { | ||
| val candidates = q.children.flatMap(_.output) | ||
| assert(ordinal >= 0 && ordinal < candidates.length) | ||
| candidates.apply(ordinal) | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does resolveExpressionTopDown have this logic originally? Seems I cannot find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it doesn't, but I can't find a reason why we shouldn't have it. This doesn't hurt anyway.
The only difference should be where to resolve the columns: plan output vs plan children output. So that it's easy for developers to decide which one to call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because here looks like we flatten all children outputs and let GetColumnByOrdinal resolve to ordinal in the flattened outputs. It doesn't like any other GetColumnByOrdinal usage so I have a question here. It works if the expr/query plan of GetColumnByOrdinal considers the ordinal correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Since this logic is not really needed anywhere, maybe we can use a stricter definition first. We can say that it only works if the plan has one child and we look up the attribute from output attributes of that child.
I'll update this in the followup PR (I have some other related code cleanup in mind), to avoid waiting for the QA job.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #135743 has finished for PR 31728 at commit
|
|
Test build #135745 has finished for PR 31728 at commit
|
|
Test build #135749 has finished for PR 31728 at commit
|
|
thanks for the review, merging to master! |
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
…ionTopDown ### What changes were proposed in this pull request? It's a bit confusing to see `resolveExpressionBottomUp` and `resolveExpressionTopDown`, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to? `resolveExpressionTopDown` resolves columns using output attributes of the plan children, `resolveExpressionBottomUp` resolves columns using output attributes of the plan itself. This PR unifies `resolveExpressionBottomUp` and `resolveExpressionTopDown` and put the common logic in a new method, and let `resolveExpressionBottomUp` and `resolveExpressionTopDown` just call the new method. This PR also renames `resolveExpressionBottomUp` and `resolveExpressionTopDown` to make the difference clear. ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes apache#31728 from cloud-fan/resolve. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> Cherry-picked from dc78f33 by @kbendickson. (cherry picked from commit d93c561) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
It's a bit confusing to see
resolveExpressionBottomUpandresolveExpressionTopDown, which provide similar functionalities but with different tree traverse order. It turns out that the real difference between these 2 methods is: which attributes should the columns be resolved to?resolveExpressionTopDownresolves columns using output attributes of the plan children,resolveExpressionBottomUpresolves columns using output attributes of the plan itself.This PR unifies
resolveExpressionBottomUpandresolveExpressionTopDownand put the common logic in a new method, and letresolveExpressionBottomUpandresolveExpressionTopDownjust call the new method. This PR also renamesresolveExpressionBottomUpandresolveExpressionTopDownto make the difference clear.Why are the changes needed?
code cleanup
Does this PR introduce any user-facing change?
no
How was this patch tested?
existing tests