[SPARK-51820][SQL] Move UnresolvedOrdinal construction before analysis to avoid issue with group by ordinal
#50606
I am removing some test cases from AggregateResolverSuite because the new behavior also needs to be supported in the single-pass analyzer, and doing that here might bloat this PR too much. I would like to do that in a followup, where I will revert this change.
Can you copy these test contents to the Jira so we don't forget?
Done!
shall we also handle Spark Connect queries in
Sure, done!
Just one thing we need to check - if the view is persisted with The view must keep its confs.
thanks, merging to master!
### What changes were proposed in this pull request?
This is a followup to #50606 to fix the origin and context of newly created `UnresolvedOrdinal`.

### Why are the changes needed?
Origin and context of `UnresolvedOrdinal` should be the same as for the original `Literal`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50676 from mihailotim-db/mihailotim-db/fix_origin.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…`order` by ordinal approach

### What changes were proposed in this pull request?
This is a followup to #50606, to address remaining issues in fixed-point analyzer and parser. This PR does the following:
1. Set correct origin in `SparkConnect` and `Dataframe` API
2. Handle remaining cases for order by ordinal in `Dataframe` API
3. Add a test suite for Dataframe use cases since we are missing those

### Why are the changes needed?
To address current issue with `UnresolvedOrdinal`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test suite and existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50699 from mihailotim-db/mihailotim-db/fix_ordinal_followup.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…hen appending grouping to aggregate expressions

### What changes were proposed in this pull request?
Don't add `UnresolvedOrdinal` when appending grouping to aggregate expressions in Spark Connect.

### Why are the changes needed?
Change is needed to fix a regression caused by #50606 where `UnresolvedOrdinal` would end up in aggregate expression and propagate all the way to `CheckAnalysis` where it would throw an error

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test case for the correct behavior.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50958 from mihailotim-db/mihailotim-db/fix_unresolved_ordinal.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
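
The regression described in the commit message above can be pictured with a small toy model. This is only a sketch in plain Python, not Spark code: the classes and the helpers `wrap_ordinals`, `build_aggregate`, and `check_analysis` (including its `leak_marker` flag) are hypothetical names invented for illustration. It assumes that the grouping expressions are appended to the aggregate output list, and that any ordinal marker left in that list is reported as an error by a final check.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class UnresolvedOrdinal:
    ordinal: int


Expr = Union[Literal, Column, UnresolvedOrdinal]


def wrap_ordinals(exprs: List[Expr]) -> List[Expr]:
    """Wrap integer literals in an ordinal marker (hypothetical helper)."""
    return [UnresolvedOrdinal(e.value) if isinstance(e, Literal) else e for e in exprs]


def build_aggregate(
    grouping: List[Expr], aggregates: List[Expr], leak_marker: bool
) -> Tuple[List[Expr], List[Expr]]:
    """Return (grouping expressions, aggregate output list).

    leak_marker=True models the regression: the wrapped grouping is appended.
    leak_marker=False models the fix: the original grouping is appended.
    """
    wrapped = wrap_ordinals(grouping)
    appended = wrapped if leak_marker else list(grouping)
    return wrapped, appended + list(aggregates)


def check_analysis(output: List[Expr]) -> None:
    """Toy final check: fail if any ordinal marker is left in the output list."""
    leftovers = [e for e in output if isinstance(e, UnresolvedOrdinal)]
    if leftovers:
        raise ValueError(f"unresolved ordinal(s) in aggregate expressions: {leftovers}")


grouping = [Literal(1)]
aggregates = [Column("total")]

_, bad_output = build_aggregate(grouping, aggregates, leak_marker=True)
_, good_output = build_aggregate(grouping, aggregates, leak_marker=False)

check_analysis(good_output)  # passes: no marker was appended to the output list
```

In this model, calling `check_analysis(bad_output)` raises, which mirrors the marker reaching the final check when it is appended along with the grouping.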

What changes were proposed in this pull request?
This is a followup to #43797 and #50461, where hacks were introduced to solve the issue of repeated analysis on plans that have a group by ordinal. The latter PR caused a regression, so the hack needs to be removed. This PR proposes moving `UnresolvedOrdinal` construction to before the Analyzer runs.
Why are the changes needed?
We are reverting a hack introduced in the previous PRs to improve the behavior of group by ordinal and additionally fix the issue that #50461 was trying to solve.
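
To make the motivation concrete, here is a minimal toy model in plain Python, not Spark code. All class and function names (`parse_group_by`, `analyze`, `analyze_with_substitution`) are hypothetical. It assumes the repeated-analysis problem is that an integer literal left in the grouping list after resolution gets read as an ordinal again on a second pass, and it contrasts that with wrapping ordinals exactly once before analysis.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class UnresolvedOrdinal:
    ordinal: int


Expr = Union[Literal, Column, UnresolvedOrdinal]


def parse_group_by(raw: List[Expr]) -> List[Expr]:
    """Pre-analysis step: wrap integer literals as ordinal markers, once."""
    return [UnresolvedOrdinal(e.value) if isinstance(e, Literal) else e for e in raw]


def analyze(select_list: List[Expr], group_by: List[Expr]) -> List[Expr]:
    """Analysis step: resolve only explicit ordinal markers, leave the rest alone."""
    resolved: List[Expr] = []
    for e in group_by:
        if isinstance(e, UnresolvedOrdinal):
            if not 1 <= e.ordinal <= len(select_list):
                raise ValueError(f"GROUP BY position {e.ordinal} is out of range")
            resolved.append(select_list[e.ordinal - 1])
        else:
            resolved.append(e)
    return resolved


def analyze_with_substitution(select_list: List[Expr], group_by: List[Expr]) -> List[Expr]:
    """Contrast case: literals are turned into ordinals on every analysis pass."""
    return analyze(select_list, parse_group_by(group_by))


# Models: SELECT a, 1 ... GROUP BY 1, 2
select_list = [Column("a"), Literal(1)]
raw_group_by = [Literal(1), Literal(2)]

# Markers are built before analysis, so a second pass changes nothing.
once = analyze(select_list, parse_group_by(raw_group_by))
twice = analyze(select_list, once)

# Substituting inside every pass re-reads the resolved literal 1 as position 1.
naive_twice = analyze_with_substitution(
    select_list, analyze_with_substitution(select_list, raw_group_by)
)
```

Here `once` and `twice` are both `[Column("a"), Literal(1)]`, while `naive_twice` ends up as `[Column("a"), Column("a")]`: the second pass silently changes the grouping.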
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No