-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-47895][SQL] group by alias should be idempotent #50461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47895][SQL] group by alias should be idempotent #50461
Conversation
...yst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala
Outdated
Show resolved
Hide resolved
|
thanks, merging to master/4.0! |
### What changes were proposed in this pull request? This is a followup of #43797 . GROUP BY ALIAS has the same bug and this PR applies the same fix to GROUP BY ALIAS ### Why are the changes needed? For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable. ### Does this PR introduce _any_ user-facing change? Yes, we will fix the error. ### How was this patch tested? Test added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #50461 from mihailom-db/mihailom-db/fixGroupBy. Authored-by: Mihailo Milosevic <mihailo.milosevic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 02db872) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This PR reverts #50461 because it introduces a correctness issue by replacing an `Alias` with incorrect literal. A followup will be made with a different way to fix this issue. ### Why are the changes needed? In the below example, alias `abc` is replaced with `2` resulting in 2 rows instead of correct 3. Before erronous PR:  After erronous PR:  ### Does this PR introduce _any_ user-facing change? User now sees an error message instead of an incorrect result. ### How was this patch tested? Added a test case to check for this behavior in the future ### Was this patch authored or co-authored using generative AI tooling? No Closes #50567 from mihailotim-db/mihailotim-db/revert_group_by. Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This PR reverts #50461 because it introduces a correctness issue by replacing an `Alias` with incorrect literal. A followup will be made with a different way to fix this issue. ### Why are the changes needed? In the below example, alias `abc` is replaced with `2` resulting in 2 rows instead of correct 3. Before erronous PR:  After erronous PR:  ### Does this PR introduce _any_ user-facing change? User now sees an error message instead of an incorrect result. ### How was this patch tested? Added a test case to check for this behavior in the future ### Was this patch authored or co-authored using generative AI tooling? No Closes #50567 from mihailotim-db/mihailotim-db/revert_group_by. Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 8718eba) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…sis to avoid issue with group by ordinal ### What changes were proposed in this pull request? This is a followup to #43797 and #50461 where hacks were introduced in order to solve the issue of repeated analysis on plans that have a group by ordinal. The latter PR caused a regression so the hack needs to be removed. This PR proposed a move of `UnresolvedOrdinal` construction before Analyzer runs. ### Why are the changes needed? We are reverting a hack introduced in the previous PRs to improve the behavior of group by ordinal and additionally fix the issue that #50461 was trying to solve. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #50606 from mihailotim-db/mihailotim-db/new_group_by_ordinal. Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @mihailom-db , I just want to give you the context in order to prevent future accident like the following.
02db87282da [SPARK-47895][SQL] group by alias should be idempotent
b5bb75ca240 [SPARK-47895][SQL] group by all should be idempotent
ASF JIRA IDs have Fixed Versions field. Even for the direct follow-up PRs, we cannot reuse JIRA IDs when the original patch is released already. It's because the original PR and the follow-up PR cannot have the same Fixed Versions field. Please make it sure that ASF JIRA ID is different from any internal customer ticket systems.
So, please be careful about JIRA IDs next time.
### What changes were proposed in this pull request? This is a followup of apache#43797 . GROUP BY ALIAS has the same bug and this PR applies the same fix to GROUP BY ALIAS ### Why are the changes needed? For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable. ### Does this PR introduce _any_ user-facing change? Yes, we will fix the error. ### How was this patch tested? Test added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#50461 from mihailom-db/mihailom-db/fixGroupBy. Authored-by: Mihailo Milosevic <mihailo.milosevic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 9cc8b18) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This PR reverts apache#50461 because it introduces a correctness issue by replacing an `Alias` with incorrect literal. A followup will be made with a different way to fix this issue. ### Why are the changes needed? In the below example, alias `abc` is replaced with `2` resulting in 2 rows instead of correct 3. Before erronous PR:  After erronous PR:  ### Does this PR introduce _any_ user-facing change? User now sees an error message instead of an incorrect result. ### How was this patch tested? Added a test case to check for this behavior in the future ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50567 from mihailotim-db/mihailotim-db/revert_group_by. Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit f970fe2) Signed-off-by: Wenchen Fan <wenchen@databricks.com>

What changes were proposed in this pull request?
This is a followup of #43797 . GROUP BY ALIAS has the same bug and this PR applies the same fix to GROUP BY ALIAS
Why are the changes needed?
For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable.
Does this PR introduce any user-facing change?
Yes, we will fix the error.
How was this patch tested?
Test added.
Was this patch authored or co-authored using generative AI tooling?
No.