[SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP BY causes an analysis error #14546
Conversation
|
Test build #63384 has finished for PR 14546 at commit
|
Please add a comment here. Thanks!
Thank you, @gatorsmile .
|
Test build #63398 has finished for PR 14546 at commit
|
|
Test build #63401 has finished for PR 14546 at commit
|
We have a conf, conf.orderByOrdinal, to control whether integer values are analyzed as positions, but the current fix ignores this conf. Could you fix it? Also, please add a test case to ensure both options (true and false) are covered.
Yep. :)
For the false case, you meant to check ResolveAggregateFunctions functionality, right?
|
Test build #63408 has finished for PR 14546 at commit
|
|
Test build #63418 has finished for PR 14546 at commit
|
|
Test build #63419 has finished for PR 14546 at commit
|
|
Hi, @yhuai . |
I have the feeling that this guard is wrong: it disables the entire clause if conf.orderByOrdinal is false. Shouldn't it be: !conf.orderByOrdinal || sortOrder.forall(x => IntegerIndex.unapply(x.child).isEmpty) ?
Aha, I see what I missed. You're right. I will fix it like that.
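The guard discussed in this exchange can be illustrated with a small standalone sketch. It does not use Catalyst: `Expr`, `IntLit`, `Column`, `integerIndex`, and both guard functions are hypothetical stand-ins, and the `conjunctiveGuard` shape is only an assumption about a guard that would show the problem described (the clause never applying when the conf is false).

```scala
object GuardSketch {
  // Toy stand-ins for expressions; these are NOT Catalyst classes.
  sealed trait Expr
  final case class IntLit(value: Int) extends Expr
  final case class Column(name: String) extends Expr

  // Plays the role of IntegerIndex.unapply: Some(n) only for an integer literal.
  def integerIndex(e: Expr): Option[Int] = e match {
    case IntLit(n) => Some(n)
    case _         => None
  }

  // Guard in the shape suggested in the review: the clause applies when
  // ordinals are disabled, or when no sort expression is an integer literal.
  def suggestedGuard(orderByOrdinal: Boolean, sortOrder: Seq[Expr]): Boolean =
    !orderByOrdinal || sortOrder.forall(e => integerIndex(e).isEmpty)

  // Assumed problematic shape: requiring the conf to be true switches the
  // whole clause off whenever the conf is false.
  def conjunctiveGuard(orderByOrdinal: Boolean, sortOrder: Seq[Expr]): Boolean =
    orderByOrdinal && sortOrder.forall(e => integerIndex(e).isEmpty)
}
```

With the conf set to false, `suggestedGuard` is true for any sort order, so the clause still runs and integer literals are treated as plain literals; `conjunctiveGuard` is false in that case regardless of the sort order.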
|
Test build #63527 has finished for PR 14546 at commit
|
|
Test build #63528 has finished for PR 14546 at commit
|
remove is a transitive verb.
Eliminate the useless position numbers?
|
LGTM except minor comments. cc @cloud-fan @hvanhovell |
|
Thank you for the review, @gatorsmile .
|
Test build #63579 has finished for PR 14546 at commit
|
|
I believe this doesn't fix all the cases. How about |
|
Thank you, @clockfly . I'll check that! |
|
Hi, @clockfly . However, this PR solves that problem too. Your case throws an exception on current master, but with this PR:
scala> sql("select count(*), a from (select 1 as a) tmp group by 2 having a > 0").show
+--------+---+
|count(1)|  a|
+--------+---+
|       1|  1|
+--------+---+
|
Anyway, I rebased the branch to resolve the conflict, and I checked your case after rebasing. So you can check out the branch and see the result without conflicts.
|
Test build #63631 has finished for PR 14546 at commit
|
|
@dongjoon-hyun The exception was muted by line: If you add some log message, you will find it still throws exception like: |
|
I think the root cause is that the Aggregate operator is treated as resolved even if it has group by ordinals. For example: Aggregate is treated as resolved even if it has a group by ordinal "2". Then, it tries to resolve the Actually this plan is already wrong. As we are asking for ordinal "2", but actually there is only one |
|
Similar case happens to order by. We don't need "order by ordinal" to reproduce the Analysis error. Aggregate is treated as resolved even if it has a group by ordinal "2". Then, it tries to resolve the Sort by putting the This plan is wrong because we are asking for ordinal "2", but actually there is only one |
|
I think a proper fix would be to mark ordinals as unresolved; the ordinal can exist in a group by or order by expression. Then we can make sure that ResolveAggregateFunctions and other analyzer rules don't assume |
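The idea of marking ordinals as unresolved can be sketched with a toy plan model. This is a minimal illustration under stated assumptions, not the Catalyst implementation: `Attr`, `Literal`, `OrdinalMarker`, `Aggregate`, `markOrdinals`, `substitute`, and the error text are all hypothetical names, and treating every integer literal in the grouping list as an ordinal is a simplification.

```scala
object OrdinalSketch {
  // Toy expression tree; each node reports whether it is resolved.
  sealed trait Expr { def resolved: Boolean }
  final case class Attr(name: String) extends Expr { val resolved = true }
  final case class Literal(value: Int) extends Expr { val resolved = true }
  // Hypothetical marker: an ordinal that still has to be replaced.
  final case class OrdinalMarker(position: Int) extends Expr { val resolved = false }

  // Toy Aggregate: resolved only when all of its expressions are resolved.
  final case class Aggregate(grouping: Seq[Expr], output: Seq[Expr]) {
    def resolved: Boolean = (grouping ++ output).forall(_.resolved)
  }

  // Step 1: wrap integer literals in the grouping list as unresolved markers,
  // so later rules cannot mistake the operator for a resolved one.
  def markOrdinals(agg: Aggregate): Aggregate =
    agg.copy(grouping = agg.grouping.map {
      case Literal(n) => OrdinalMarker(n)
      case other      => other
    })

  // Step 2: replace each marker with the select-list item at that 1-based
  // position, or report an error when the position is out of range.
  def substitute(agg: Aggregate): Either[String, Aggregate] = {
    val replaced: Seq[Either[String, Expr]] = agg.grouping.map {
      case OrdinalMarker(n) if n >= 1 && n <= agg.output.size =>
        Right(agg.output(n - 1))
      case OrdinalMarker(n) =>
        Left(s"position $n is not in the select list (size ${agg.output.size})")
      case other => Right(other)
    }
    replaced.collectFirst { case Left(msg) => msg } match {
      case Some(msg) => Left(msg)
      case None      => Right(agg.copy(grouping = replaced.collect { case Right(e) => e }))
    }
  }
}
```

In this model, an `Aggregate` grouped by `Literal(2)` reports `resolved == true` until `markOrdinals` runs, which mirrors the situation described in the comment; after marking, it stays unresolved until `substitute` either replaces the ordinal or reports that the position is out of range.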
|
@dongjoon-hyun I have implemented the idea in #14616 |
|
@dongjoon-hyun Seems this issue has been fixed as a by-product of #14595. How about we close this? Also, feel free to look at @clockfly's follow-up pr. |
|
Yep. I confirmed that it was nicely resolved at 6bf20cd . |
What changes were proposed in this pull request?
Spark supports ordinal in GROUP BY and ORDER BY. However, if we use both at the same time, it causes exceptions. The root cause was that the ResolveAggregateFunctions rule removed the ordinals before ResolveOrdinalInOrderByAndGroupBy applied.
Before
After
How was this patch tested?
Pass the Jenkins with new test cases.