[SPARK-20010][SQL] Sort information is lost after sort merge join #17339
Conversation
Test build #74770 has started for PR 17339 at commit
retest this please
Test build #74783 has finished for PR 17339 at commit
test this please
retest this please
child: Expression,
direction: SortDirection,
nullOrdering: NullOrdering,
sameOrderExpressions: Set[Expression])
Ideally we wouldn't need this, and could rely on the EqualTo constraint to infer this information. Unfortunately the constraint only exists in the logical plan, so we can't find a better solution for this case. cc @gatorsmile do you have a better idea?
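As a rough illustration of the field discussed in this comment, here is a minimal toy model of a sort order that also records the expressions known to share its ordering. It is not the Catalyst implementation: `ToySortOrder`, `Direction`, and the `satisfies` method are hypothetical names for this sketch, plain strings stand in for `Expression`, and `nullOrdering` is omitted.

```scala
// Toy model only: plain strings stand in for Catalyst Expression objects.
object SortOrderSketch {
  sealed trait Direction
  case object Ascending extends Direction
  case object Descending extends Direction

  // Hypothetical, simplified counterpart of the fields quoted above
  // (nullOrdering is left out to keep the sketch short).
  final case class ToySortOrder(
      child: String,
      direction: Direction,
      sameOrderExpressions: Set[String] = Set.empty) {

    // True if data sorted by this order is also sorted the way `required` asks:
    // same direction, and the required key is the child or one of its equivalents.
    def satisfies(required: ToySortOrder): Boolean =
      direction == required.direction &&
        (sameOrderExpressions + child).contains(required.child)
  }
}
```

With this model, an order on A.key that lists B.key as an equivalent satisfies a requirement on B.key, so no further sort would be needed before a join on B.key.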
Test build #74873 has finished for PR 17339 at commit
Test build #74880 has finished for PR 17339 at commit
retest this please
Test build #74888 has finished for PR 17339 at commit
thanks, merging to master!
What changes were proposed in this pull request?
After a sort merge join for an inner join, we currently keep only the left key ordering. However, after an inner join the right key has the same values and order as the left key. So if a later sort merge join uses the right key, we add an unnecessary sort, which incurs additional cost.
As a more complicated example, consider A join B on A.key = B.key join C on B.key = C.key join D on A.key = D.key. We unnecessarily add a sort on B.key when joining {A, B} with C, and a sort on A.key when joining {A, B, C} with D.
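The join chain above can be traced with a small simulation. This is a sketch under simplifying assumptions, not Spark's planner: `JoinStep` and `extraLeftSorts` are hypothetical names, keys are plain strings, and only sorts on the already-joined (left) side of each later join are counted.

```scala
object SortPropagationSketch {
  // One join: the key taken from the already-joined left side and the key of the new right table.
  final case class JoinStep(leftKey: String, rightKey: String)

  // Counts the sorts that would be added on the left side of each join after the first.
  // keepEquivalents = false models keeping only the left key ordering;
  // keepEquivalents = true models keeping every key known to share that ordering.
  def extraLeftSorts(steps: Seq[JoinStep], keepEquivalents: Boolean): Int = {
    var sortedBy = Set.empty[String] // keys the current left side is known to be sorted by
    var sorts = 0
    steps.zipWithIndex.foreach { case (step, i) =>
      // The first join sorts both base tables anyway, so only later joins are counted.
      if (i > 0 && !sortedBy.contains(step.leftKey)) sorts += 1
      sortedBy =
        if (keepEquivalents) {
          // Earlier equivalents stay valid only if no re-sort on a different key was needed.
          val carried = if (sortedBy.contains(step.leftKey)) sortedBy else Set.empty[String]
          carried + step.leftKey + step.rightKey
        } else Set(step.leftKey)
    }
    sorts
  }
}
```

For the chain A, B, C, D above, `extraLeftSorts` counts two extra sorts when only the left key is kept (on B.key, then on A.key) and none when equivalent keys are kept.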
To fix this, we need to propagate all sort information (equivalent expressions) from the bottom up through outputOrdering and SortOrder.
How was this patch tested?
Test cases are added.