Skip to content

Conversation

@wzhfy
Copy link
Contributor

@wzhfy wzhfy commented Mar 18, 2017

What changes were proposed in this pull request?

After sort merge join for inner join, now we only keep left key ordering. However, after inner join, right key has the same value and order as left key. So if we need another smj on right key, we will unnecessarily add a sort which causes additional cost.

As a more complicated example, A join B on A.key = B.key join C on B.key = C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when join {A, B} and C, and add a sort on A.key when join {A, B, C} and D.

To fix this, we need to propagate all sorted information (equivalent expressions) from bottom up through outputOrdering and SortOrder.

How was this patch tested?

Test cases are added.

@wzhfy
Copy link
Contributor Author

wzhfy commented Mar 18, 2017

cc @cloud-fan @hvanhovell

@SparkQA
Copy link

SparkQA commented Mar 18, 2017

Test build #74770 has started for PR 17339 at commit f3b5b34.

@wzhfy
Copy link
Contributor Author

wzhfy commented Mar 18, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Mar 18, 2017

Test build #74783 has finished for PR 17339 at commit f3b5b34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SortOrder(

@wzhfy
Copy link
Contributor Author

wzhfy commented Mar 20, 2017

test this please

@cloud-fan
Copy link
Contributor

retest this please

child: Expression,
direction: SortDirection,
nullOrdering: NullOrdering,
sameOrderExpressions: Set[Expression])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we don't need this, and we can rely on EqualTo constraint to infer this information. Unfortunately the constraint only exists in logical plan, so we can't find a better solution for this case. cc @gatorsmile do you have a better idea?

@SparkQA
Copy link

SparkQA commented Mar 20, 2017

Test build #74873 has finished for PR 17339 at commit 9d2ab54.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SortOrder(

@SparkQA
Copy link

SparkQA commented Mar 20, 2017

Test build #74880 has finished for PR 17339 at commit 9d2ab54.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SortOrder(

@wzhfy
Copy link
Contributor Author

wzhfy commented Mar 20, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Mar 20, 2017

Test build #74888 has finished for PR 17339 at commit 9d2ab54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SortOrder(

@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants