-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8964] [SQL] Use Exchange to perform shuffle in Limit #7334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
dfe6ff1
089f9f5
c02324c
70f69b6
272c349
964838f
cc63456
7dbb28e
9668c26
7202e89
fc7fe56
e3caa76
cffe4da
37f6688
b4de467
924925f
55e27af
c4b0a53
d6e7802
b8c9e47
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -337,8 +337,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { | |
| execution.Sample(lb, ub, withReplacement, seed, planLater(child)) :: Nil | ||
| case logical.LocalRelation(output, data) => | ||
| LocalTableScan(output, data) :: Nil | ||
| case logical.ReturnAnswer(logical.Limit(IntegerLiteral(limit), child)) => | ||
| execution.CollectLimit(limit, planLater(child)) :: Nil | ||
| case logical.Limit(IntegerLiteral(limit), child) => | ||
| execution.Limit(limit, planLater(child)) :: Nil | ||
| val perPartitionLimit = execution.LocalLimit(limit, planLater(child)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we have a test for this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Today this will be planned as a What do you think this test should be asserting?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm thinking that we did not have this kind of test case, right?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not allowed. See my PR: #10689 If you think this is valid, should we reopen it? Thanks!
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @davies, to clarify, are you proposing that we need to add a test to ensure that global limits below union are not pulled up?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gatorsmile: I think we're in agreement. To recap: Before this patch, After the changes in this patch, this becomes What is not legal to do here is to pull the That plan would be semantically equivalent to executing @davies, were you suggesting that the current planning is wrong? Or that we need more tests to guard against incorrect changes to limit planning? I don't believe that the changes in this patch will affect the planning of the case being described here, since we're not making any changes to limit pull-up or push-down. I do have a followup patch in the works which takes @gatorsmile's two limit-pushdown patches and rebases them on top of the changes here: master...JoshRosen:limit-pushdown-2. In that patch, I do plan to add more tests to handle these pushdown-related concerns.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JoshRosen Thank you for your explanation! That is so great that my previous PR are useful. So far, I am unable to find more operators for limit push down, except outer join and union all: #10454 and #10451 |
||
| val globalLimit = execution.GlobalLimit(limit, perPartitionLimit) | ||
| globalLimit :: Nil | ||
| case logical.Union(unionChildren) => | ||
| execution.Union(unionChildren.map(planLater)) :: Nil | ||
| case logical.Except(left, right) => | ||
|
|
@@ -357,6 +361,7 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { | |
| BatchPythonEvaluation(udf, e.output, planLater(child)) :: Nil | ||
| case LogicalRDD(output, rdd) => PhysicalRDD(output, rdd, "ExistingRDD") :: Nil | ||
| case BroadcastHint(child) => planLater(child) :: Nil | ||
| case logical.ReturnAnswer(child) => planLater(child) :: Nil | ||
| case _ => Nil | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @marmbrus here to make sure it is safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this looks good to me.