-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-7970] Skip closure cleaning for SQL operations #9253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…'group by' attribute set
…'group by' attribute set
…'group by' attribute set
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.
Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Query plan after optimisation :-
Filter (c#138L = 2)
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Filter (a#0 = 3)
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions. Query plan before optimisation :- Filter ((c#138L = 2) && (a#0 = 3)) Aggregate [a#0], [a#0,count(b#1) AS c#138L] Project [a#0,b#1] LocalRelation [a#0,b#1,c#2] Query plan after optimisation :- Filter (c#138L = 2) Aggregate [a#0], [a#0,count(b#1) AS c#138L] Filter (a#0 = 3) Project [a#0,b#1] LocalRelation [a#0,b#1,c#2]
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.
|
cc @andrewor14 |
|
ok to test |
|
Test build #44238 has finished for PR 9253 at commit
|
|
Test build #44284 has finished for PR 9253 at commit
|
|
@andrewor14 - not sure which test has failed. can we retest this please? |
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use @param here
|
Looks great! I look forward to getting this merged. Once you address the comments I will do so. |
|
Test build #44592 has finished for PR 9253 at commit
|
|
Test build #44609 has finished for PR 9253 at commit
|
|
Thanks fore reviewing Andrew ( @andrewor14 ). Have addressed your comments. Let me know if it looks good. |
|
@nitin2goyal Sorry for the delay. This LGTM. I will merge it once you rebase to master again. |
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Aggregate.scala sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala
|
Test build #45870 has finished for PR 9253 at commit
|
|
retest this please |
|
Test build #45895 has finished for PR 9253 at commit
|
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements. Author: nitin goyal <nitin.goyal@guavus.com> Author: nitin.goyal <nitin.goyal@guavus.com> Closes #9253 from nitin2goyal/master. (cherry picked from commit c939c70) Signed-off-by: Andrew Or <andrew@databricks.com>
|
Should mapPartitions() be replaced with mapPartitionsInternal() in the following classes ? If so, allow me to open a PR |
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.