[SPARK-7970] Skip closure cleaning for SQL operations #9253

nitin2goyal · 2015-10-23T15:49:25Z

Also introduces a new Spark-private API in RDD.scala, 'mapPartitionsInternal', which skips closure cleaning.
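
To make the idea concrete, here is a toy sketch of the two code paths. It is not Spark's implementation: `MiniRDD`, `clean`, and `cleanCalls` are hypothetical stand-ins invented for illustration, and the simulated cleaning step only counts how often it runs.

```scala
// Toy model only: MiniRDD and clean() are hypothetical stand-ins, not Spark classes.
object ClosureCleaningSketch {
  final class MiniRDD[T](partitions: Seq[Seq[T]]) {
    // Number of times the simulated cleaning step has run on this RDD.
    var cleanCalls: Int = 0

    // Stand-in for closure cleaning; it only records that it was invoked.
    private def clean[F](f: F): F = { cleanCalls += 1; f }

    // User-facing path: the function is cleaned before it is applied.
    def mapPartitions[U](f: Iterator[T] => Iterator[U]): MiniRDD[U] = {
      val cleaned = clean(f)
      new MiniRDD[U](partitions.map(p => cleaned(p.iterator).toList))
    }

    // Internal path: same transformation, cleaning step skipped.
    def mapPartitionsInternal[U](f: Iterator[T] => Iterator[U]): MiniRDD[U] =
      new MiniRDD[U](partitions.map(p => f(p.iterator).toList))

    // Concatenate all partitions into one sequence.
    def collect(): Seq[T] = partitions.flatten
  }

  def main(args: Array[String]): Unit = {
    val rdd = new MiniRDD(Seq(Seq(1, 2), Seq(3)))
    println(rdd.mapPartitions(_.map(_ * 2)).collect())
    println(rdd.mapPartitionsInternal(_.map(_ * 2)).collect())
    println(s"clean() ran ${rdd.cleanCalls} time(s)")
  }
}
```

Both paths produce the same result; only the first one pays for the cleaning step, which is the cost this PR avoids for closures that Spark SQL generates itself.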

…'group by' attribute set

Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation:

    Filter ((c#138L = 2) && (a#0 = 3))
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Project [a#0,b#1]
       LocalRelation [a#0,b#1,c#2]

Query plan after optimisation:

    Filter (c#138L = 2)
     Aggregate [a#0], [a#0,count(b#1) AS c#138L]
      Filter (a#0 = 3)
       Project [a#0,b#1]
        LocalRelation [a#0,b#1,c#2]
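
The splitting rule described above can be sketched in a few lines. This is a simplified illustration, not the Catalyst rule itself: `Pred`, `split`, and the plain string attribute names are hypothetical, and a real optimizer works on expression trees rather than labels.

```scala
object PushdownSketch {
  // Hypothetical, simplified predicate: a label plus the attributes it reads.
  final case class Pred(sql: String, references: Set[String])

  // Split the conjuncts of a Filter that sits on top of an Aggregate.
  // Returns (conjuncts pushed below the Aggregate, conjuncts kept above it).
  def split(conjuncts: Seq[Pred], groupingAttrs: Set[String]): (Seq[Pred], Seq[Pred]) =
    conjuncts.partition(_.references.subsetOf(groupingAttrs))

  def main(args: Array[String]): Unit = {
    // Mirrors the plan above: group by a, filter on (c = 2) && (a = 3).
    val conjuncts = Seq(Pred("c = 2", Set("c")), Pred("a = 3", Set("a")))
    val (pushed, kept) = split(conjuncts, groupingAttrs = Set("a"))
    println(s"below Aggregate: ${pushed.map(_.sql).mkString(" && ")}")
    println(s"above Aggregate: ${kept.map(_.sql).mkString(" && ")}")
  }
}
```

The predicate on `a` references only a grouping attribute, so it can move below the Aggregate; the predicate on `c` reads the aggregate's output and has to stay above it.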

nitin2goyal · 2015-10-23T15:50:02Z

cc @andrewor14

andrewor14 · 2015-10-23T17:45:07Z

ok to test

SparkQA · 2015-10-23T17:55:59Z

Test build #44238 has finished for PR 9253 at commit ca487cb.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-10-24T05:16:29Z

Test build #44284 has finished for PR 9253 at commit 6a9f738.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

nitin2goyal · 2015-10-26T15:08:16Z

@andrewor14 - I'm not sure which test failed. Can we retest this, please?

andrewor14 · 2015-10-29T11:44:57Z

retest this please

andrewor14 · 2015-10-29T11:45:16Z

core/src/main/scala/org/apache/spark/rdd/RDD.scala

just use @param here

andrewor14 · 2015-10-29T11:46:18Z

Looks great! I look forward to getting this merged. Once you address the comments I will do so.

SparkQA · 2015-10-29T14:25:34Z

Test build #44592 has finished for PR 9253 at commit 6a9f738.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-10-29T18:26:19Z

Test build #44609 has finished for PR 9253 at commit 36db8a1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

nitin2goyal · 2015-10-30T03:22:06Z

Thanks for reviewing, Andrew (@andrewor14). I have addressed your comments. Let me know if it looks good.

andrewor14 · 2015-11-10T19:43:59Z

@nitin2goyal Sorry for the delay. This LGTM. I will merge it once you rebase to master again.

SparkQA · 2015-11-13T19:21:45Z

Test build #45870 has finished for PR 9253 at commit aa4a7ce.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * public class JavaGradientBoostingClassificationExample
 * public class JavaGradientBoostingRegressionExample
 * public class JavaRandomForestClassificationExample
 * public class JavaRandomForestRegressionExample

andrewor14 · 2015-11-13T21:12:31Z

retest this please

SparkQA · 2015-11-14T01:02:04Z

Test build #45895 has finished for PR 9253 at commit aa4a7ce.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Also introduces a new Spark-private API in RDD.scala, 'mapPartitionsInternal', which skips closure cleaning.

Author: nitin goyal <nitin.goyal@guavus.com>
Author: nitin.goyal <nitin.goyal@guavus.com>

Closes #9253 from nitin2goyal/master.

(cherry picked from commit c939c70)
Signed-off-by: Andrew Or <andrew@databricks.com>

tedyu · 2015-11-15T17:25:17Z

Should mapPartitions() be replaced with mapPartitionsInternal() in the following classes?

    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
    val rootType = schemaData.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala
    json.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
    rows.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONRelation.scala
        .mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
      .mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala
      child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
    data.mapPartitions { iterator =>
    data.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
    val matchesOrStreamedRowsWithNulls = streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoin.scala
    streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinBNL.scala
    rdd.mapPartitions { iter =>
    inputRDD.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/python.scala
    child.execute().mapPartitions { iter =>
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/rowFormatConverters.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala
    child.execute().mapPartitions { stream =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala

If so, allow me to open a PR

nitindexter added 7 commits October 19, 2015 12:21

SPARK-11179: Push filters through aggregate if filters are subset of …

4ee8058

…'group by' attribute set

SPARK-11179: Push filters through aggregate if filters are subset of …

3b016b7

…'group by' attribute set

SPARK-11179: Push filters through aggregate if filters are subset of …

671fbb3

…'group by' attribute set

Merge remote-tracking branch 'upstream/master'

20cf722

[SPARK-7970] Skip closure cleaning for SQL operations

ca487cb

Also introduces a new Spark-private API in RDD.scala, 'mapPartitionsInternal', which skips closure cleaning.

Fix minor typo

6a9f738

andrewor14 reviewed Oct 29, 2015

nitindexter added 2 commits October 29, 2015 21:18

Address review comments

07ae9ac

Address review comments

36db8a1

Merge branch 'master' of https://github.com/apache/spark

aa4a7ce

Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Aggregate.scala sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

asfgit closed this in c939c70 Nov 14, 2015
