Skip to content

Conversation

@nitin2goyal
Copy link

Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.

Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]
Push conjunctive predicates though Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation :-
Filter ((c#138L = 2) && (a#0 = 3))
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]

Query plan after optimisation :-
Filter (c#138L = 2)
Aggregate [a#0], [a#0,count(b#1) AS c#138L]
Filter (a#0 = 3)
Project [a#0,b#1]
LocalRelation [a#0,b#1,c#2]
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.
@nitin2goyal
Copy link
Author

cc @andrewor14

@andrewor14
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Oct 23, 2015

Test build #44238 has finished for PR 9253 at commit ca487cb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 24, 2015

Test build #44284 has finished for PR 9253 at commit 6a9f738.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nitin2goyal
Copy link
Author

@andrewor14 - not sure which test has failed. can we retest this please?

@andrewor14
Copy link
Contributor

retest this please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use @param here

@andrewor14
Copy link
Contributor

Looks great! I look forward to getting this merged. Once you address the comments I will do so.

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44592 has finished for PR 9253 at commit 6a9f738.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44609 has finished for PR 9253 at commit 36db8a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nitin2goyal
Copy link
Author

Thanks fore reviewing Andrew ( @andrewor14 ). Have addressed your comments. Let me know if it looks good.

@andrewor14
Copy link
Contributor

@nitin2goyal Sorry for the delay. This LGTM. I will merge it once you rebase to master again.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Aggregate.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala
@SparkQA
Copy link

SparkQA commented Nov 13, 2015

Test build #45870 has finished for PR 9253 at commit aa4a7ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public class JavaGradientBoostingClassificationExample\n * public class JavaGradientBoostingRegressionExample\n * public class JavaRandomForestClassificationExample\n * public class JavaRandomForestRegressionExample\n

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 14, 2015

Test build #45895 has finished for PR 9253 at commit aa4a7ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in c939c70 Nov 14, 2015
asfgit pushed a commit that referenced this pull request Nov 14, 2015
Also introduces new spark private API in RDD.scala with name 'mapPartitionsInternal' which doesn't closure cleans the RDD elements.

Author: nitin goyal <nitin.goyal@guavus.com>
Author: nitin.goyal <nitin.goyal@guavus.com>

Closes #9253 from nitin2goyal/master.

(cherry picked from commit c939c70)
Signed-off-by: Andrew Or <andrew@databricks.com>
@tedyu
Copy link
Contributor

tedyu commented Nov 15, 2015

Should mapPartitions() be replaced with mapPartitionsInternal() in the following classes ?

    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
    val rootType = schemaData.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala
    json.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
    rows.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONRelation.scala
        .mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
      .mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala
      child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
    data.mapPartitions { iterator =>
    data.mapPartitions { iterator =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
    streamedPlan.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashOuterJoin.scala
    val matchesOrStreamedRowsWithNulls = streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoin.scala
    streamed.execute().mapPartitions { streamedIter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinBNL.scala
    rdd.mapPartitions { iter =>
    inputRDD.mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/python.scala
    child.execute().mapPartitions { iter =>
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/rowFormatConverters.scala
    child.execute().mapPartitions { iter =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala
    child.execute().mapPartitions { stream =>
/Users/tyu/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala

If so, allow me to open a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants