[SPARK-17447] Performance improvement in Partitioner.defaultPartitioner without sortBy #15039

codlife · 2016-09-10T02:07:26Z

What changes were proposed in this pull request?

When there are many RDDs, sorting them can severely hurt performance. We do not actually need to sort the RDDs; a single scan over them achieves the same goal.
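To make the idea concrete, here is a minimal sketch that contrasts the sort-based selection with a single scan. It does not use Spark itself: `FakeRdd`, `FakePartitioner`, `numSlices`, `pickBySort`, and `pickByScan` are hypothetical stand-ins invented for illustration, and the sketch covers only the "pick an existing partitioner" step, not the fallback to a new `HashPartitioner`.

```scala
// Hypothetical stand-ins for Spark's Partitioner and RDD (assumptions, not the real classes).
final case class FakePartitioner(numPartitions: Int)
final case class FakeRdd(name: String, numSlices: Int, partitioner: Option[FakePartitioner])

object DefaultPartitionerSketch {
  // Sort-based approach: order all RDDs by partition count (largest first),
  // then take the first one with a usable partitioner. Sorting costs O(n log n).
  def pickBySort(rdds: Seq[FakeRdd]): Option[FakePartitioner] =
    rdds.sortBy(_.numSlices).reverse
      .find(_.partitioner.exists(_.numPartitions > 0))
      .flatMap(_.partitioner)

  // Scan-based approach: keep only RDDs with a usable partitioner,
  // then take the one with the most partitions in a single pass. O(n).
  def pickByScan(rdds: Seq[FakeRdd]): Option[FakePartitioner] = {
    val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
    if (hasPartitioner.nonEmpty) hasPartitioner.maxBy(_.numSlices).partitioner else None
  }
}
```

Both functions return the same partitioner whenever the partition counts are distinct. If two candidates tie on partition count, this sketch may pick a different one in each version, because `maxBy` keeps the first maximum while the reversed sort puts the last one first.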

How was this patch tested?

manual tests

srowen · 2016-09-10T08:52:25Z

core/src/main/scala/org/apache/spark/Partitioner.scala

-    for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) {
-      return r.partitioner.get
+    val rdds = Seq(rdd) ++ others
+    val filteredRdds = rdds.filter( _.partitioner.exists(_.numPartitions > 0 ))


This doesn't match the code I posted in minor ways. There should be no extra spaces around operators; there should be a space after 'if'; this uses return unnecessarily. This is a lot of time spent on a trivial change, so I'd appreciate it if you read the guidance on things like style, and read feedback carefully if you're pursuing this. Or else close this.

@srowen Thank you very much. I am new to Spark, but I am very interested in it. I have fixed my code style. Thanks.

srowen · 2016-09-10T15:15:34Z

core/src/main/scala/org/apache/spark/Partitioner.scala

-    if (rdd.context.conf.contains("spark.default.parallelism")) {
-      new HashPartitioner(rdd.context.defaultParallelism)
+    val rdds = (Seq(rdd) ++ others)
+    val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))


hasPartitioner, not hashPartitioner. You should copy the code I provided.

@srowen Thank you, I will learn more about code style. I have updated the code.

@srowen Please check. Thank you very much!

srowen · 2016-09-11T11:00:28Z

Jenkins test this please

SparkQA · 2016-09-11T13:19:33Z

Test build #65224 has finished for PR 15039 at commit f5d1e24.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-09-12T11:11:11Z

Merged to master

…er without sortBy

## What changes were proposed in this pull request?

When there are many RDDs, sorting them can severely hurt performance. We do not actually need to sort the RDDs; a single scan over them achieves the same goal.

## How was this patch tested?

Manual tests.

Author: codlife <1004910847@qq.com>

Closes apache#15039 from codlife/master.

solve spark-17447

673c29b

codlife changed the title ~~[SPARK-17447] Performance improvement in Partitioner.defaultPartitioner~~ [SPARK-17447] Performance improvement in Partitioner.defaultPartitioner without sortBy Sep 10, 2016

Update Partitioner.scala

a460905

srowen reviewed Sep 10, 2016
fix code style

8ddc442

srowen reviewed Sep 10, 2016
Update Partitioner.scala

f5d1e24

asfgit closed this in 4efcdb7 Sep 12, 2016

zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 13, 2016

[EXT][SPARK-17447] Performance improvement in Partitioner.defaultPart…

6e35db1

…itioner without sortBy apache#15039
