@codlife
Contributor

@codlife codlife commented Sep 10, 2016

What changes were proposed in this pull request?

In some situations there are many RDDs, and the sort then severely hurts performance. We do not actually need to sort the RDDs; a single scan over them achieves the same goal.

How was this patch tested?

manual tests
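
The single-scan idea in the description above can be sketched outside of Spark as follows. This is a minimal illustration, not the patch itself: `SketchRdd`, `SketchPartitioner`, `pickBySorting`, and `pickByScanning` are hypothetical stand-ins invented here, and the sketch only models "choose the partitioner of the largest input that has a usable one". The example uses distinct sizes on purpose, since the two approaches may break ties between equally sized inputs differently.

```scala
object DefaultPartitionerSketch {
  // Stand-in types for illustration only; these are NOT Spark's classes.
  final case class SketchPartitioner(numPartitions: Int)
  final case class SketchRdd(
      name: String,
      numSlices: Int,
      partitioner: Option[SketchPartitioner])

  // Sort-based selection: order every input by size, largest first, then take
  // the first one that carries a partitioner with at least one partition.
  def pickBySorting(rdds: Seq[SketchRdd]): Option[SketchPartitioner] = {
    val bySize = rdds.sortBy(_.numSlices).reverse
    bySize.find(_.partitioner.exists(_.numPartitions > 0)).flatMap(_.partitioner)
  }

  // Scan-based selection: keep only inputs with a usable partitioner, then
  // find the largest of those in one linear pass, with no sorting.
  def pickByScanning(rdds: Seq[SketchRdd]): Option[SketchPartitioner] = {
    val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
    if (hasPartitioner.nonEmpty) {
      hasPartitioner.maxBy(_.numSlices).partitioner
    } else {
      None
    }
  }

  def main(args: Array[String]): Unit = {
    val inputs = Seq(
      SketchRdd("a", 8, None),                       // largest, but no partitioner
      SketchRdd("b", 4, Some(SketchPartitioner(4))),
      SketchRdd("c", 6, Some(SketchPartitioner(6))), // largest with a partitioner
      SketchRdd("d", 2, Some(SketchPartitioner(0)))) // zero partitions: skipped

    // Both strategies agree on this input.
    assert(pickBySorting(inputs) == pickByScanning(inputs))
    println(pickByScanning(inputs))
  }
}
```

The scan version does a filter pass followed by a `maxBy` pass, so its cost grows linearly with the number of inputs, whereas the sort version pays for a full sort even though only one element is ever used.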

@codlife codlife changed the title [SPARK-17447] Performance improvement in Partitioner.defaultPartitioner [SPARK-17447] Performance improvement in Partitioner.defaultPartitioner without sortBy Sep 10, 2016
for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) {
return r.partitioner.get
val rdds = Seq(rdd) ++ others
val filteredRdds = rdds.filter( _.partitioner.exists(_.numPartitions > 0 ))
Member

This doesn't match the code I posted in minor ways. There should be no extra spaces around operators; there should be a space after 'if'; this uses return unnecessarily. This is a lot of time spent on a trivial change, so I'd appreciate it if you read the guidance on things like style, and read feedback carefully if you're pursuing this. Or else close this.

Contributor Author

@codlife codlife Sep 10, 2016

@srowen Thank you very much. I am new to Spark, but I am very interested in it. I have fixed my code style. Thanks.

if (rdd.context.conf.contains("spark.default.parallelism")) {
new HashPartitioner(rdd.context.defaultParallelism)
val rdds = (Seq(rdd) ++ others)
val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
Member

hasPartitioner, not hashPartitioner. You should copy the code I provided.

Contributor Author

@codlife codlife Sep 10, 2016

@srowen Thank you, I will learn more about code style. I have updated the patch.

Contributor Author

@srowen Please check. Thank you very much!

@srowen
Member

srowen commented Sep 11, 2016

Jenkins test this please

@SparkQA

SparkQA commented Sep 11, 2016

Test build #65224 has finished for PR 15039 at commit f5d1e24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Sep 12, 2016

Merged to master

@asfgit asfgit closed this in 4efcdb7 Sep 12, 2016
zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 13, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…er without sortBy

## What changes were proposed in this pull request?

In some situations there are many RDDs, and the sort then severely hurts performance. We do not actually need to sort the RDDs; a single scan over them achieves the same goal.

## How was this patch tested?

manual tests

Author: codlife <1004910847@qq.com>

Closes apache#15039 from codlife/master.
3 participants