-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17447] Performance improvement in Partitioner.defaultPartitioner without sortBy #15039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) { | ||
| return r.partitioner.get | ||
| val rdds = Seq(rdd) ++ others | ||
| val filteredRdds = rdds.filter( _.partitioner.exists(_.numPartitions > 0 )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the code I posted in minor ways. There should be no extra spaces around operators; there should be a space after 'if'; this uses return unnecessarily. This is a lot of time spent on a trivial change, so I'd appreciate it if you read the guidance on things like style, and read feedback carefully if you're pursuing this. Or else close this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen thank you very much , i am a new hand about spark, but i'm interested in it very much,i have fixed my code style.thanks.
| if (rdd.context.conf.contains("spark.default.parallelism")) { | ||
| new HashPartitioner(rdd.context.defaultParallelism) | ||
| val rdds = (Seq(rdd) ++ others) | ||
| val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasPartitioner, not hashPartitioner. You should copy the code I provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Thank you ,I will lean much about code style.I have updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Please check, Thank you very much!
|
Jenkins test this please |
|
Test build #65224 has finished for PR 15039 at commit
|
|
Merged to master |
…itioner without sortBy apache#15039
…er without sortBy ## What changes were proposed in this pull request? if there are many rdds in some situations,the sort will loss he performance servely,actually we needn't sort the rdds , we can just scan the rdds one time to gain the same goal. ## How was this patch tested? manual tests Author: codlife <1004910847@qq.com> Closes apache#15039 from codlife/master.
What changes were proposed in this pull request?
if there are many rdds in some situations,the sort will loss he performance servely,actually we needn't sort the rdds , we can just scan the rdds one time to gain the same goal.
How was this patch tested?
manual tests