
Add repartition parameter #191

Merged (1 commit), Mar 29, 2014

@arahuja (Contributor) commented Mar 27, 2014

Coalesce doesn't currently behave as documented: all of the transformation operations run after it, so you never actually get the requested number of output partitions.

I added a repartition at the top of the job to remap the data to more or fewer partitions (this helps increase parallelism) and moved coalesce to the end so the output has the requested number of partitions. (Could it go after the sort? I'm not sure what the expectations on the output are after the sort.)

Also, I moved both parameters to SparkArgs, since they are likely to be needed in other operations (e.g., reads2ref). There may be better defaults, but since both add some overhead, they are off by default.
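To make the ordering concrete, here is a minimal Spark sketch in Scala, assuming a recent Spark version on the classpath. It is not the ADAM job itself: `PartitionArgs`, `transform`, and `run` are hypothetical names, `None` stands in for "off by default", and `groupByKey(16)` is an assumed stand-in for whatever shuffling transformations the real job performs. Because that shuffle fixes its own output partition count, a coalesce placed before it is undone; repartitioning first and coalescing last yields the requested count.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for the two options; None means "off".
case class PartitionArgs(repartition: Option[Int] = None, coalesce: Option[Int] = None)

object PartitionOrderingSketch {

  // Hypothetical transformation step: a shuffle with an explicit partition
  // count, standing in for the real job's operations.
  def transform(reads: RDD[String]): RDD[(Int, Iterable[String])] =
    reads.map(r => (r.length, r)).groupByKey(16)

  def run(reads: RDD[String], args: PartitionArgs): RDD[(Int, Iterable[String])] = {
    // 1. Optional full shuffle up front to change parallelism (up or down).
    val input = args.repartition.fold(reads)(n => reads.repartition(n))
    // 2. The transformations; groupByKey(16) always yields 16 partitions,
    //    so a coalesce placed before this point would be undone.
    val transformed = transform(input)
    // 3. Coalesce last so the output has the requested partition count.
    args.coalesce.fold(transformed)(n => transformed.coalesce(n))
  }

  def main(argv: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-sketch").setMaster("local[2]"))
    try {
      val reads = sc.parallelize(Seq("ACGT", "AC", "GATTACA", "T"), 2)
      val out = run(reads, PartitionArgs(repartition = Some(8), coalesce = Some(2)))
      println(out.partitions.length)
    } finally {
      sc.stop()
    }
  }
}
```

With `coalesce = Some(2)` applied last, the result has 2 partitions; applying the same `coalesce(2)` before `transform` would still leave 16, which mirrors the problem described above.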

@AmplabJenkins commented:

Can one of the admins verify this patch?

@carlyeks (Member) commented:

Jenkins, test this please.

@AmplabJenkins commented:

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/251/

@AmplabJenkins commented:

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/252/

@massie added a commit that referenced this pull request on Mar 29, 2014
@massie merged commit 25465f8 into bigdatagenomics:master on Mar 29, 2014

@massie (Member) commented Mar 29, 2014

Thanks, Arun!

@arahuja deleted the repartition branch on April 7, 2014, 14:30