Add repartition parameter #191

arahuja · 2014-03-27T15:00:55Z

Coalesce doesn't currently perform as documented: all of the transformation operations happen after it, so you never actually get the requested number of output partitions.

I added a repartition at the top of the job to remap the data to more or fewer partitions (this helps with increasing parallelism) and moved coalesce to the end to create the proper number of outputs. (Can it go after the sort? I'm not sure what the expectations on the output are after the sort.)

Also, I moved both parameters to SparkArgs, as they are likely to be needed in other operations (e.g. reads2ref). There may be better defaults too, but since they both add some overhead, they are off by default.
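
To illustrate the ordering issue described above, here is a toy model in plain Python (no Spark required). It only tracks how records are grouped into partitions. The names `repartition_to`, `coalesce_to`, `coalesce_first`, `shuffle_stage`, and `default_parallelism`, and the `-1` "off" sentinel, are hypothetical and not taken from the patch. The model also assumes that the job's transformation stages re-bucket the data into a fixed default number of partitions, which may not hold for every Spark operation.

```python
from typing import List

Partitions = List[List[int]]  # toy stand-in for an RDD: a list of partitions


def repartition(parts: Partitions, n: int) -> Partitions:
    """Full reshuffle: spread all records round-robin over exactly n partitions."""
    flat = [x for p in parts for x in p]
    return [flat[i::n] for i in range(n)]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Merge neighbouring partitions down to n; never increases the count."""
    if n >= len(parts):
        return parts
    out: Partitions = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i * n // len(parts)].extend(p)
    return out


def shuffle_stage(parts: Partitions, default_parallelism: int) -> Partitions:
    """Hypothetical transformation stage (assumption): it discards the incoming
    layout and re-buckets records into `default_parallelism` partitions."""
    flat = sorted(x for p in parts for x in p)
    size = -(-len(flat) // default_parallelism)  # ceiling division
    return [flat[i * size:(i + 1) * size] for i in range(default_parallelism)]


def transform(parts: Partitions, repartition_to: int = -1, coalesce_to: int = -1,
              coalesce_first: bool = False, default_parallelism: int = 8) -> Partitions:
    """-1 means the option is off (hypothetical sentinel for 'off by default')."""
    if repartition_to > 0:                      # top of the job: change parallelism
        parts = repartition(parts, repartition_to)
    if coalesce_first and coalesce_to > 0:      # old ordering: coalesce up front
        parts = coalesce(parts, coalesce_to)
    parts = shuffle_stage(parts, default_parallelism)
    if not coalesce_first and coalesce_to > 0:  # new ordering: coalesce at the end
        parts = coalesce(parts, coalesce_to)
    return parts


data = [list(range(i, 100, 4)) for i in range(4)]  # 100 records in 4 partitions

old = transform(data, coalesce_to=2, coalesce_first=True)
new = transform(data, repartition_to=16, coalesce_to=2)
print(len(old), len(new))
```

In this model the early coalesce is undone by the later stage, so the old ordering ends with `default_parallelism` partitions, while coalescing last yields the requested count. With both options left at `-1`, neither step runs, mirroring the "off by default" choice.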

AmplabJenkins · 2014-03-27T15:01:23Z

Can one of the admins verify this patch?

carlyeks · 2014-03-27T15:08:13Z

Jenkins, test this please.

AmplabJenkins · 2014-03-27T15:15:38Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/251/

AmplabJenkins · 2014-03-27T15:27:38Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/252/

massie · 2014-03-29T03:52:19Z

Thanks, Arun!

add repartition parameter

d07816f

massie added a commit that referenced this pull request Mar 29, 2014

Merge pull request #191 from hammerlab/repartition

25465f8

Add repartition parameter

massie merged commit 25465f8 into bigdatagenomics:master Mar 29, 2014

arahuja deleted the repartition branch April 7, 2014 14:30
