Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-1497] Add union to GenomicRDD. #1526

Merged
merged 1 commit into from
May 17, 2017

Conversation

fnothaft
Copy link
Member

Resolves #1497. Also adds size methods to SequenceDictionary and RecordGroupDictionary. Requires the addition of a ClassTag to the GenericGenomicRDD case class.

@fnothaft fnothaft added this to the 0.23.0 milestone May 13, 2017
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2003/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1526/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains ce29373ec3eff23b4c14bc2a99f1a7a7a3efc59c # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1526/merge^{commit} # timeout=10Checking out Revision ce29373ec3eff23b4c14bc2a99f1a7a7a3efc59c (origin/pr/1526/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f ce29373ec3eff23b4c14bc2a99f1a7a7a3efc59cFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Resolves bigdatagenomics#1497. Also adds `size` methods to `SequenceDictionary` and
`RecordGroupDictionary`. Requires the addition of a `ClassTag` to the
`GenericGenomicRDD` case class.
@coveralls
Copy link

coveralls commented May 14, 2017

Coverage Status

Coverage increased (+0.1%) to 82.139% when pulling 7013184 on fnothaft:issues/1497-union into 18191f9 on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2005/
Test PASSed.

val union = features1.union(features2)
assert(union.rdd.count === (features1.rdd.count + features2.rdd.count))
// only a single contig between the two
assert(union.sequences.size === 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implies there is a distinct here . . .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a filter by contig name when merging two SequenceDictionarys.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification

val genotype2 = sc.loadGenotypes(testFile("small.vcf"))
val union = genotype1.union(genotype2)
assert(union.rdd.count === (genotype1.rdd.count + genotype2.rdd.count))
assert(union.sequences.size === (genotype1.sequences.size + genotype2.sequences.size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but not here. Does (_ ++ _) collapse elements that are equals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is everyone's jolly favorite bioinformatics boondoggle: chr prefixes vs no prefixes. Thus, the contigs are distinct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facepalm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I know. However, works for our test purposes.

@heuermh heuermh merged commit 37b971a into bigdatagenomics:master May 17, 2017
@heuermh
Copy link
Member

heuermh commented May 17, 2017

Thank you, @fnothaft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants