[ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145
Test FAILed.
Build result: FAILURE
GitHub pull request #1145 of commit e18db5a automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1145/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains f5465f8 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1145/merge^{commit} # timeout=10
Checking out Revision f5465f8 (origin/pr/1145/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f f5465f81f3d35a033c626b652ccd157936e8f3d0
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
   * @param asSingleFile If true, saves output as a single file.
   * @param isSorted If the output is sorted, this will modify the header.
   */
  def saveAsSam(
    filePath: String,
    asSam: Boolean = true,
    asType: Option[SAMFormat] = None,
Is there a generic term for these formats? Otherwise I think the optional asType is reasonable.
Nah, alas, the enum is just SAM, BAM, and CRAM: https://github.com/HadoopGenomics/Hadoop-BAM/blob/master/src/main/java/org/seqdoop/hadoop_bam/SAMFormat.java#L32
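For readers following the thread, here is a minimal sketch of how an optional format parameter like `asType` can be resolved. This is not the code in this PR: `Format`, `inferFromPath`, and `resolve` are hypothetical stand-ins so the snippet has no Hadoop-BAM dependency, and the rules (fall back to the file extension, then to SAM) are assumptions made for illustration.

```scala
object FormatResolution {
  // Hypothetical stand-in for the three-valued SAMFormat enum (SAM, BAM, CRAM).
  sealed trait Format
  case object SAM extends Format
  case object BAM extends Format
  case object CRAM extends Format

  // Assumed rule: guess the format from the file extension, if it is recognised.
  def inferFromPath(filePath: String): Option[Format] =
    if (filePath.endsWith(".sam")) Some(SAM)
    else if (filePath.endsWith(".bam")) Some(BAM)
    else if (filePath.endsWith(".cram")) Some(CRAM)
    else None

  // An explicit asType wins; otherwise try the extension; otherwise
  // default to SAM (an assumption of this sketch, not taken from the PR).
  def resolve(filePath: String, asType: Option[Format] = None): Format =
    asType.orElse(inferFromPath(filePath)).getOrElse(SAM)
}
```

With an `Option`, callers who do not care about the format pass nothing, while a caller who wants CRAM regardless of the file name passes `Some(CRAM)`. A `Boolean` flag cannot express that once there are three formats.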
e18db5a to 37a8292
Test PASSed.
@fnothaft did you want to take off the -1 here? What is here LGTM, though I would like to see some CRAM-specific unit tests and a small CRAM test file to read.
Nah, this is still -1 pending the CRAM-specific tests.
37a8292 to b2d40c7
OK! Removing my -1 here. I've added a commit (b2d40c7) with CRAM-specific tests. Can I get a review pass of said commit? Once it looks good to everyone, I'll squash down and we can merge this.
Test PASSed.
val readsA = rddA.rdd.collect()
val readsB = rddB.rdd.collect()

readsA.indices.foreach {
Hmm... this may be a more robust way to validate than the zip I've been using (with various problems) in FeatureRDDSuite.
The two are equivalent, no?
Yeah, the zip fails inconsistently for me with SparkException: Can only zip RDDs with same number of elements in each partition. Sometimes less clever is better.
Ah, I typically do a collect before the zip, which eliminates said issue (and we need to collect to use asserts anyways).
;)
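To make the collect-then-compare point concrete, here is a small sketch using plain arrays in place of the output of `rdd.collect()`. `Read` and `firstMismatch` are hypothetical names for illustration, not part of the test suite in this PR. Once both sides are local arrays, Spark's per-partition constraint on `RDD.zip` no longer applies; the one thing to watch is that `zip` on local collections silently truncates to the shorter side, so the lengths are checked first.

```scala
object CollectThenCompare {
  // Hypothetical stand-in for a collected alignment record.
  final case class Read(name: String, start: Long)

  // Returns a description of the first difference, or None if the arrays match.
  def firstMismatch(readsA: Array[Read], readsB: Array[Read]): Option[String] =
    if (readsA.length != readsB.length) {
      // Check lengths explicitly: a local zip would hide a trailing difference.
      Some(s"length ${readsA.length} != ${readsB.length}")
    } else {
      // Index-based walk, in the style of readsA.indices.foreach { ... }.
      readsA.indices
        .find(i => readsA(i) != readsB(i))
        .map(i => s"element $i differs: ${readsA(i)} vs ${readsB(i)}")
    }

  // The zip form gives the same answer once the lengths are known to match.
  def sameByZip(readsA: Array[Read], readsB: Array[Read]): Boolean =
    readsA.length == readsB.length &&
      readsA.zip(readsB).forall { case (a, b) => a == b }
}
```

In a test, `assert(firstMismatch(readsA, readsB).isEmpty)` would then report which element differs rather than only that something did.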
LGTM
b2d40c7 to 0b7e03e
…RAM. Resolves bigdatagenomics#1141. Changes the signature of `AlignmentRecordRDD.saveAsSAM` to take an `Option[SAMFormat]` parameter, since `asSam` is now no longer a binary choice.
Test PASSed.
Resolves #1141. Changes the signature of `AlignmentRecordRDD.saveAsSAM` to take an `Option[SAMFormat]` parameter, since `asSam` is now no longer a binary choice.

-1 for now, as I need to make a pass back and write some more tests. Depends on #1104, #1117.