Adding utilities for read trimming. #248

fnothaft · 2014-05-28T20:25:06Z

This pull request adds a few utilities for read trimming, as a first step towards read error correction. It contains:

- An update to the ADAMRecord schema that adds fields for the number of bases trimmed from the read start/end. After alignment, these fields are redundant (with hard clipping in the CIGAR), but they are important for trimming performed before alignment. These fields are populated in the SAMRecordConverter.
- Two new RDD functions. The first applies a fixed-length trim to all reads. The second trims the prefix/suffix of all reads in a read group based on the geometric mean of the error probability at each position.
- An update to the Transform CLI module that adds these two functions as possible transforms.
- A fix to the CycleCovariate so that it correctly calculates the sequencing cycle for hard-clipped reads.
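To make the two trimming modes concrete, here is a minimal sketch in plain Scala collections. It is not the code in this pull request: the `Read` case class, the function names, the Phred+33 quality encoding, the equal-read-length requirement, and the threshold rule (trim leading/trailing positions whose geometric-mean error probability exceeds a cutoff) are all assumptions made for illustration.

```scala
object TrimSketch {
  // Hypothetical stand-in for a read; this is NOT the ADAMRecord schema.
  final case class Read(
    sequence: String,
    qual: String, // base qualities, assumed to be Phred+33 encoded
    basesTrimmedFromStart: Int = 0,
    basesTrimmedFromEnd: Int = 0)

  // Fixed-length trim: drop `start` bases from the front and `end` bases from
  // the back, and record how many bases were removed on each side.
  def trimFixed(read: Read, start: Int, end: Int): Read = {
    require(start >= 0 && end >= 0 && start + end <= read.sequence.length)
    val keepTo = read.sequence.length - end
    read.copy(
      sequence = read.sequence.substring(start, keepTo),
      qual = read.qual.substring(start, keepTo),
      basesTrimmedFromStart = read.basesTrimmedFromStart + start,
      basesTrimmedFromEnd = read.basesTrimmedFromEnd + end)
  }

  // A Phred score Q corresponds to an error probability of 10^(-Q/10).
  def errorProbability(q: Char): Double = math.pow(10.0, -(q - 33) / 10.0)

  // Geometric mean of the error probability at each read position, computed
  // in log space. Assumes a non-empty group of equal-length reads.
  def geometricMeanError(reads: Seq[Read]): Vector[Double] = {
    require(reads.nonEmpty)
    val length = reads.head.qual.length
    require(reads.forall(_.qual.length == length))
    Vector.tabulate(length) { i =>
      val meanLog = reads.map(r => math.log(errorProbability(r.qual(i)))).sum / reads.size
      math.exp(meanLog)
    }
  }

  // Lengths of the prefix and suffix whose mean error exceeds the threshold.
  def trimLengths(meanError: Seq[Double], threshold: Double): (Int, Int) = {
    val start = meanError.takeWhile(_ > threshold).length
    val end = meanError.drop(start).reverse.takeWhile(_ > threshold).length
    (start, end)
  }

  // Quality-based trim: every read in the group gets the same prefix/suffix trim.
  def trimByQuality(reads: Seq[Read], threshold: Double): Seq[Read] = {
    val (start, end) = trimLengths(geometricMeanError(reads), threshold)
    reads.map(trimFixed(_, start, end))
  }
}
```

In this sketch the quality-based trim reduces to a fixed-length trim whose lengths are chosen per group, which is why both paths record the trimmed base counts the same way. The real functions operate on RDDs of ADAMRecords per read group rather than on a local `Seq`.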

AmplabJenkins · 2014-05-28T20:37:49Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/327/

massie · 2014-05-28T21:05:12Z

adam-format/src/main/resources/avro/adam.avdl

@@ -197,7 +199,7 @@ record ADAMGenotype {
  union { null, VariantCallingAnnotations } variantCallingAnnotations = null;

  // Sample-level data, i.e. data specific to this particular sample
-  union { null, string }  sampleId = null;
+  string sampleId;


Shouldn't this be an optional field?

Hmmm, something went screwy with the rebase here. Let me fix this...

AmplabJenkins · 2014-05-28T21:52:48Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/328/

massie · 2014-05-28T23:45:14Z

adam-core/src/test/scala/org/bdgenomics/adam/rdd/correction/TrimReadsSuite.scala

+    // we should trim the first and last 5 bases off all reads
+    trimmed.collect.foreach(r => {
+      assert(r.getBasesTrimmedFromStart === 5)
+      assert(r.getBasesTrimmedFromStart === 5)


Do you mean getBasesTrimmedFromEnd here?

Yes, yes, I do... I'll fix this.

tdanford · 2014-05-29T12:21:34Z

Are we still maintaining updates to the CHANGES file(s) as part of a pull request?

fnothaft · 2014-05-29T14:16:16Z

@tdanford I believe we've dropped the CHANGES.txt, and are just keeping the CHANGES.md. However, CHANGES.md should probably be updated via the script whenever we do a release.

AmplabJenkins · 2014-05-29T17:02:51Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/332/

tdanford · 2014-05-29T17:26:19Z

Frank, I think you've got some rebasing to do -- it looks like there are some commits in this branch that are duplicated from master, right?

fnothaft · 2014-05-29T17:27:25Z

@tdanford correct; not sure how that happened...

fnothaft · 2014-05-29T17:28:39Z

@tdanford this is rebased now, thanks for the heads up!

tdanford · 2014-05-29T17:29:43Z

Again! I just merged Matt's edit to CONTRIBUTING.md, you're still out of date! :-)

fnothaft · 2014-05-29T17:30:59Z

@tdanford u r rebase troll :'(

Re-rebased...

tdanford · 2014-05-29T17:32:09Z

I like my commit histories like I like my spaces: linear.

tdanford · 2014-05-29T17:33:49Z

Okay, a more substantive comment: Frank, do you have somewhere (comments? a change file?) where you can write a little bit about the motivation behind these trimming additions? Are you trying to recreate specific functionality from Picard (I'm thinking the MergeBamAlignments command, but maybe something else)? From somewhere else? Does this satisfy a requirement or need for avocado or variant calling in particular? I understand there are a lot of reasons why one might "trim" reads, I'm just hoping for a little context here.

fnothaft · 2014-05-29T17:37:18Z

@tdanford Good point; I don't have this documented yet. When I started on this, I was planning to implement read error correction first (à la Quake), but I then decided that read trimming was a better first step. The need is indeed for variant calling/assembly.

AmplabJenkins · 2014-05-29T17:46:21Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/336/

AmplabJenkins · 2014-05-29T17:55:07Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/337/

massie · 2014-05-29T23:21:00Z

Thanks, Frank!

massie reviewed May 28, 2014
Adding utilities for read trimming.

3f3cbd3

massie added a commit that referenced this pull request May 29, 2014

Merge pull request #248 from fnothaft/read-trimming

abe6834

massie merged commit abe6834 into bigdatagenomics:master May 29, 2014

fnothaft deleted the read-trimming branch July 10, 2014 13:15
