-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADAM-646] Special case reads with '*' quality during BQSR. #647
Conversation
This looks good, but should we just set |
Are you suggesting to do that check in the SAM/BAM<->ADAM converters? |
Test FAILed. Build result: FAILUREGitHub pull request #647 of commit f6ce721 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/647/merge^{commit} # timeout=10 > git branch -a --contains d7e55c115cfc9f4de7289144d2506ea006bf3237 # timeout=10 > git rev-parse remotes/origin/pr/647/merge^{commit} # timeout=10Checking out Revision d7e55c115cfc9f4de7289144d2506ea006bf3237 (origin/pr/647/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f d7e55c115cfc9f4de7289144d2506ea006bf3237First time build. Skipping changelog.Triggering ADAM-prb ? 2.2.0,centosTriggering ADAM-prb ? 2.3.0,centosTriggering ADAM-prb ? 1.0.4,centosADAM-prb ? 2.2.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,centos completed with result FAILUREADAM-prb ? 1.0.4,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Yes, when we convert from BAM to ADAM, set simply set the |
That's a good idea. Let me refactor that. |
f6ce721
to
608d5f7
Compare
Updated with the |
Test FAILed. Build result: FAILUREGitHub pull request #647 of commit 608d5f7 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/647/merge^{commit} # timeout=10 > git branch -a --contains 356a6d6711a5a558e7df6c94cd0b427d82a69d58 # timeout=10 > git rev-parse remotes/origin/pr/647/merge^{commit} # timeout=10Checking out Revision 356a6d6711a5a558e7df6c94cd0b427d82a69d58 (origin/pr/647/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 356a6d6711a5a558e7df6c94cd0b427d82a69d58First time build. Skipping changelog.Triggering ADAM-prb ? 2.2.0,centosTriggering ADAM-prb ? 2.3.0,centosTriggering ADAM-prb ? 1.0.4,centosADAM-prb ? 2.2.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,centos completed with result FAILUREADAM-prb ? 1.0.4,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, retest this please. Looks like some issue with a JAR not being pulled down in the Hadoop 2.2 build. |
@@ -79,7 +79,7 @@ class AlignmentRecordConverter extends Serializable { | |||
// set canonically necessary fields | |||
builder.setReadName(adamRecord.getReadName.toString) | |||
builder.setReadString(adamRecord.getSequence) | |||
builder.setBaseQualityString(adamRecord.getQual) | |||
Option(adamRecord.getQual).fold(builder.setBaseQualityString("*"))(s => builder.setBaseQualityString(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple pattern matching here would prevent double-setting the qualityString (unless I'm misreading this) and prevent allocating objects we won't use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Other than one nit, this looks good to me. |
608d5f7
to
4e30187
Compare
Test PASSed. |
[ADAM-646] Special case reads with '*' quality during BQSR.
Thanks, Frank! |
Thanks, Frank! |
Great! Glad to hear it @Jaeki! |
@fnothaft Could you check the followin case? 15/04/11 12:35:38 INFO DAGScheduler: Job 1 failed: aggregate at BaseQualityRecalibration.scala:84, took 3.648340 s |
The exact line of the error is shown in this stack trace.
Your sequence is 102 bases long... $ echo "AAATGGAACGAAGTGGAATCGAGTGGAATGGAATCGAATGGAGTGAAATGGAATGGAATGGACGCGAAAGAATGGACTGGAACAAAATGAAATCGAACGGT" | wc -c
102 ... but the difference between your reference start and end position is The sequence is missing 3 bases. |
@massie Thank you for your comment. Do you think the missing 3 bases come from the SNAP tool ? I downloaded the NA12878 reads from web and aligned with SNAP, transformed the sam file to adam and then BQSR with ADAM. That's what I did. |
Resolves #646. Allows the creation of
DecadentRead
s with*
quality scores. These reads are then not observed or corrected during BQSR.