Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-946] Fixes to FlagStat for Samtools concordance issue #954

Closed
wants to merge 3 commits into from

Conversation

jpdna
Copy link
Member

@jpdna jpdna commented Feb 21, 2016

These changes result in perfect concordance between ADAM and Samtools flagstat output stats, addressing corner cases around secondary, supplemental, and unmapped reads as described in #946

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@fnothaft
Copy link
Member

Jenkins, add to whitelist.

@fnothaft
Copy link
Member

Stellar! Thanks for sorting this out, @jpdna! Interesting to see that it is the "unmapped, yet have alignment info" reads that are causing the trouble. I've always been unsure as to what is the correct way to handle them.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1092/

Build result: FAILURE

GitHub pull request #954 of commit 6a497b9 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/954/merge^{commit} # timeout=10 > git branch -a --contains 6792655 # timeout=10 > git rev-parse remotes/origin/pr/954/merge^{commit} # timeout=10Checking out Revision 6792655 (origin/pr/954/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 6792655a74f389324d18134be2bca4b8a5f773bfFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@jpdna
Copy link
Member Author

jpdna commented Feb 21, 2016

oops, I see now this doesn't pass the tests.
Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead " I'll investigate ways to make flagstat match with Samtools while still not violating these assumptions.

@jpdna jpdna closed this Feb 21, 2016
@fnothaft
Copy link
Member

Among other things it seems: " Unaligned read can't be a primary alignment" while constructing DecadentRead

I'd need to read the source to be 100%, but I think DecadentRead is a bit over aggressive with input checking and would just remove that requirement.

// the statistics output by other programs, specifically Samtools Flagstat

if (samRecord.getReadUnmappedFlag) {
builder.setReadMapped(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to...builder.setReadMapped(!samRecord.getReadUnmappedFlag) .. without needing a conditional.

@jpdna jpdna reopened this Feb 24, 2016
@@ -249,7 +249,7 @@ class AlignmentRecordConverterSuite extends FunSuite {
.split('\n')

assert(!secondRecord.getReadMapped)
assert(!secondRecord.getReadNegativeStrand)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the input data, this seems likely a bug in the test that was previously covered up by fact that for unmapped reads the code previously did not set this flag even if present in input.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1094/
Test PASSed.

@jpdna
Copy link
Member Author

jpdna commented Feb 24, 2016

I've pushed changes to address flag setting code simplification @massie suggests above and also modified tests in a way that the code now passes the tests. The tests modifications seem to remove a bug or unnecessary requirements, but should be reviewed by others to make sure no downstream assumptions are violated.

@@ -78,7 +78,7 @@ private[adam] object DecadentRead extends Logging with Serializable {
@deprecated("Use RichAlignmentRecord wherever possible in new development.", since = "0.18.0")
private[adam] class DecadentRead(val record: RichAlignmentRecord) extends Logging {
// Can't be a primary alignment unless it has been aligned
require(!record.getPrimaryAlignment || record.getReadMapped, "Unaligned read can't be a primary alignment")
//require(!record.getPrimaryAlignment || record.getReadMapped, "Unaligned read can't be a primary alignment")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider logging a warning message or dropping this line altogether instead of commenting out the code.

Done. - removed the require line that had been commented-out

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1095/
Test PASSed.

// are unclear when the read is not mapped or reference is not defined,
// it is nonetheless favorable to set these flags in the ADAM file
// in same way as they appear in the input BAM inorder to match exactly
// the statistics output by other programs, specifically Samtools Flagstat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this

@heuermh
Copy link
Member

heuermh commented Feb 24, 2016

LGTM

@heuermh heuermh added this to the 0.19.0 milestone Feb 24, 2016
@heuermh heuermh changed the title Fixes to FlagStat for Samtools concordance issue #946 [ADAM-946] Fixes to FlagStat for Samtools concordance issue Feb 24, 2016
@heuermh
Copy link
Member

heuermh commented Feb 24, 2016

Fixes #946

@heuermh
Copy link
Member

heuermh commented Feb 24, 2016

Thank you, @jpdna. Merged manually.

@heuermh heuermh closed this Feb 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants