-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
queryMate method throws exception for CRAM records with the same read name #1065
Comments
Yes, it was created with samtools 1.6 command line tool (i.e. the C implementation).... samtools view -T reference.fa -C -o out.cram in.sorted.bam |
Hi Chris, when do you think this issue and #802 may get addressed? |
@kpepper It probably will depend a lot on what the problem turns out to be. One thing that would definitely speed things up is to have (preferably minimally small) test files that reproduce the problem (bam, cram and .fasta). |
He's some test files. I'm now using this via the latest Picard jar, which was recently built with the up to date htsjdk. It's for Artemis development. |
@kpepper Thanks for the files, that will help a lot. I'll take a look. |
@cmnbroad Thanks. |
@kpepper Update: I'm glad you reported this issue - it looks like its due to a bug in the CRAM implementation of queryAlignmentStart, which is used by queryMate. Calling queryAlignmentStart("Mito", 631), for example, returns 24 reads on the BAM, but ~22788 reads on the CRAM, which I think represents the entire rest of the file. I don't yet have a fix, but wanted to update the ticket. One other random observation. ValidateSamFile reports a bunch of mate-pair errors for the BAM, but not the CRAM (though it does report some other issues with NM). If I use htsjdk to convert the BAM to CRAM, ValidateSamFile reports exactly the same mate pair issues on the CRAM as it does on the original BAM. One question - I'd like to be able to use this BAM as a test file in htsjdk once I have a fix, but it would live in the public repository. Let me know if that would be acceptable. |
@cmnbroad Thanks for the update. That explains the apparent BAM/CRAM speed difference then. I'll get back to you about the test data. |
@cmnbroad Yes, it's okay to use that file for test data on GitHub. |
Hi @cmnbroad, just wondered how the fix is going for this issue(s)? |
Hi @kpepper. I haven' been able to spend any time on this recently but haven't forgotten about it. I will try to get to it in the next week or so. |
@kpepper Update: Sorry for the long delay. I've finally gotten around to this and understand the underlying issue and have a proposed fix. I still need to add some new tests since it required a little bit of refactoring. |
@cmnbroad no problem, look forward to trying the fix out when ready. |
Has this been fixed now or still in progress? |
@kpepper Still in progress - I expect to submit a pull request with a fix within the next week. |
@kpepper I submitted a PR with a fix for the CRAM implementation of queryAlignmentStart/queryMate. Sorry for the long delay. I'm not sure how long it will take to get this through code review. In the meantime, if you're so inclined, feel free to pull the branch and try it out. The behavior wrt/validations being triggered via queryMate should be the same now between BAM and CRAM. |
@cmnbroad - Thanks. I use htsjdk via Picard so will probably need to wait for the fix to be pulled in. |
@kpepper you may have lost interest by now (sorry for the insanely long turnaround time on this), but FYI #1164 is merged now, which fixes the queryMate/queryAlignmentStart issue that, IIRC, was the underlying problem that resulted in the original issue reported here. I know you need it in Picard but hopefully this will make its way there in the near term. Also thanks for the test data. |
Subject of the issue
Occasionally I am getting the following exception when calling the SamReader queryMate method for a CRAM file…..
htsjdk.samtools.SAMFormatException: Multiple SAMRecord with read name IL29_4505:7:93:5878:1091#2 for first end.
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryMate(SamReader.java:451)
I don’t see the same thing for the equivalent BAM file.
Your environment
Steps to reproduce
e.g. the CRAM file has the following reads, that seem to be causing the issue:
IL29_4505:7:93:5878:1091#2 163 Mito 267 60 108M = 631 399 TTAATTAATTAATAATAAAAATATAATTATAAATAATATAAATATTATTCTTTATTAATAAATATATATTTATATATTATAAAAGTATCTTAATTAATAAAAATAAAC BBBBBBB@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB>BBBBBBBBBBB=BBBBBBBBB@?9A=A:/8 AS:i:108 XS:i:0 MD:Z:108 NM:i:0 RG:Z:lane1
IL29_4505:7:93:5878:1091#2 83 Mito 631 60 73S35M = 267 -399 AATGAAAATTTTATAAAAAAAGTAATGAATATTCCTTTTTAAAAATACCTTCTTAATTAAATAAAATAAAATAAAAAGGGGACTTTATATTTATAAAGTAATTATATT <=8=;BCBBABBBBBBBBBC?BDDDBBBBB0DDBBBBBCBBBBBBAAABB=CCCCCCBBBBBBBBBBBBBBBBBBDBBBDDCBBBBBBBBBBBBBDDDBDBBBBBDCD AS:i:35 XS:i:0 SA:Z:Mito,44163,+,37S39M32S,35,1; MD:Z:35 NM:i:0 RG:Z:lane1
IL29_4505:7:93:5878:1091#2 323 Mito 44163 35 37H39M32H = 267 -43897 TTTTATTTTATTTAATTAAGAAGGTATTTTTAAAAAGGA BBBBBBBBBBBBBBCCCCCC=BBAAABBBBBBCBBBBBD AS:i:34 XS:i:27 SA:Z:Mito,631,-,73S35M,60,0; MD:Z:30A8 NM:i:1 RG:Z:lane1
Similarly for the equivalent BAM file (which does not seem to cause an issue):
IL29_4505:7:93:5878:1091#2 163 Mito 267 60 108M = 631 399 TTAATTAATTAATAATAAAAATATAATTATAAATAATATAAATATTATTCTTTATTAATAAATATATATTTATATATTATAAAAGTATCTTAATTAATAAAAATAAAC BBBBBBB@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB>BBBBBBBBBBB=BBBBBBBBB@?9A=A:/8 NM:i:0 MD:Z:108 AS:i:108 XS:i:0 RG:Z:lane1
IL29_4505:7:93:5878:1091#2 83 Mito 631 60 73S35M = 267 -399 AATGAAAATTTTATAAAAAAAGTAATGAATATTCCTTTTTAAAAATACCTTCTTAATTAAATAAAATAAAATAAAAAGGGGACTTTATATTTATAAAGTAATTATATT <=8=;BCBBABBBBBBBBBC?BDDDBBBBB0DDBBBBBCBBBBBBAAABB=CCCCCCBBBBBBBBBBBBBBBBBBDBBBDDCBBBBBBBBBBBBBDDDBDBBBBBDCD NM:i:0 MD:Z:35 AS:i:35 XS:i:0 RG:Z:lane1 SA:Z:Mito,44163,+,37S39M32S,35,1;
IL29_4505:7:93:5878:1091#2 323 Mito 44163 35 37H39M32H = 267 -43897 TTTTATTTTATTTAATTAAGAAGGTATTTTTAAAAAGGA BBBBBBBBBBBBBBCCCCCC=BBAAABBBBBBCBBBBBD NM:i:1 MD:Z:30A8 AS:i:34 XS:i:27 RG:Z:lane1 SA:Z:Mito,631,-,73S35M,60,0;
Each read has the same name but a different flag value (one is a non-primary alignment).
The exception originates from this piece of code in the queryMate() method……
try {
Anybody able to explain the difference in behaviour between the BAM and CRAM?
Is this a bug or expected in some way?
The text was updated successfully, but these errors were encountered: