Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bismark: fixed bug for ambiguous alignments #58

Closed
wants to merge 1 commit into from

Conversation

jsh58
Copy link

@jsh58 jsh58 commented Jul 20, 2016

Hi Felix,

Your changes didn't quite do the trick. This commit, as far as I can tell, handles the ambiguous alignments correctly. It also produces a complete ambig.bam file -- previously (even with v0.15.0) some alignments were left out of this file. Tested on single-end reads.

Cheers,

John Gaspar

@@ -3050,6 +3049,9 @@ sub check_bowtie_results_single_end_bowtie2{
# warn "$first_ambig_alignment\n"; sleep(1);
}
}
elsif ($alignment_score == $best_AS_so_far) {
$amb_same_thread = 1;
}
Copy link
Owner

@FelixKrueger FelixKrueger Jul 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right way to go about this because if this alignment has the same alignment score as the best AS so far then this is not ambiguous in the same thread. To be ambiguous within the same thread the AS and the second best hit have to have the same score.
If overwrite is specified the alignment will be added to an alignment data structure further down, and in a later step all alignments in that data structure will be sorted by their lowest AS. If two or more different locations have the same lowest AS the read is booted altogether.

@FelixKrueger
Copy link
Owner

Have you tried the latest version and did ambiguous reads get booted that previously stayed in? I have tried it with the one read you supplied and there it seems to have worked just fine. Best, Felix

@jsh58
Copy link
Author

jsh58 commented Jul 21, 2016

Felix,

I did try your latest version. It did better with some reads, but it failed for all reads that had AS1 == AS2 and (AS1 > XS1 or AS2 > XS2).

This is because now when $alignment_score == $best_AS_so_far (i.e. AS1 == AS2), $amb_same_thread is reset to 0. The alignments are not booted altogether later; the second alignment is reported as unambiguous.

It is not clear to me why if AS1 == AS2, you wouldn't want to abort (line 3188 -- only entered if $amb_same_thread == 1), the same as if AS1 == XS1 == best or AS2 == XS2 == best. For clarity/consistency, I/you could define a new $amb_diff_thread variable that is separate from $amb_same_thread but performs the same function in cases where AS1 == AS2.

John Gaspar

@FelixKrueger
Copy link
Owner

Would you be able to supply one of the reads that fail your test (AS1 == AS2 and (AS1 > XS1 or AS2 > XS2)?

@jsh58
Copy link
Author

jsh58 commented Jul 21, 2016

Felix,

Here are the first 3 reads that fail:

// indiv. alignments:
NS500532:74:HMLCNBGXX:1:11101:20712:1056 chr4_CT_converted 146108733 AS:i:0 XS:i:0
NS500532:74:HMLCNBGXX:1:11101:20712:1056 chr4_GA_converted 147032712 AS:i:0
// bismark v0.16.2 alignment:
NS500532:74:HMLCNBGXX:1:11101:20712:1056 16 chr4 147032712 42 76M * 0 0 AAAAACATATCATAAATAATTAATATCAAAAACTCTATGACACATAATAAATCTAAAAATAACCTAACAAAATCTC AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAAAAA

// indiv. alignments:
NS500532:74:HMLCNBGXX:1:11101:13832:1374 chr2_CT_converted 151318783 AS:i:0 XS:i:0
NS500532:74:HMLCNBGXX:1:11101:13832:1374 chr8_GA_converted 20022085 AS:i:0 XS:i:-6
// bismark v0.16.2 alignment:
NS500532:74:HMLCNBGXX:1:11101:13832:1374 16 chr8 20022085 32 76M * 0 0 AAAACCCACAAAAACCATTTAAAACCTAACACATTACAAAAAAACAATAAAAAAAACCCTATCTACTTTTAATACT AE<EEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA

// indiv. alignments:
NS500532:74:HMLCNBGXX:1:11101:6786:1458 chr8_CT_converted 19924629 AS:i:-6 XS:i:-6
NS500532:74:HMLCNBGXX:1:11101:6786:1458 chr8_GA_converted 19759688 AS:i:-6
// bismark v0.16.2 alignment:
NS500532:74:HMLCNBGXX:1:11101:6786:1458 16 chr8 19759688 24 76M * 0 0 CCACCTATACTGCTATCAATAAAACTCCTACTAATATTAAACCAACTATTTCTTCCCAATATACTCCTAACTATAA EEEEEEEE<EEEEA/AEEE/EEEEE/EEEEEEEEAEEEA<EEEEAEEAEEAEAEEEEAEE/EEEEEEEEA/AAAAA

With my update to Bismark, all such reads are unaligned, except in the *ambig.bam file.

John Gaspar

@FelixKrueger
Copy link
Owner

Hi John,

Many thanks for these sequences, they do indeed post a new corner case which the last 'fix' introduced... I have now changed it again so that the amb_within_thread variable only gets reset if the new alignment is actually better than the previous one. Can you please see it this fixes it? Changed here cde170c and 55dea1d.

@jsh58
Copy link
Author

jsh58 commented Jul 22, 2016

Felix,

The short answer is no, it doesn't fix it.

John Gaspar

@FelixKrueger
Copy link
Owner

FelixKrueger commented Jul 22, 2016

Interesting, in what way? I am getting the following report:

Final Alignment report
======================
Sequences analysed in total:    3
Number of alignments with a unique best hit from the different alignments:      0
Mapping efficiency:     0.0%
Sequences with no alignments under any condition:       0
Sequences did not map uniquely: 3
Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT:  0       ((converted) top strand)
CT/GA:  0       ((converted) bottom strand)
GA/CT:  0       (complementary to (converted) top strand)
GA/GA:  0       (complementary to (converted) bottom strand)

and they are all present in the .ambig.bam file.

@jsh58
Copy link
Author

jsh58 commented Jul 22, 2016

Felix,

Before I close the pull request, please go back and re-read my previous comments.

I'd rather not continue this cycle of testing and diagnosing your "fixes", especially since I already spent a great deal of time debugging the code, producing an actual fix, and testing and verifying it on 83.5 million reads (not just 1 or 3 reads).

John Gaspar

@FelixKrueger
Copy link
Owner

Hi John,

First of all I am very grateful for your help and willingness to improve the Bismark code. I have re-read your comments and have tried to implement a solution that is correct and does the right thing.

This is because now when $alignment_score == $best_AS_so_far (i.e. AS1 == AS2), $amb_same_thread is reset to 0. The alignments are not booted altogether later; the second alignment is reported as unambiguous.
This has been addressed and is no longer happening.

It is not clear to me why if AS1 == AS2, you wouldn't want to abort (line 3188 -- only entered if $amb_same_thread == 1), the same as if AS1 == XS1 == best or AS2 == XS2 == best. For clarity/consistency, I/you could define a new $amb_diff_thread variable that is separate from $amb_same_thread but performs the same function in cases where AS1 == AS2.

This is happening a few lines further down in the code already: If there is only a single reported alignment it will be used as Unique alignment. If there are two or more alignments they are sorted by AS and if they have the same AS the read will be booted. This scenario would be the $amb_diff_thread, it does happens already even if it might not be known by this variable name.

I have tried to faithfully address the bug you have outlined to me which seems to work just fine on the offending reads you supplied, but "the short answer is no" isn't exactly helpful if there isn't a slightly longer one as well. Best wishes, Felix

@jsh58
Copy link
Author

jsh58 commented Jul 23, 2016

Felix,

Sorry to say I can't assist further. This has been very time consuming, and it seems we have profoundly different notions about software debugging and testing.

Good luck,

John Gaspar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants