Unexpected base in duplex call #46

Open
dpaudel-tb opened this issue Apr 27, 2023 · 4 comments

Comments

@dpaudel-tb

Hello,
I extracted FASTQ from duplex_orig.sam and compared the output to the original raw reads.
In the alignment below, the top sequence is the duplex read output. Below it are the two corresponding reads; the last one is reverse complemented.
The highlighted region shows 'A' in the duplex output, while it shows 'C' in read1 and 'T' in the reverse complement of read2.
My understanding is that, at this locus, the duplex call should be either C or T, but not A.
Could you please share some insight?
Thanks
-Dev
[Image: unknown_snp_duplex]
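
The comparison described above can be sketched as follows. This is a minimal illustration, not the tooling used in this issue: the sequences are invented, `reverse_complement` and `mismatches` are hypothetical helper names, and the position-by-position comparison assumes the sequences are already aligned with no insertions or deletions. If one read is shifted relative to the other, a gap-free comparison like this reports spurious mismatches.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """List (position, base_a, base_b) wherever two sequences differ.

    Assumes a and b are already aligned and gap-free; only the
    overlapping prefix is compared if the lengths differ.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Invented example sequences, for illustration only.
duplex = "GATTACA"
read1 = "GATCACA"
read2 = "TGTTATC"  # second-strand read, as basecalled

print(mismatches(duplex, read1))
print(mismatches(duplex, reverse_complement(read2)))
```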

@HenrivdGeest

Could it be an alignment issue in this view? The AT in the duplex read matches the reverse complement of read2 slightly further to the right. Without seeing more bases to the right, it looks as though the duplex read is missing 2 bases here.

@dpaudel-tb
Author

Yes, here is more sequence to the right:
[Image: unknown_snp_duplex_longer]

@cjw85
Member

cjw85 commented Apr 27, 2023

Hi @dpaudel-tb,

The assumption that there is a trivial relationship between the simplex calls and the duplex calls is incorrect. The duplex call is not formed trivially from the simplex calls.

Consider the inference problem of decoding a single simplex signal into a basecall. I can find the most likely basecall that explains the observed signal. I can do this for the second strand signal too. Those inference problems are independent (at least they are treated as such: neither is informed by the other).

A duplex caller, however, attempts to find a single basecall that explains both observed signals simultaneously. Although in the limit of complete information all three calls would clearly be identical, when variation is taken into account the calls can differ.
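
As a rough illustration of how a joint call can differ from both independent calls, consider a toy model of a single position. This is only a sketch under strong assumptions: the likelihood values are invented, `best_call` and `joint_call` are hypothetical names, and a real duplex basecaller decodes raw signal with a trained model rather than multiplying per-base tables.

```python
BASES = "ACGT"

# Invented per-base likelihoods at one position, one table per strand.
# The second table is assumed to be already expressed in the orientation
# of the first strand.
strand1 = {"A": 0.40, "C": 0.45, "G": 0.10, "T": 0.05}
strand2 = {"A": 0.40, "C": 0.05, "G": 0.10, "T": 0.45}


def best_call(likelihoods):
    """Return the base with the highest likelihood for one observation."""
    return max(BASES, key=lambda b: likelihoods[b])


def joint_call(first, second):
    """Return the base that best explains both observations together,
    treating them as independent given the true base."""
    joint = {b: first[b] * second[b] for b in BASES}
    return max(BASES, key=lambda b: joint[b])


simplex1 = best_call(strand1)
simplex2 = best_call(strand2)
duplex = joint_call(strand1, strand2)
print(simplex1, simplex2, duplex)  # prints: C T A
```

In this toy case each strand narrowly prefers a different base (C and T), but A is the only base that both observations find plausible, so it wins the joint decision even though neither independent call chose it.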

@dpaudel-tb
Author

Thank you @cjw85 for your insights. Unfortunately, I do not have 'ground truth' data for these sequences so I am also not sure what the actual bases should be.
