SAM strand suffix optimization #51

Open
qiyunzhu opened this issue Jun 21, 2020 · 0 comments

A SAM file generated from paired-end sequences has the strand suffix (/1 for the forward read, /2 for the reverse read) trimmed from the query names, so both mates share an identical query name. The strand information is instead embedded in the flag (2nd column), an integer in which the 7th bit (0x40) marks the forward read and the 8th bit (0x80) marks the reverse read.

For example, flag = 99 (0b1100011) is a forward read, flag = 147 (0b10010011) is a reverse read, and flag = 16 (0b10000) is an unpaired sequence.
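To make the three example flags concrete, here is a minimal sketch that tests the two bits directly. The names `FIRST_IN_PAIR`, `SECOND_IN_PAIR` and `describe` are made up for illustration and are not part of this project.

```python
# Masks for the two SAM flag bits discussed above (names are illustrative).
FIRST_IN_PAIR = 1 << 6   # 0x40, the 7th bit
SECOND_IN_PAIR = 1 << 7  # 0x80, the 8th bit


def describe(flag):
    """Classify a SAM flag as 'forward', 'reverse', or 'unpaired'."""
    if flag & FIRST_IN_PAIR:
        return 'forward'
    if flag & SECOND_IN_PAIR:
        return 'reverse'
    return 'unpaired'


if __name__ == '__main__':
    # Print each example flag, its binary form, and its classification.
    for flag in (99, 147, 16):
        print(flag, bin(flag), describe(flag))
```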

This information needs to be appended to the query name. Previously, the solution was:

if flag & (1 << 6):
    qname += '/1'
elif flag & (1 << 7):
    qname += '/2'

I tested several solutions. One elegant option extracts both bits in a single expression and yields the correct strand for either read (it requires Python 3.8+ for the := operator):

if strand := flag >> 6 & 0b11:
    qname = f'{qname}/{strand}'

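The variable strand takes one of three values: 0 for unpaired, 1 for forward, and 2 for reverse. (A value of 3 would require both bits to be set, which is not expected for ordinary paired-end reads.)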
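As a quick sanity check, the two snippets can be wrapped in functions and compared on the example flags. This is only a sketch: the function names `suffix_original` and `suffix_shift` are hypothetical, and the walrus operator requires Python 3.8+.

```python
def suffix_original(qname, flag):
    """Original approach: test each bit separately."""
    if flag & (1 << 6):
        qname += '/1'
    elif flag & (1 << 7):
        qname += '/2'
    return qname


def suffix_shift(qname, flag):
    """Shift-and-mask approach: read bits 7 and 8 as one 2-bit number."""
    if strand := flag >> 6 & 0b11:
        qname = f'{qname}/{strand}'
    return qname


if __name__ == '__main__':
    # Both approaches agree on the three example flags.
    for flag in (99, 147, 16):
        assert suffix_original('read', flag) == suffix_shift('read', flag)
        print(flag, suffix_shift('read', flag))
```

The two versions differ only when both bits are set (for instance flag = 192): the original appends /1, whereas the shift version appends /3.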

However, the expensive part is the subsequent string concatenation, which both versions have to perform. I couldn't find a way that is more efficient than the original solution.
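One way to reproduce this kind of comparison is with the standard-library `timeit` module. The sketch below is not the benchmark I ran; the helper name `time_both`, the sample query name and the iteration count are arbitrary assumptions, and the timings will vary by machine and Python version.

```python
import timeit

# Shared setup: one sample query name and a forward-read flag (assumed values).
SETUP = "qname = 'read0001'; flag = 99"

ORIGINAL = """
q = qname
if flag & (1 << 6):
    q += '/1'
elif flag & (1 << 7):
    q += '/2'
"""

SHIFTED = """
q = qname
if strand := flag >> 6 & 0b11:
    q = f'{q}/{strand}'
"""


def time_both(number=1_000_000):
    """Return total seconds spent running each snippet `number` times."""
    return {
        'original': timeit.timeit(ORIGINAL, setup=SETUP, number=number),
        'shifted': timeit.timeit(SHIFTED, setup=SETUP, number=number),
    }


if __name__ == '__main__':
    for name, seconds in time_both().items():
        print(f'{name}: {seconds:.3f} s')
```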
