Can ASCIIGenome filter reads having indels on a specific genomic region? #100

Open
YingziZhang-github opened this issue Mar 1, 2023 · 3 comments

@YingziZhang-github

Hi,

Thank you very much for developing ASCIIGenome.
I was using the following command to extract reads in test.bam containing variants on chr11:2699098-2699099. It gave the output successfully.

ASCIIGenome.jar \
-fa genome.fa \
test.bam

goto chr11:2699098-2699099
filterVariantReads -r 2699098+1
print > test.chr11:2699098-2699099.SV.txt

Now, I only need reads having indels on chr11:2699098-2699099. And I don't want to have reads only having SNVs on chr11:2699098-2699099. Can ASCIIGenome do this?

Looking forward to your reply.

Many thanks,
Yingzi

@dariober
Owner

dariober commented Mar 1, 2023

Hi Yingzi, thanks for your interest in ASCIIGenome.

Now, I only need reads having indels on chr11:2699098-2699099. And I don't want to have reads only having SNVs on chr11:2699098-2699099. Can ASCIIGenome do this?

Off the top of my head: before printing, you could keep only reads whose CIGAR string contains the D or I operator, which indicates that the read has an indel somewhere. So something like:

goto chr11:2699098-2699099
filterVariantReads -r 2699098+1
awk '$CIGAR ~ "I" || $CIGAR ~ "D"'
print > test.chr11:2699098-2699099.SV.txt

This should keep reads having a variant at 2699098+1 AND an indel somewhere in the read, though not necessarily at 2699098+1. See if this gets you closer to what you need.
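
To illustrate why a CIGAR-based test like this is position-agnostic, here is a minimal Python sketch. It is not part of ASCIIGenome; the function name `has_indel` and the example CIGAR strings are hypothetical. It only checks whether the CIGAR string contains an I or D operation anywhere, which is the same kind of test the awk expression above applies.

```python
import re

def has_indel(cigar: str) -> bool:
    """Return True if the CIGAR string contains an insertion (I) or deletion (D).

    This says nothing about WHERE in the read the indel is.
    """
    return re.search(r"[ID]", cigar) is not None

# Hypothetical CIGAR strings, for illustration only
for cigar in ("100M", "40M2D60M", "10M1I89M"):
    print(cigar, has_indel(cigar))
```

A read with CIGAR `40M2D60M` passes this test wherever the deletion falls relative to the region of interest, so a further position check would be needed to restrict the indel to specific coordinates.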

@YingziZhang-github
Author

YingziZhang-github commented Mar 1, 2023

Hi Dario,

I tried

goto chr11:2699098-2699099
filterVariantReads -r 2699098+1
awk '$CIGAR ~ "I" || $CIGAR ~ "D"'
print > test.chr11:2699098-2699099.SV.txt

The awk '$CIGAR ~ "I" || $CIGAR ~ "D"' filter does not seem to work: the output does not change. The view I see is

test.bam@2; Reads: 296; filters: [awk, var-read]
..
g,
--
..
--
--
GA
..
AG
26
0
chr11:2699098-2699099; 2 bp; 1.0 bp/char

Should the command deal with "-" and "." instead?

Thank you very much. Looking forward to your reply.

Best,

@YingziZhang-github
Author

This should keep reads having a variant at 2699098+1 AND an indel somewhere else, not necessarily at 2699098+1, though.

To clarify, I need to keep reads that have indels at chr11:2699098 and/or chr11:2699099 (not necessarily 2 bp indels; they can be larger). I don't want to keep reads that have only SNVs and no indels at these two positions. Sorry if I was unclear.

Looking forward to your reply!
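
For reference, the position-specific requirement described above can be sketched outside ASCIIGenome by walking each read's CIGAR string along the reference. The following Python sketch is an illustration under stated assumptions, not an ASCIIGenome feature: the function name `indel_overlaps` is hypothetical, `pos` is the 1-based leftmost mapping position (SAM column 4), `cigar` is SAM column 6, and an insertion is counted as overlapping when either reference base flanking it lies in the interval.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference

def indel_overlaps(pos: int, cigar: str, start: int, end: int) -> bool:
    """Return True if the read has an insertion or deletion overlapping
    the 1-based, inclusive reference interval [start, end]."""
    ref = pos  # reference coordinate of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D":
            # Deletion spans reference positions ref .. ref + length - 1
            if ref <= end and ref + length - 1 >= start:
                return True
        elif op == "I":
            # Insertion sits between reference positions ref - 1 and ref
            # (assumption: either flanking base in the interval counts)
            if ref - 1 <= end and ref >= start:
                return True
        if op in REF_CONSUMING:
            ref += length
    return False

# Hypothetical reads, for illustration only
region = (2699098, 2699099)
print(indel_overlaps(2699090, "8M3D50M", *region))   # deletion covers 2699098-2699100
print(indel_overlaps(2699090, "2M2D56M", *region))   # deletion is upstream of the region
print(indel_overlaps(2699090, "60M", *region))       # no indel at all
```

Under these assumptions, a read carrying only a mismatch (SNV) in the region is rejected, because mismatches are part of M (or X) operations and never trigger the I or D branches.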
