maxVariantFrac #123

Open
Dingersrun opened this issue Sep 23, 2021 · 3 comments

Comments

@Dingersrun

I want to filter out all possible SNPs, both homozygous and heterozygous. My understanding is that at a site without a SNP, no non-G bases should be observed on the strand opposite the C, so I set --maxVariantFrac to 0. Almost all CpG sites were then excluded (22 million out of 23 million). When I set it to 0.1, most CpG sites are retained. Is a threshold of 0 really that harsh? Do you have any suggestions for filtering SNPs?
Also, is --maxVariantFrac the fraction of non-G reads on the strand opposite the C relative to the total coverage at that base, or relative to the coverage on the opposite strand only? For instance, with 10 reads from the C strand and 10 reads from the non-C strand, 3 of which are non-G, is the variant fraction 0.3 or 0.15?
Thanks a lot :)

@dpryan79
Owner

0.1 is fairly reasonable. Please note that a threshold of 0 also ends up filtering out sites with sequencing artifacts and the like, which appear randomly with longer reads.
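To illustrate why a threshold of 0 can be so strict, here is a rough sketch (not taken from the tool itself). It assumes every read on the opposite strand independently shows a wrong base with a fixed per-base error rate; the function name `prob_any_non_g` and the 1% error rate are hypothetical values chosen only for illustration.

```python
def prob_any_non_g(opposite_depth: int, error_rate: float) -> float:
    """Chance that at least one opposite-strand read shows a non-G by error alone.

    Assumption: errors are independent and occur at `error_rate` per base.
    Real sequencing artifacts are not this simple.
    """
    return 1.0 - (1.0 - error_rate) ** opposite_depth


# Hypothetical 1% per-base error rate at a few opposite-strand depths.
for depth in (5, 10, 30):
    print(depth, round(prob_any_non_g(depth, 0.01), 3))
```

Under these assumptions the chance of seeing at least one non-G read grows with depth, so a cutoff of exactly 0 removes sites that have no real variant.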

@Dingersrun
Author

Thanks for the reply! My other question is about the denominator: is the number of non-G reads on the non-C strand compared against the total read count at this base, or against the read count on the non-C strand only?

@dpryan79
Owner

It's the count on the non-C strand, since it's easier to assess whether there's a variant using it.
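Applied to the example from the question (10 reads on the C strand, 10 on the non-C strand, 3 of them non-G), the calculation can be sketched as follows. This is an illustration only, not the tool's actual code: the function names are hypothetical, and both the handling of zero opposite-strand coverage and the strict `>` comparison against the threshold are assumptions.

```python
def variant_fraction(non_g_opposite: int, opposite_depth: int) -> float:
    """Fraction of non-G reads among reads on the non-C strand only."""
    if opposite_depth == 0:
        return 0.0  # assumption: no opposite-strand reads means no evidence
    return non_g_opposite / opposite_depth


def is_excluded(non_g_opposite: int, opposite_depth: int,
                max_variant_frac: float) -> bool:
    """Assumption: a site is dropped when the fraction exceeds the threshold."""
    return variant_fraction(non_g_opposite, opposite_depth) > max_variant_frac


# The 10 reads on the C strand do not enter the calculation.
frac = variant_fraction(non_g_opposite=3, opposite_depth=10)
print(frac)  # 0.3, not 0.15
print(is_excluded(3, 10, max_variant_frac=0.1))
```

So the answer to the original example is 0.3, because only the 10 reads on the non-C strand form the denominator.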
