maxVariantFrac #123

Dingersrun · 2021-09-23T19:30:29Z

I want to filter out all possible SNPs, both homozygous and heterozygous. My understanding is that if there is no SNP, no non-G bases should be expected on the opposite strand, so I set --maxVariantFrac to 0. With that setting, almost all CpG sites were excluded (22 million out of 23 million). When I set it to 0.1, most CpG sites are retained. Is a threshold of 0 really that harsh? Do you have any suggestions for filtering SNPs?
Does --maxVariantFrac mean the fraction of non-G bases on the strand opposite the C relative to the total coverage at that base, or relative to the coverage of the opposite strand only? For instance, if there are 10 reads from the C strand and 10 reads from the non-C strand, and 3 of the non-C-strand reads show a non-G base, is the variant fraction 0.3 or 0.15?
Thanks a lot :)

dpryan79 · 2021-09-27T08:25:00Z

0.1 is fairly reasonable. Note that with a threshold of 0 you also end up filtering out sites with sequencing artifacts and similar errors, which will appear randomly with longer reads.
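A rough way to see why a threshold of 0 is so strict: if sequencing errors occur independently at a per-base rate e, the chance that at least one of n opposite-strand reads shows a non-G base from error alone is 1 - (1 - e)^n, which grows with the number of reads. The sketch below assumes independent errors and uses a hypothetical error rate of 0.5%; `prob_any_error` is an illustrative helper, not part of MethylDackel.

```python
def prob_any_error(error_rate: float, depth: int) -> float:
    """Probability that at least one of `depth` reads shows a non-G base
    purely from sequencing error, assuming errors are independent."""
    return 1.0 - (1.0 - error_rate) ** depth


if __name__ == "__main__":
    # Hypothetical per-base error rate of 0.5%; not a measured value.
    e = 0.005
    for depth in (5, 10, 20, 50):
        # With a threshold of 0, any single erroneous read would exclude the site.
        print(f"depth={depth:3d}  P(>=1 error)={prob_any_error(e, depth):.3f}")
```

Even with a low error rate, the probability of at least one spurious non-G read rises steadily with coverage, so a threshold of 0 discards many sites that carry no real variant.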

Dingersrun · 2021-09-27T08:35:23Z

Thanks for the reply! My other question: is the fraction of non-G reads on the non-C strand computed against the total read count at this base, or against the read count on the non-C strand only?

dpryan79 · 2021-09-28T11:58:04Z

It's the count on the non-C strand, since that makes it easier to assess whether there's a variant.
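The rule described in this reply can be sketched as follows, using the example numbers from the original question. `SiteCounts`, `variant_fraction` and `passes_filter` are hypothetical names for illustration, not MethylDackel's own code; the strict "exceeds the threshold" comparison and the handling of sites with no opposite-strand coverage are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical per-site read counts at a CpG C position."""
    c_strand_reads: int   # reads covering the C on the C strand
    opposite_reads: int   # reads covering the paired G on the non-C strand
    opposite_non_g: int   # of those, reads showing a base other than G


def variant_fraction(site: SiteCounts) -> float:
    """Non-G fraction using only the non-C-strand count as the denominator."""
    if site.opposite_reads == 0:
        # Assumption: no opposite-strand coverage means no evidence of a variant.
        return 0.0
    return site.opposite_non_g / site.opposite_reads


def passes_filter(site: SiteCounts, max_variant_frac: float) -> bool:
    """Keep the site unless its variant fraction exceeds the threshold (strict > assumed)."""
    return variant_fraction(site) <= max_variant_frac


if __name__ == "__main__":
    site = SiteCounts(c_strand_reads=10, opposite_reads=10, opposite_non_g=3)
    print(variant_fraction(site))    # 0.3, not 3 / 20 = 0.15
    print(passes_filter(site, 0.1))  # False
```

Because the C-strand reads are not part of the denominator, 3 non-G reads out of 10 on the non-C strand gives 0.3, so this site would be dropped at a threshold of 0.1.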
