testRun != testReference #4

Open
ulah opened this issue Nov 11, 2016 · 3 comments

@ulah

ulah commented Nov 11, 2016

Hi,
I built and installed without warnings or errors.
The test data were processed without warnings or errors.

However, the generated output file differs from the reference (output.golden.vcf or output.golden.anno).

$ cat testdata/output.golden.vcf
##INFO=<ID=EB,Number=1,Type=Float,Description="EBCall Score">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=3.381;SOMATIC
chr11 562012 . C T 60 PASS EB=3.389;SOMATIC
chr11 824202 . C A 60 PASS EB=2.125;SOMATIC
chr11 1013896 . C T 60 PASS EB=2.765;SOMATIC
chr11 1081746 . G C 60 PASS EB=6.221;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 2418116 . C A 60 PASS EB=6.419;SOMATIC
chr11 3680752 . G A 60 PASS EB=2.558;SOMATIC
chr11 5012707 . C G 60 PASS EB=4.998;SOMATIC
chr11 5221726 . A G 60 PASS EB=6.542;SOMATIC

$ cat testOut.vcf
##INFO=<ID=EB,Number=1,Type=Float,Description="EBCall Score">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=2.78;SOMATIC
chr11 562012 . C T 60 PASS EB=2.234;SOMATIC
chr11 824202 . C A 60 PASS EB=2.087;SOMATIC
chr11 1013896 . C T 60 PASS EB=2.542;SOMATIC
chr11 1081746 . G C 60 PASS EB=4.902;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 2418116 . C A 60 PASS EB=6.036;SOMATIC
chr11 3680752 . G A 60 PASS EB=2.558;SOMATIC
chr11 5012707 . C G 60 PASS EB=4.788;SOMATIC
chr11 5221726 . A G 60 PASS EB=7.367;SOMATIC

Most "EB values" are different (only chr11:1277322 and chr11:3680752 match) - any ideas?
Cheers
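
To pinpoint which records differ between the two listings above, the EB field can be compared by position. This is a minimal sketch, assuming both files are plain-text VCFs with the whitespace-separated columns shown in this thread; `parse_eb`, `read_eb`, and `diff_eb` are hypothetical helper names, not part of EBFilter.

```python
def parse_eb(info):
    """Return the EB score from a VCF INFO string, or None if it is absent."""
    for field in info.split(";"):
        if field.startswith("EB="):
            return float(field[len("EB="):])
    return None


def read_eb(lines):
    """Map (chrom, pos, ref, alt) to the EB score, skipping header and blank lines."""
    scores = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        key = (cols[0], int(cols[1]), cols[3], cols[4])
        scores[key] = parse_eb(cols[7])
    return scores


def diff_eb(golden, observed, tol=1e-6):
    """List (key, golden_score, observed_score) for shared records whose scores differ."""
    diffs = []
    for key in sorted(golden.keys() & observed.keys()):
        want, got = golden[key], observed[key]
        if want is None or got is None:
            # Report the record only when exactly one side lacks an EB score.
            if want is not got:
                diffs.append((key, want, got))
        elif abs(want - got) > tol:
            diffs.append((key, want, got))
    return diffs


# Two records copied from the listings above: the first differs, the second matches.
golden = [
    "chr11 397665 . G C 60 PASS EB=3.381;SOMATIC",
    "chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC",
]
observed = [
    "chr11 397665 . G C 60 PASS EB=2.78;SOMATIC",
    "chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC",
]
for key, want, got in diff_eb(read_eb(golden), read_eb(observed)):
    print(key, want, got)
```

To compare the real files, pass open file handles instead of the lists, for example `read_eb(open("testOut.vcf"))`.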

@friend1ws
Member

Hi, thanks for your interest in EBFilter!

Could you tell me which version of samtools you are using?
Newer versions of samtools treat overlapping bases differently from older versions
(they avoid double counting by changing the qualities of the overlapping bases).
Our test data were generated using samtools v0.18.0.
However, I think the newer versions' treatment of overlapping bases is sound in principle,
and I recommend using a newer version.

@ulah
Author

ulah commented Nov 15, 2016

OK, thanks. I use version 1.2, so that's probably the reason for the differences.

By the way, what threshold did you set to call a variant somatic or not somatic? Since everything in the example is somatic and above 2, I'd guess 2 (I can't deduce it from my real data, as I only have ANNOVAR files, which give just the value and not the somatic/not somatic tag). What's the logic behind it? In my real data, 3 actually seems to be a better threshold, but that might be due to the higher noise in my data (single-cell data from patient samples).

Cheers

@friend1ws
Member

The p-value changes with the number of control samples (when more control samples are used, the -log10(p-value) tends to increase). We usually use 20 control samples and set the threshold of -log10(p-value) to 4.0. When you have fewer control samples (e.g., 10), I recommend setting the threshold to 3.0 or so. This convention has been validated by a number of our empirical studies.
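
Applying such a cutoff to an output file can be sketched as a small filter. This is an illustration only, assuming the EB field in the INFO column holds the -log10(p-value) discussed here and that a record passes when its score is at least the threshold; `eb_score` and `filter_by_eb` are hypothetical helper names, not part of EBFilter, and the threshold is left as a parameter to be chosen per the guidance above.

```python
def eb_score(record):
    """Return the EB score of a VCF data line, or None if the INFO field has none."""
    info = record.split()[7]
    for field in info.split(";"):
        if field.startswith("EB="):
            return float(field[len("EB="):])
    return None


def filter_by_eb(lines, threshold):
    """Keep header lines and records whose EB score is at least `threshold`."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        if not line.strip():
            continue
        score = eb_score(line)
        # Records without an EB score are dropped (an assumption of this sketch).
        if score is not None and score >= threshold:
            kept.append(line)
    return kept


# Records copied from the testOut.vcf listing earlier in this thread.
records = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "chr11 397665 . G C 60 PASS EB=2.78;SOMATIC",
    "chr11 1081746 . G C 60 PASS EB=4.902;SOMATIC",
    "chr11 5012707 . C G 60 PASS EB=4.788;SOMATIC",
]
for line in filter_by_eb(records, threshold=4.0):
    print(line)
```

With a threshold of 4.0, the record at position 397665 (EB=2.78) is dropped while the other two are kept.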
