testRun != testReference #4

ulah · 2016-11-11T10:35:11Z

Hi,
I built and installed w/o warnings/errors.
The test data were processed w/o warnings/errors.

However, the generated output file differs from the reference (output.golden.vcf or output.golden.anno)?!

$ cat testdata/output.golden.vcf
##INFO=<ID=EB,Number=1,Type=Float,Description="EBCall Score">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=3.381;SOMATIC
chr11 562012 . C T 60 PASS EB=3.389;SOMATIC
chr11 824202 . C A 60 PASS EB=2.125;SOMATIC
chr11 1013896 . C T 60 PASS EB=2.765;SOMATIC
chr11 1081746 . G C 60 PASS EB=6.221;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 2418116 . C A 60 PASS EB=6.419;SOMATIC
chr11 3680752 . G A 60 PASS EB=2.558;SOMATIC
chr11 5012707 . C G 60 PASS EB=4.998;SOMATIC
chr11 5221726 . A G 60 PASS EB=6.542;SOMATIC

$ cat testOut.vcf
##INFO=<ID=EB,Number=1,Type=Float,Description="EBCall Score">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=2.78;SOMATIC
chr11 562012 . C T 60 PASS EB=2.234;SOMATIC
chr11 824202 . C A 60 PASS EB=2.087;SOMATIC
chr11 1013896 . C T 60 PASS EB=2.542;SOMATIC
chr11 1081746 . G C 60 PASS EB=4.902;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 2418116 . C A 60 PASS EB=6.036;SOMATIC
chr11 3680752 . G A 60 PASS EB=2.558;SOMATIC
chr11 5012707 . C G 60 PASS EB=4.788;SOMATIC
chr11 5221726 . A G 60 PASS EB=7.367;SOMATIC

Most "EB values" are different (8 of the 10 sites; only chr11:1277322 and chr11:3680752 match) - any ideas?
Cheers
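
For anyone wanting to reproduce this comparison without eyeballing the two files, here is a minimal sketch that lists the sites whose EB score differs. The function names `read_eb_scores` and `diff_eb_scores` and the tolerance `tol` are hypothetical and not part of EBFilter; the three sample records are copied from the listings above.

```python
import io


def read_eb_scores(handle):
    """Map (chrom, pos, ref, alt) -> EB score for every record in a VCF stream."""
    scores = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.split()  # tolerates tabs or the spaces shown in this thread
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for entry in fields[7].split(";"):  # INFO column, e.g. "EB=3.381;SOMATIC"
            if entry.startswith("EB="):
                scores[(chrom, pos, ref, alt)] = float(entry[len("EB="):])
    return scores


def diff_eb_scores(golden, observed, tol=1e-3):
    """List (site, golden_score, observed_score) for shared sites differing by more than tol."""
    shared = sorted(golden.keys() & observed.keys())
    return [(k, golden[k], observed[k]) for k in shared
            if abs(golden[k] - observed[k]) > tol]


# Three records taken from output.golden.vcf and testOut.vcf as posted above.
GOLDEN = """#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=3.381;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 5221726 . A G 60 PASS EB=6.542;SOMATIC
"""
OBSERVED = """#CHROM POS ID REF ALT QUAL FILTER INFO
chr11 397665 . G C 60 PASS EB=2.78;SOMATIC
chr11 1277322 . G T 60 PASS EB=2.468;SOMATIC
chr11 5221726 . A G 60 PASS EB=7.367;SOMATIC
"""

differences = diff_eb_scores(read_eb_scores(io.StringIO(GOLDEN)),
                             read_eb_scores(io.StringIO(OBSERVED)))
for site, golden_score, observed_score in differences:
    print(site, golden_score, observed_score)
```

To run it on the real files, pass `open("testdata/output.golden.vcf")` and `open("testOut.vcf")` instead of the `io.StringIO` objects.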

friend1ws · 2016-11-12T07:58:27Z

Hi, thanks for the interest in EBFilter!

Could you tell me which version of samtools you are using?
Newer versions of samtools treat overlapping bases differently from older versions
(they avoid double counting by changing the qualities of the overlapping bases).
Our test data was generated using samtools v0.18.0.
But I think the newer versions' treatment of overlapping bases is in principle nice,
and I recommend using a newer version.

ulah · 2016-11-15T09:33:02Z

Ok, thanks. I use version 1.2, so that's probably the reason for the differences.

Btw: What threshold did you set to call a variant somatic/not somatic? As everything in the example is somatic and above 2, I'd guess 2 (I can't deduce it from my real data, as I only have annovar files, and with those one only gets the value, not the somatic/not somatic tag), but what's the logic behind it? From the real data I have, 3 actually seems to be a better threshold, but that might be due to the increased noise in my data (single-cell data from patient samples).

Cheers

friend1ws · 2016-11-16T01:33:54Z

The p-value changes according to the number of control samples (when more control samples are used, the -log10(p-value) tends to increase). We usually use 20 control samples and set the threshold of -log10(p-value) to 4.0. When you have fewer control samples (e.g., 10), I recommend setting the threshold to 3.0 or so. This convention has been validated by a number of our empirical studies.
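
As a rough illustration of applying such a cutoff after the fact, the sketch below keeps only records whose EB value reaches a chosen threshold. It assumes the `EB` INFO field holds the -log10(p-value) discussed here and that the comparison is inclusive (`>=`); neither is confirmed in this thread. The helper names `eb_score` and `filter_by_eb` are hypothetical, and the sample records come from output.golden.vcf as posted above.

```python
def eb_score(vcf_line):
    """Return the EB value from a VCF record's INFO column, or None if absent."""
    fields = vcf_line.split()  # tolerates tabs or spaces
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        if entry.startswith("EB="):
            return float(entry[len("EB="):])
    return None


def filter_by_eb(lines, threshold):
    """Keep header lines plus records with EB >= threshold (inclusive cutoff is an assumption)."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        score = eb_score(line)
        if score is not None and score >= threshold:
            kept.append(line)
    return kept


RECORDS = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "chr11 397665 . G C 60 PASS EB=3.381;SOMATIC",
    "chr11 824202 . C A 60 PASS EB=2.125;SOMATIC",
    "chr11 1081746 . G C 60 PASS EB=6.221;SOMATIC",
]

# 4.0 is the cutoff mentioned for about 20 control samples, 3.0 for about 10.
strict = filter_by_eb(RECORDS, 4.0)
relaxed = filter_by_eb(RECORDS, 3.0)
print(len(strict) - 1, "records pass at 4.0;", len(relaxed) - 1, "pass at 3.0")
```

The threshold is left as a plain parameter because the thread gives only two reference points (20 and 10 control samples), so any rule for other panel sizes would be a guess.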
