Error running Duphold, "expected AD field in snps VCF" #30
Comments
hi, you can run duphold without a snp vcf and just find changes in depth in your SVs. your SNP vcf has:
which does not follow spec. It should be:
My SNP VCFs actually have the AD field in the FORMAT section, not the INFO section. AD should be in the FORMAT section, as it could be used for a multi-sample VCF. But when I tried hacking my SNP VCF to have the INFO header you name above and moving the AD field to INFO, I got the same error.
shoot. I meant FORMAT, not INFO. what does your VCF have for AD in FORMAT?
oh, I see your AD is:
should be:
Actually, AD is specified as both an INFO field AND a FORMAT field. The INFO version is presumably the sum of all the FORMAT ones, although I don’t see why that’s particularly useful.
But, yes, mine is in FORMAT. And it should appear as:
the FORMAT field is the one that duphold uses and yes, that is correct. However, you can't simply change the header; you'll have to adjust the values for every record as well (or use a VCF from another caller).
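To see how a SNP VCF declares AD before touching any records, the header lines can be inspected directly. The snippet below is a minimal sketch and not part of duphold: `ad_declarations` is a hypothetical helper, the example header is made up for illustration, and the regular expression assumes `ID` is immediately followed by `Number`, as in typical headers. The VCF 4.3 specification reserves AD as a per-allele field (`Number=R`).

```python
import re


def ad_declarations(header_lines):
    """Return (section, Number) pairs for every AD declaration in a VCF header."""
    # Assumes the common key order: ##FORMAT=<ID=AD,Number=...,Type=...,...>
    pattern = re.compile(r"^##(INFO|FORMAT)=<ID=AD,Number=([^,>]+)")
    found = []
    for line in header_lines:
        match = pattern.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Hypothetical header for illustration only (not taken from the reporter's file).
example_header = [
    "##fileformat=VCFv4.3",
    '##INFO=<ID=AD,Number=R,Type=Integer,Description="Total read depth for each allele">',
    '##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allelic depth">',
]

print(ad_declarations(example_header))
# → [('INFO', 'R'), ('FORMAT', '1')]
```

A FORMAT declaration whose `Number` is not `R` suggests the caller writes a single depth per sample, in which case editing the header alone leaves the record values unchanged.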
The variant FORMAT record AD fields contain integers. How would they need to be adjusted?
1 10583 . G A 201.31 RF;RF8.6 RFQUAL_ALL=2.76;STR_PERIOD=0;STR_LENGTH=0;QD=3.097;MQ0=25;MQ=46.104;GC=0.710;CRF=0.111;AC=
The AD entry is a valid integer.
AD should have multiple values for each sample. For a bi-allelic variant, it should have 2 values: the first indicates the number of reads supporting the reference allele and the second indicates the number of reads supporting the alternate allele. I still suggest running duphold without the SNP VCF, as it's not needed and doesn't add much in most cases.
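The rule above (one AD value for the reference allele plus one per alternate allele) can be checked record by record. This is a rough sketch under stated assumptions, not duphold's own validation: `ad_value_problems` is a hypothetical helper, the two records are invented for illustration, and the input is assumed to be a tab-separated VCF data line with at least one sample column.

```python
def ad_value_problems(record_line):
    """List problems with the per-sample AD values of one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["record has no FORMAT/sample columns"]

    alt_alleles = fields[4].split(",")
    expected = 1 + len(alt_alleles)  # reference allele + each alternate allele

    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return ["AD is not listed in the FORMAT column"]
    ad_index = format_keys.index("AD")

    problems = []
    for sample_number, sample in enumerate(fields[9:], start=1):
        values = sample.split(":")
        if ad_index >= len(values):
            problems.append(f"sample {sample_number}: AD value is absent")
            continue
        ad = values[ad_index]
        if ad == ".":
            continue  # missing value, nothing to count
        count = len(ad.split(","))
        if count != expected:
            problems.append(
                f"sample {sample_number}: AD has {count} value(s), expected {expected}"
            )
    return problems


# Hypothetical records for illustration only.
good = "1\t10583\t.\tG\tA\t201.31\tPASS\t.\tGT:AD\t0/1:12,7"
bad = "1\t10583\t.\tG\tA\t201.31\tPASS\t.\tGT:AD\t0/1:19"

print(ad_value_problems(good))  # → []
print(ad_value_problems(bad))   # → ['sample 1: AD has 1 value(s), expected 2']
```

A record like `bad` carries a valid integer in AD, but a single integer cannot be split into reference and alternate support, which is why changing only the header does not make the values usable.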
Well, it appears that just changing the header did the trick. Duphold is running now that I modified the snp.vcf to have
it won't give you correct results.
ok, will give it a shot
I'm running duphold v 0.2.1
The command I am running is:
~/install_crap/duphold/duphold -v ./results.vcf.gz -b /efs/WGS/data/WGS/ILMN_exptA/b37/KC_downsampledBAM_and_VCF/NA24385/40x/S7508/NA24385.40x.S7508.aligned.deduped.sort.bam -f /efs/WGS/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta -s ./x.vcf.gz -t 96 -o ./duphold.vcf
It spins for a while, then returns the error:
"expected AD field in snps VCF"
The thing is, my VCF files have the AD field present. I created a small VCF to test this more directly, and here are the full contents of x.vcf.gz:
`##fileformat=VCFv4.3
##reference=human_g1k_v37_modified
##octopus=<version=0.6.3-beta_HEAD_5961a546,command="octopus --reference /home/kcibul/wgs_resources/data/reference/human/human_g1k_v37_modified.fasta
/human_g1k_v37_modified.fasta --reads NA24385.40x.S7508.aligned.deduped.sort.bam -t regions.bed --forest-file /home/kcibul/wgs_resources/data/referen
ce/forests/DC/germline.v0.7.0.forest -o NA24385.40x.S7508.octopus.tmp.vcf.gz --threads 192 -X 10000MB -B 38400 MB --sequence-error-model /home/kcibul
/wgs_resources/data/reference/octopus_err_models/novaseq.4a38e55.model --annotations AD ADP AF ARF BQ CRF DP FRF GC GQ MC MF MQ MQ0 QD QUAL SB STR_LE
NGTH STR_PERIOD --max-indel-errors 32 --duplicate-read-detection-policy AGGRESSIVE --max-haplotypes=400 --min-forest-quality=8",options="--allow-mark
ed-duplicates no --allow-octopus-duplicates no --allow-pileup-candidates-from-likely-misaligned-reads no --allow-qc-fails no --allow-reads-with-good-
decoy-supplementary-alignments no --allow-reads-with-good-unplaced-or-unlocalized-supplementary-alignments no --allow-secondary-alignments no --allow
-supplementary-alignments no --annotations[0] AD --annotations[1] ADP --annotations[2] AF --annotations[3] ARF --annotations[4] BQ --annotations[5] C
RF --annotations[6] DP --annotations[7] FRF --annotations[8] GC --annotations[9] GQ --annotations[10] MC --annotations[11] MF --annotations[12] MQ --
annotations[13] MQ0 --annotations[14] QD --annotations[15] QUAL --annotations[16] SB --annotations[17] STR_LENGTH --annotations[18] STR_PERIOD --asse
mble-all no --assembler-mask-base-quality 10 --backtrack-level NONE --bad-region-tolerance NORMAL --bamout-type MINI --caller population --consider-u
nmapped-reads no --contig-output-order REFERENCE_INDEX --contig-ploidies[0] Y=1 --contig-ploidies[1] chrY=1 --contig-ploidies[2] MT=1 --contig-ploidi
es[3] chrM=1 --denovo-filter-expression QUAL < 50 | PP < 40 | GQ < 20 | MQ < 30 | AF < 0.1 | SB > 0.95 | BQ < 20 | DP < 10 | DC > 1 | MF > 0.2 | FRF \
I ran pyvcf on this VCF file, and the FORMAT AD field appears in it.
I also ran duphold with VCF files generated by Dragen and Sentieon; both return the same error.
Any advice for how to fix this? I am eager to apply duphold to a clinical product I am developing, but this error has me blocked.
Thanks!
John Major