Resolves bigdatagenomics/adam#1312. Enables gVCF-style data to be used with the
joint variant caller by extracting the sites where a variant was called as a
genotype in one of the gVCFs. These variant sites are then joined back against
the original genotypes. If the genotyped allele is not present for a sample and a
reference model block is present, the allele is extracted from the reference model.
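The extract-then-join approach described above can be sketched roughly as follows. This is a minimal, single-machine illustration in plain Python, not the actual ADAM/Spark implementation: the `Record` type, the function names, and the example data are all hypothetical, and the sketch assumes biallelic calls and that a covering reference model block implies a homozygous-reference genotype.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Site = Tuple[int, str, str]  # (position, ref allele, alt allele)


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified gVCF record (not an ADAM type)."""
    sample: str
    start: int                 # 1-based, inclusive
    end: int                   # inclusive; equals start for a SNV call
    ref: str
    alt: Optional[str]         # None marks a reference model block
    genotype: Tuple[int, ...]  # allele indices: 0 = ref, 1 = alt


def variant_sites(records: List[Record]) -> List[Site]:
    """Collect sites where at least one sample was genotyped with an alt allele."""
    return sorted({
        (r.start, r.ref, r.alt)
        for r in records
        if r.alt is not None and any(a > 0 for a in r.genotype)
    })


def genotype_at(site: Site, sample_records: List[Record]) -> Optional[Tuple[int, ...]]:
    """Join one site back against one sample's records."""
    pos, ref, alt = site
    # Prefer an explicit call for this exact allele.
    for r in sample_records:
        if r.alt == alt and r.start == pos and r.ref == ref:
            return r.genotype
    # Otherwise fall back to a reference model block covering the position
    # (assumption: a covering block means homozygous reference).
    for r in sample_records:
        if r.alt is None and r.start <= pos <= r.end:
            return (0, 0)
    return None  # no call and no reference block: no information at this site


def join_sites(records: List[Record]) -> Dict[Site, Dict[str, Optional[Tuple[int, ...]]]]:
    """Build a site -> sample -> genotype table across all samples."""
    by_sample: Dict[str, List[Record]] = {}
    for r in records:
        by_sample.setdefault(r.sample, []).append(r)
    return {
        site: {s: genotype_at(site, recs) for s, recs in sorted(by_sample.items())}
        for site in variant_sites(records)
    }


# Made-up example data for three samples.
records = [
    Record("s1", 100, 100, "A", "G", (0, 1)),
    Record("s1", 101, 200, "C", None, (0, 0)),
    Record("s2", 50, 150, "T", None, (0, 0)),
    Record("s2", 180, 180, "G", "T", (1, 1)),
    Record("s3", 100, 100, "A", "G", (1, 1)),
]
joined = join_sites(records)
```

In this toy data, sample `s2` has no call at position 100, so its genotype there comes from the reference block spanning 50-150, while `s3` has no record covering position 180 and is left as missing. A real joint caller would also carry genotype likelihoods and handle multi-allelic sites, which this sketch ignores.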
Hi,
Are you planning to support gVCF merging and genotyping on Spark / ADAM?
As far as I know, the only way to variant call 100K samples is through creating a gVCF file per sample, followed by gVCF merging and genotyping.
The best-known, production-ready implementation of this is from the Broad in GATK:
CombineGVCFs
https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php
GenotypeGVCFs
https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.php
For variant calling of the 100K+ samples in ExAC/gnomAD, the first merge step was replaced with GenomicsDB from Intel. As I understand it, GenomicsDB efficiently stores per-sample gVCF tracks and can then efficiently stream merged VCF into GenotypeGVCFs.
Broad/Intel GenomicsDB
https://github.com/Intel-HLS/GenomicsDB
https://vimeo.com/194823486/506da42daf (from minute 19)
GenomicsDB is based on Intel TileDB
http://istc-bigdata.org/tiledb/index.html
Something similar to CombineGVCFs/GenotypeGVCFs/GenomicsDB is being developed by DNAnexus; it also supports on-demand joint genotyping from Freebayes gVCFs:
GLnexus
https://github.com/dnanexus-rnd/GLnexus
Are you also planning scalable gVCF storage, along with on-demand gVCF merging and joint genotyping, on top of Spark / ADAM?
Thank you.