Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) #1312

NeillGibson · 2016-12-12T15:10:13Z

Hi,

Are you planning to support gVCF merging and genotyping on Spark / ADAM?

As far as I know, the only way to variant call 100K samples is through creating a gVCF file per sample and subsequently merging and genotyping the gVCFs.

The best-known, production-ready implementation of this is in GATK from the Broad:

CombineGVCFs
https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php

GenotypeGVCFs
https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.php

For variant calling of the 100K+ samples in ExAC/gnomAD, the first (merge) step was replaced with GenomicsDB from Intel. As I understand it, GenomicsDB efficiently stores per-sample gVCF tracks and can then efficiently stream merged VCF records into GenotypeGVCFs.

Broad/Intel GenomicsDB
https://github.com/Intel-HLS/GenomicsDB
https://vimeo.com/194823486/506da42daf (from minute 19)

GenomicsDB is based on Intel TileDB:
http://istc-bigdata.org/tiledb/index.html
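The merge step described above, turning many position-sorted per-sample gVCF streams into a single position-ordered stream, can be illustrated with a k-way merge. This is a minimal sketch under assumptions: `GvcfRecord` is a hypothetical simplified record, and records are only grouped when they share a start position. It does not reconcile overlapping reference blocks that start at different positions, which a production tool such as CombineGVCFs or GenomicsDB has to handle.

```python
import heapq
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class GvcfRecord:
    """Hypothetical, simplified gVCF row: a variant site or a reference block."""
    contig: str
    start: int    # 1-based start position (VCF POS)
    end: int      # inclusive end (INFO/END for reference blocks, else start)
    sample: str
    alts: tuple   # () marks a pure reference block in this simplified model


def merge_by_position(per_sample, contig_order):
    """K-way merge of per-sample, position-sorted record lists into one stream.

    Yields ((contig, start), [records starting at that locus]).
    """
    rank = {contig: i for i, contig in enumerate(contig_order)}

    def key(record):
        # Sort by contig order first, then by start position.
        return (rank[record.contig], record.start)

    # heapq.merge lazily interleaves already-sorted inputs; ties keep input order.
    merged = heapq.merge(*per_sample, key=key)
    # Group records that start at the same locus so they can be combined.
    for (contig_rank, start), group in groupby(merged, key=key):
        yield (contig_order[contig_rank], start), list(group)
```

Because `heapq.merge` only holds one pending record per input, memory grows with the number of samples rather than the number of records, which is the property that makes streaming merges attractive at large sample counts.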

Something similar to CombineGVCFs/GenotypeGVCFs/GenomicsDB is being developed by DNAnexus; it also supports on-demand joint genotyping from FreeBayes gVCFs:

GLnexus
https://github.com/dnanexus-rnd/GLnexus

Are you also planning scalable gVCF storage and on-demand gVCF merging and joint genotyping on top of Spark / ADAM?

Thank you.

Resolves bigdatagenomics/adam#1312. Enables gVCF-style data to be used with the joint variant caller by extracting the sites where a variant was called as a genotype in one of the gVCFs. These variant sites are then joined back against the original genotypes. If the genotyped allele was not present and a reference model block is present, the allele is extracted from the reference model.
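The approach in the commit message above (collect variant sites from all gVCFs, join them back against each sample, and fall back to a reference-model block when the sample has no call for that allele) can be sketched in plain Python. The `Call` record, its fields, and the convention that an empty `alts` tuple marks a reference block are hypothetical simplifications; real GATK gVCFs mark reference blocks with a `<NON_REF>` allele and an `END` INFO field. This is not avocado's implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical simplified gVCF row for one sample."""
    sample: str
    contig: str
    start: int    # 1-based position
    end: int      # inclusive end of a reference block (== start for variants)
    ref: str
    alts: tuple   # () marks a reference-model block in this simplified model


def variant_sites(calls):
    """Step 1: every (contig, pos, ref, alt) called as a variant in any sample."""
    return {(c.contig, c.start, c.ref, alt) for c in calls for alt in c.alts}


def joint_genotypes(calls):
    """Step 2: join each variant site back against every sample's rows.

    Returns {(site, sample): source}, where source is "genotyped" when the
    sample called that allele, "from_reference_model" when a reference block
    covers the site, and "no_data" otherwise.
    """
    sites = variant_sites(calls)
    samples = sorted({c.sample for c in calls})
    result = {}
    for site in sorted(sites):
        contig, pos, _ref, alt = site
        for sample in samples:
            rows = [c for c in calls if c.sample == sample and c.contig == contig]
            if any(c.start == pos and alt in c.alts for c in rows):
                result[(site, sample)] = "genotyped"
            elif any(not c.alts and c.start <= pos <= c.end for c in rows):
                result[(site, sample)] = "from_reference_model"
            else:
                result[(site, sample)] = "no_data"
    return result
```

The nested scan is quadratic and only meant to show the logic; on Spark the same step would more plausibly be expressed as a join between the site set and the per-sample records keyed by genomic region.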

ggittu · 2018-10-03T19:27:41Z

@heuermh @fnothaft Is there any documentation on how to do the equivalent of CombineGVCFs and GenotypeGVCFs in ADAM (the Spark way)?

heuermh · 2018-10-03T20:11:01Z

Is this helpful?

https://bdg-avocado.readthedocs.io/en/latest/workflows/joint.html

ggittu · 2018-10-03T20:36:58Z

@heuermh Got it. So I will run something like:

avocado-submit jointer -from_gvcf /vcf/*.gvcf /output

Here I think jointer corresponds to CombineGVCFs. Which command can I use for GenotypeGVCFs?

fnothaft mentioned this issue Jan 15, 2017

Add back joint caller bigdatagenomics/avocado#199 (Closed)

fnothaft closed this as completed in bigdatagenomics/avocado@141881c Oct 24, 2017

heuermh added this to the 0.23.0 milestone Dec 7, 2017
