Training Genotype Callers with Neural Networks #171

agitter · 2016-12-31T14:07:51Z

We present an open source software toolkit for training deep learning models to call genotypes in high-throughput sequencing data. The software supports SAM, BAM, CRAM and Goby alignments and the training of models for a variety of experimental assays and analysis protocols. We evaluate this software in the Illumina platinum whole genome datasets and find that a deep learning model trained on 80% of the genome achieves a 0.986% accuracy on variants (genotype concordance) when trained with 10% of the data from a genome. The software is distributed at https://github.com/CampagneLaboratory/variationanalysis. The software makes it possible to train genotype calling models on consumer hardware with CPUs or GPU(s). It will enable individual investigators and small laboratories to train and evaluate their own models and to make open source contributions. We welcome contributions to extend this early prototype or evaluate its performance on other gold standard datasets.

This short paper extends the recent work in #99 for genotype calling. It is primarily a response to DeepVariant (#159): several paragraphs contrast the two approaches, and the paper emphasizes that this method's code is open source.

This will fit in the sequencing sub-section of the Study section.

agitter added paper study labels Dec 31, 2016

cgreene mentioned this issue Apr 8, 2017

Current Section Status #188

Closed

cgreene mentioned this issue Apr 30, 2017

Data/code sharing and data limitations #367

Merged
