DNMFilter_Indel is a machine-learning-based tool designed to filter out false-positive de novo indels called by any computational approach from whole-genome or exome sequencing data. It can be used either as a stand-alone tool to detect de novo indels or together with other commonly used de novo indel detection tools to improve specificity.
The easiest way to get DNMFilter_Indel is to download the binary distribution from the DNMFilter_Indel GitHub release page. Alternatively, you can build DNMFilter_Indel from source with Gradle:
- git clone --recursive https://github.com/yongzhuang/DNMFilter_Indel.git
- cd DNMFilter_Indel/
- gradle build
After the build completes, you'll find the executable jar file in DNMFilter_Indel/build/libs/.
To run DNMFilter_Indel, you'll need:
- Java SE Development Kit 8
- R (the Rscript executable must be on your PATH)
- the Runiversal package in R
- the gbm package (https://cran.r-project.org/web/packages/gbm/) in R
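The runtime requirements above can be checked with a short script before launching the jar. The sketch below is written in Python purely for illustration and is not part of DNMFilter_Indel; the function name `check_requirements` is hypothetical. It only tests that `java` and `Rscript` are on the PATH and that R can load the Runiversal and gbm packages (via R's `requireNamespace`); it does not verify that the installed JDK is version 8.

```python
import shutil
import subprocess
from typing import Dict

# R packages named in the requirements list above.
R_PACKAGES = ("Runiversal", "gbm")


def check_requirements() -> Dict[str, bool]:
    """Return {requirement: available?} for java, Rscript, and the R packages.

    Hypothetical helper: only presence is checked; the Java version (JDK 8)
    is NOT verified here.
    """
    status = {
        "java": shutil.which("java") is not None,
        "Rscript": shutil.which("Rscript") is not None,
    }
    for pkg in R_PACKAGES:
        if not status["Rscript"]:
            status[pkg] = False  # cannot test R packages without Rscript
            continue
        # requireNamespace() returns FALSE instead of raising when pkg is absent,
        # so the R process exits with status 0 only if the package can be loaded.
        expr = f'quit(status = if (requireNamespace("{pkg}", quietly = TRUE)) 0 else 1)'
        result = subprocess.run(["Rscript", "-e", expr], capture_output=True)
        status[pkg] = result.returncode == 0
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any requirement reported as MISSING should be installed before running the jar.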
- bam list file (two columns, tab-separated)
  - Column 1: sampleID
  - Column 2: path to the .bam file
- DNM file (.csv file, comma-separated; the first five columns are mandatory)
  - Column 1: familyID
  - Column 2: chromosome
  - Column 3: position
  - Column 4: reference allele
  - Column 5: alternative allele
  Note: The DNM file must be sorted first by familyID and then by chromosome and position.
- feature configuration file
  A feature set to 1 is used to train the model and filter DNMs; a feature set to 0 is not used.
- pedigree file (six columns, tab-separated)
  - Column 1: Family ID
  - Column 2: Individual ID
  - Column 3: Paternal ID
  - Column 4: Maternal ID
  - Column 5: Sex (1=male; 2=female; other=unknown)
  - Column 6: Phenotype
  Note: The Family ID must match the first column of the DNM file, and the Individual ID must match the first column of the bam list file.
- output file (.csv file, six columns, comma-separated)
  - Column 1: familyID
  - Column 2: chromosome
  - Column 3: position
  - Column 4: reference allele
  - Column 5: alternative allele
  - Column 6: prediction score
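Because the DNM, pedigree, and bam list files must agree on family and sample IDs, a quick consistency check before running DNMFilter_Indel can catch formatting mistakes early. The Python sketch below is a hypothetical helper, not part of DNMFilter_Indel, and all function names in it are illustrative. It assumes the DNM file has no header line and ignores any columns after the fifth. Since the text above does not say which family or chromosome ordering is expected, it only checks that each family, and each chromosome within a family, occupies contiguous rows and that positions do not decrease within a chromosome.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# (familyID, chromosome, position, reference allele, alternative allele)
DnmRow = Tuple[str, str, int, str, str]


def read_bam_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map sampleID -> .bam path from the two-column, tab-separated bam list."""
    samples = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        # Unpacking raises ValueError unless there are exactly 2 columns.
        sample_id, bam_path = line.rstrip("\r\n").split("\t")
        samples[sample_id] = bam_path
    return samples


def read_dnm(lines: Iterable[str]) -> List[DnmRow]:
    """Read the five mandatory DNM columns (assumes no header; extra columns ignored)."""
    rows = []
    for fields in csv.reader(lines):
        if not fields:
            continue
        family, chrom, pos, ref, alt = fields[:5]
        rows.append((family, chrom, int(pos), ref, alt))
    return rows


def read_pedigree(lines: Iterable[str]) -> List[List[str]]:
    """Read the six-column, tab-separated pedigree file."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 pedigree columns, got {len(fields)}: {line!r}")
        rows.append(fields)
    return rows


def check_dnm_order(rows: List[DnmRow]) -> List[str]:
    """Report rows that break family/chromosome grouping or ascending position order.

    Does not assume any particular ordering of family IDs or chromosome names.
    """
    problems = []
    closed_families = set()
    closed_blocks = set()  # (family, chromosome) runs that have already ended
    prev = None
    for family, chrom, pos, _ref, _alt in rows:
        if prev is not None:
            prev_family, prev_chrom, prev_pos = prev
            if family != prev_family:
                closed_families.add(prev_family)
                closed_blocks.add((prev_family, prev_chrom))
                if family in closed_families:
                    problems.append(f"family {family} reappears at {chrom}:{pos}")
            elif chrom != prev_chrom:
                closed_blocks.add((prev_family, prev_chrom))
                if (family, chrom) in closed_blocks:
                    problems.append(f"family {family}: chromosome {chrom} reappears at position {pos}")
            elif pos < prev_pos:
                problems.append(f"family {family}: {chrom}:{pos} follows position {prev_pos}")
        prev = (family, chrom, pos)
    return problems


def cross_check(dnm_rows: List[DnmRow], pedigree: List[List[str]],
                bam_samples: Dict[str, str]) -> List[str]:
    """Check ID consistency between the DNM file, pedigree file, and bam list."""
    problems = []
    dnm_families = {row[0] for row in dnm_rows}
    ped_families = {fields[0] for fields in pedigree}
    for family in sorted(dnm_families - ped_families):
        problems.append(f"family {family} is in the DNM file but not in the pedigree")
    for fields in pedigree:
        family, individual = fields[0], fields[1]
        if family in dnm_families and individual not in bam_samples:
            problems.append(f"individual {individual} (family {family}) has no bam list entry")
    return problems
```

In practice you would open each input file (using `newline=""` for the DNM .csv), pass the file objects to the three readers, and print whatever `check_dnm_order` and `cross_check` return; an empty list means no problems were found by these particular checks.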
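The output file can be post-processed with any CSV reader. The Python sketch below is a hypothetical helper, not part of DNMFilter_Indel: it parses the six output columns into records and keeps calls at or above a score cutoff. It assumes the output file has no header line and that a higher prediction score means a call is more likely to be a true de novo indel, so confirm both against your own output. The cutoff is a user choice, not a value recommended by DNMFilter_Indel.

```python
import csv
from typing import Iterable, List, NamedTuple


class ScoredIndel(NamedTuple):
    family_id: str
    chromosome: str
    position: int
    ref: str
    alt: str
    score: float  # column 6: prediction score


def read_predictions(lines: Iterable[str]) -> List[ScoredIndel]:
    """Parse the six-column, comma-separated output file (assumes no header line)."""
    calls = []
    for fields in csv.reader(lines):
        if not fields:
            continue
        family, chrom, pos, ref, alt, score = fields[:6]
        calls.append(ScoredIndel(family, chrom, int(pos), ref, alt, float(score)))
    return calls


def keep_above(calls: List[ScoredIndel], cutoff: float) -> List[ScoredIndel]:
    """Keep calls scoring at least `cutoff`.

    Assumes a higher score means a more likely true de novo indel; the cutoff
    is the user's choice, not a DNMFilter_Indel default.
    """
    return [call for call in calls if call.score >= cutoff]
```

Keeping the parsed rows as named tuples lets downstream code refer to `call.score` or `call.position` instead of column indices.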