-
Notifications
You must be signed in to change notification settings - Fork 23
rare disease
The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named slivar-functions.js those are easily modified to allow more/less stringency or different logic.
## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
# ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
# or hg38: ftp://ftp.ensembl.org/pub/release-96/gff3/homo_sapiens/
gff=/Homo_sapiens.GRCh38.96.gff3.gz
cohort=mycohort
## end variables
bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta $vcf -O u \
| slivar expr --vcf - \
--ped $ped \
-o vcfs/$cohort.vcf \
--pass-only \
-g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
--info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' \
--js ../slivar/js/slivar-functions.js \
--family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' \
--family-expr 'recessive:fam.every(segregating_recessive)' \
--family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' \
--family-expr 'x_recessive:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_recessive_x)' \
--trio 'comphet_side:comphet_side(kid, mom, dad) && INFO.gnomad_nhomalt < 10' \
# compound-hets then groups by gene (it requires and automatically uses annotation from VEP/SnpEff/Bcftools to infer gene):
slivar compound-hets -v vcfs/$cohort.vcf \
--sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf
Note that for multi-sample VCFS, -s
arguments to compound-hets
make sure that only the trio(s) that passed the filters in the upstream slivar
command are used to compose the 2 ends of a compound heterozgyote.
For speed, this can be automatically parallelized using pslivar.
For hg38, it's wise to add -g topmed.hg38.dbsnp.151.zip
to the first command and then add INFO.topmed_af < 0.05
to the --info
expression ( -g can be added multiple times). This removes common variants that are found in hg38, but not present in gnomAD (which is lifted from GRCh37). Get the zip file for topmed along with the other zip files here
Note that this uses impactful which will remove, for example synonymous, intronic, intergenic. If the VCF has not been previously annotated with VEP, bcftools, or snpEff, this will not be available.
The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail
slivar-functions.js also contains a function, segregating_dominant
for cases with a dominant mode of inheritance. Users can add this as needed, but remember that dominant will yield many more candidate variants.