Clone this repo and build the image:
docker build -t opengenomics/variant-effect-predictor .
After building the image and downloading the offline cache, you can test the image like so:
docker run -v /vep/data/path/homo_sapiens:/mnt/homo_sapiens opengenomics/variant-effect-predictor variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37 --offline \
--no_progress --no_stats --vcf --minimal --dir /mnt/ --fasta /mnt/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz \
--input_file example_GRCh37.vcf --output_file /opt/ensembl-tools-release-84/scripts/variant_effect_predictor/example_GRCh37.vep.vcf --everything --dir_cache /mnt/
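A quick sanity check after the test run is to confirm that the output VCF declares the CSQ INFO field, which VEP adds to the header when run with --vcf. This is only a minimal sketch: the helper name has_csq_header is made up for illustration, and the path you pass to it is whatever you gave to --output_file.

```shell
# Hypothetical helper (not part of VEP or this repo): succeeds if the given
# VCF is non-empty and declares the CSQ INFO field in its header.
has_csq_header() {
    vcf="$1"
    [ -s "$vcf" ] || return 1          # missing or empty file
    grep -q '^##INFO=<ID=CSQ,' "$vcf"
}
```

For example, `has_csq_header example_GRCh37.vep.vcf && echo annotated`. Note that the --output_file path in the command above lives inside the container; pointing it at a location under the mounted /mnt/homo_sapiens is one way to make the result visible on the host.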
Download and unpack VEP's offline cache for GRCh37, and the GRCh37 reference FASTA:
export VEP_DATA=/home/.vep
cd $VEP_DATA
rsync -zvh rsync://ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz $VEP_DATA
tar xvfz homo_sapiens_vep_86_GRCh37.tar.gz
wget http://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
bgzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
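The gunzip/bgzip round trip is there because the downloaded FASTA is plain gzip, while indexed lookups need BGZF, the blocked gzip variant that bgzip writes. If you are unsure which kind of file you have, the first four bytes are a reasonable hint. The sketch below assumes `head -c` and `od` are available; the helper name is_bgzf is hypothetical.

```shell
# Hypothetical helper: succeeds if the file starts with 1f 8b 08 04, i.e. a
# gzip header with the FEXTRA flag set, which is how BGZF blocks begin.
# This inspects only the first four bytes, so it is a hint, not a proof.
is_bgzf() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b0804" ]
}
```

For example, `is_bgzf Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz || echo "re-compress with bgzip"`.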
Download and index a custom ExAC r0.3.1 VCF that excludes variants overlapping known somatic hotspots:
curl -L ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/subsets/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz > $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz
bcftools filter --targets ^2:25457242-25457243,12:121176677-121176678 --output-type z --output $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.minus_somatic.vep.vcf.gz $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz
mv -f $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.minus_somatic.vep.vcf.gz $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz
tabix -p vcf $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz
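The leading ^ in the --targets value is what turns the region list into an exclusion list. If you need to exclude more hotspots, building that string from separate arguments can be less error-prone than editing it by hand. A minimal sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: join the given regions with commas and prefix "^",
# which bcftools reads as "everything except these targets".
exclude_targets() {
    list=""
    for region in "$@"; do
        list="${list:+$list,}$region"
    done
    printf '^%s\n' "$list"
}
```

It would be used as `bcftools filter --targets "$(exclude_targets 2:25457242-25457243 12:121176677-121176678)" ...`, with the remaining options as in the command above. Quoting the value also keeps shells that treat ^ specially from interpreting it.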
Download and index the files required for the dbNSFP plugin:
wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFPv2.9.1.zip
unzip dbNSFPv2.9.1.zip
head -n1 dbNSFP2.9.1_variant.chr1 > h
cat dbNSFP2.9.1_variant.chr* | grep -v ^#chr | sort -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP.gz
tabix -s 1 -b 2 -e 2 dbNSFP.gz
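The pipeline above sets the header aside, sorts the data rows by chromosome and position, and then puts the header back on top, because tabix needs rows grouped by sequence name and ordered by position. The same idea for a single tab-delimited stream whose first line is the header can be sketched as follows; sort_keep_header is a hypothetical name, and the tab separator is an assumption about the input.

```shell
# Hypothetical helper: read a tab-delimited table on stdin, print the first
# line (the header) unchanged, then the remaining rows sorted by column 1
# as text and column 2 as a number.
sort_keep_header() {
    IFS= read -r header
    printf '%s\n' "$header"
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

For example, `sort_keep_header < dbNSFP2.9.1_variant.chr1 | bgzip -c > chr1.gz`. Unlike the command above, this does not strip repeated headers from concatenated per-chromosome files.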
Convert the offline cache for use with tabix, which significantly speeds up the lookup of known variants:
docker run -v $VEP_DATA:/mnt opengenomics/variant-effect-predictor /root/vep/convert_cache.pl --species homo_sapiens --version 86_GRCh37 --dir /mnt
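To check whether a cache has already been converted, one option is to look at its info.txt. The sketch below rests on an assumption that should be verified against your VEP version: that convert_cache.pl records the conversion as a "var_type tabix" line in that file. The helper name cache_is_tabix is hypothetical.

```shell
# Hypothetical helper: succeeds if <cache_dir>/info.txt marks the variation
# data as tabix-indexed. ASSUMPTION: the converter writes a line of the
# form "var_type<whitespace>tabix" to info.txt.
cache_is_tabix() {
    info="$1/info.txt"
    [ -f "$info" ] || return 1
    grep -Eq '^var_type[[:space:]]+tabix' "$info"
}
```

For example, `cache_is_tabix "$VEP_DATA/homo_sapiens/86_GRCh37" || echo "cache not converted yet"`.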