Web interface for gwas2vcf
Web interface for processing GWAS summary data to VCF format
# get source
git clone --recurse-submodules git@github.com:MRCIEU/gwas2vcfweb.git
cd gwas2vcfweb
# configure host
## store cromwell outputs here
mkdir -p /data/gwas2vcfweb
## upload & download from here
mkdir -p /data/gwas2vcfweb/data
## DB files
### Reference FASTA
mkdir -p /data/reference_genomes
cd /data/reference_genomes
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.fai.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.dict.gz
gzip -d human_g1k_v37.fasta.gz
gzip -d human_g1k_v37.fasta.fai.gz
gzip -d human_g1k_v37.dict.gz
### 1kg
mkdir -p /data/1kg
cd /data/1kg
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz.tbi
gzip -d ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz
## the .tbi index does not carry a .gz suffix, so gzip -d skips it; no decompression step is needed for it
### dbsnp
mkdir -p /data/dbsnp
cd /data/dbsnp
wget ftp://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz
wget ftp://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.25.gz.tbi
mv GCF_000001405.25.gz dbsnp.v153.b37.vcf.gz
mv GCF_000001405.25.gz.tbi dbsnp.v153.b37.vcf.gz.tbi
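Before building, it can help to confirm that the downloads above landed where expected. The following is a minimal sketch and not part of gwas2vcfweb: `missing_files` is a hypothetical helper, and the example paths in the comment simply mirror the commands above.

```shell
# Hypothetical helper (not part of gwas2vcfweb): print every path given as
# an argument that is missing or empty, and return non-zero if any was printed.
missing_files() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Example call, using paths created by the commands above:
# missing_files \
#     /data/reference_genomes/human_g1k_v37.fasta \
#     /data/reference_genomes/human_g1k_v37.fasta.fai \
#     /data/reference_genomes/human_g1k_v37.dict \
#     /data/dbsnp/dbsnp.v153.b37.vcf.gz \
#     /data/dbsnp/dbsnp.v153.b37.vcf.gz.tbi
```

An empty output with a zero exit status means every listed file exists and is non-empty; the 1kg files can be appended to the list in the same way.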
# build stack (run from the gwas2vcfweb repository root)
docker-compose build --no-cache
## this hash MUST be the same as in the wdl file
cd gwas2vcf
hash=$(git rev-parse HEAD)
docker build --no-cache -t gwas2vcf:"$hash" .
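Because the image tag has to match the hash in the WDL file, a quick text search can catch a mismatch before the stack is started. This is a sketch under assumptions: `wdl_mentions_hash` is a hypothetical helper, and the WDL path in the usage comment is a placeholder, since the file's location is not stated here.

```shell
# Hypothetical check (not part of gwas2vcfweb): succeed only if the given
# file contains the given commit hash as a literal string.
wdl_mentions_hash() {
    wdl_file=$1
    commit=$2
    grep -qF -- "$commit" "$wdl_file"
}

# Example call, reusing $hash from the build step above
# (replace the placeholder with the real WDL file):
# wdl_mentions_hash path/to/workflow.wdl "$hash" \
#     || echo "WDL file does not reference $hash" >&2
```

The check only confirms that the hash string appears somewhere in the file; it does not parse the WDL.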
# start
cd ..
docker-compose -p gwas2vcfweb -f ./docker-compose.yml up -d
# Run unit tests
docker exec -it gwas2vcfweb_web_1 pytest -v
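The container name `gwas2vcfweb_web_1` follows the underscore naming of Compose V1; Compose V2 joins the parts with hyphens by default (`gwas2vcfweb-web-1`). A minimal sketch that accepts either form is shown below. `pick_web_container` is a hypothetical helper, and the pattern assumes the project name `gwas2vcfweb` and a single `web` container, as in the commands above.

```shell
# Hypothetical helper (not part of gwas2vcfweb): read container names, one
# per line, on stdin and print the first one that matches the web container
# under either the underscore or the hyphen naming style.
pick_web_container() {
    grep -E '^gwas2vcfweb[-_]web[-_]1$' | head -n 1
}

# Example call:
# name=$(docker ps --format '{{.Names}}' | pick_web_container)
# docker exec -it "$name" pytest -v
```

If the helper prints nothing, the web container is not running under either name.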
- Point browser to http://<hostname>. The port can be configured in the compose file.
- Upload file & obtain job identifier
- Download results for job identifier
Lyon, M.S., Andrews, S.J., Elsworth, B. et al. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol 22, 32 (2021). https://doi.org/10.1186/s13059-020-02248-0