A Snakemake- and R-based pipeline for performing genomic prediction on pool-seq data using rrBLUP, as a follow-up to a genome-wide association study (GWAS).
The pipeline was designed and tested on data on ash dieback susceptibility in Fraxinus excelsior.
The pipeline takes the pools_rc file produced by Popoolation2 as input for the pooled training population, and a vcf file of SNPs as input for the test individuals.
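For readers unfamiliar with the model behind rrBLUP, the sketch below illustrates ridge regression of phenotypes on marker genotypes in plain Python. It is a simplified illustration, not the pipeline's code: rrBLUP estimates the shrinkage penalty from the data by REML, whereas here the penalty `lam` is fixed by hand, and the function name, toy matrix `Z`, and phenotypes `y` are all hypothetical.

```python
def ridge_marker_effects(Z, y, lam):
    """Solve (Z'Z + lam*I) u = Z'(y - mean(y)) for marker effects u.

    Z   : list of rows (training units) x columns (markers), numeric codes
    y   : list of phenotypes, one per row of Z
    lam : fixed ridge penalty, must be > 0 (assumption: rrBLUP would
          estimate this from variance components instead)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, m = len(Z), len(Z[0])
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]  # centre phenotypes on their mean

    # Normal equations: A u = b, with the penalty added to the diagonal.
    A = [[sum(Z[k][i] * Z[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(Z[k][i] * yc[k] for k in range(n)) for i in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    u = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * u[j] for j in range(i + 1, m))
        u[i] = (b[i] - s) / A[i][i]
    return mean_y, u


# Toy data: four training units, two markers, true effects (2, -1).
Z = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, -1.0, -2.0, 1.0]
intercept, effects = ridge_marker_effects(Z, y, lam=2.0)
print(intercept, effects)  # effects are shrunk towards zero by the penalty
```

With `lam=2.0` the estimated effects are half the true values in this toy example, which shows the shrinkage that gives ridge regression its name; a smaller penalty shrinks less.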
This is a beta version of the pipeline and is currently undergoing revisions.
Paths to the input files are listed in config.yaml. They are currently set to the example data; update them to run the pipeline on your own data.
#pool_rc: Output file from Popoolation2.
#pool info file: File containing information on the pool conditions. The numbers included must correspond to the output of Popoolation2.
#list of snps to use: A list of SNPs on which to run genomic prediction, i.e. a list of GWAS hits.
#individual info file: File containing information on the test individuals.
#individual genotype matrix: A matrix containing the genotypes of the test individuals.
Effect size tables are written to data/output/effect_sizes.txt, and genomic estimated breeding value (GEBV) tables are written to data/output/gebv.txt.
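The relationship between the two outputs can be sketched as follows: under an additive model, a GEBV is the sum over SNPs of an individual's genotype code multiplied by that SNP's estimated effect size. The Python example below is a minimal illustration under assumptions; the function name, the SNP and individual identifiers, the -1/0/1 genotype coding, and the rule of skipping missing calls are all hypothetical and are not taken from the pipeline's scripts or file formats.

```python
def predict_gebv(genotypes, effects, intercept=0.0):
    """Return {individual: GEBV} under a purely additive model.

    genotypes : {individual: {snp_id: code}}, code in -1/0/1 or None if missing
    effects   : {snp_id: estimated effect size}
    intercept : optional overall mean added to every prediction
    """
    gebv = {}
    for individual, calls in genotypes.items():
        total = intercept
        for snp, effect in effects.items():
            code = calls.get(snp)
            if code is None:
                # Missing call: contributes nothing (a simplifying assumption).
                continue
            total += code * effect
        gebv[individual] = total
    return gebv


# Hypothetical effect sizes for two SNPs and genotypes for three individuals.
effects = {"snpA": 0.5, "snpB": -0.25}
genotypes = {
    "tree1": {"snpA": 1, "snpB": -1},
    "tree2": {"snpA": 0, "snpB": 1},
    "tree3": {"snpA": -1, "snpB": None},
}
gebv = predict_gebv(genotypes, effects)
print(gebv)
```

Keying both tables by SNP identifier, rather than relying on column order, avoids silently misaligning effect sizes and genotypes when the SNP list is filtered.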
Snakemake: https://snakemake.readthedocs.io/en/stable/
The pipeline installs the following R packages:
argparse
rrBLUP
vcfR
data.table
Set up a conda environment with Snakemake (see above), then activate it and run the pipeline:
conda activate snakemake
snakemake --use-conda
Scripts for simulation are currently being updated.
Apache License, Version 2.0