This collection of scripts was created to adapt the GATK pipeline to SNP calling for reduced-representation sequencing approaches such as GBS or RAD-seq.
The scripts are numbered in the order of application and perform the following functions:

- uses the barcodes table to demultiplex reads into sample-specific fastq.gz bins
- standard read filtering with trimmomatic; FastQC visualisation is highly recommended before applying the filter
- creates indexes for alignment
- alignment over multiple samples with bwa + samtools using a "for" loop
- adds read groups to the bam files and sorts the corresponding bams. This step is OBLIGATORY for successful calling
- performs SNP calling for individual bam files
- combines individual vcfs into one joint vcf. NOTE! The output vcf might be large!
- performs genotyping on the combined vcf
- basic variant filtration based on the INFO field in the vcf file. A useful manual for variant filtration is available here
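The alignment and read-group steps listed above can be illustrated with a minimal dry-run sketch. It only prints the commands for each sample and does not run them, and it is not the repository's actual script. The directory names (`filtered/`, `aligned/`), the `picard.jar` location, the read-group values (`lib1`, `unit1`, `ILLUMINA`) and the helper function names are all assumptions made for illustration. Using Picard's `AddOrReplaceReadGroups` for the read-group and sorting step is likewise an assumption, based on picard appearing among the software used.

```shell
# Dry-run sketch: print the per-sample alignment and read-group commands.
# All paths, helper names and read-group values below are hypothetical.

# Derive a sample name from a FASTQ path, e.g. filtered/S1.fastq.gz -> S1
sample_name() {
    basename "$1" .fastq.gz
}

# Print the two commands for one sample: alignment, then read groups + sorting.
print_sample_commands() {
    ref="$1"
    fq="$2"
    sample="$(sample_name "$fq")"
    # Align reads with bwa mem and convert the SAM stream to BAM with samtools
    echo "bwa mem ${ref} ${fq} | samtools view -b -o aligned/${sample}.bam -"
    # Add read groups and coordinate-sort the BAM (assumed to be done with Picard)
    echo "java -jar picard.jar AddOrReplaceReadGroups I=aligned/${sample}.bam O=aligned/${sample}.rg.sorted.bam RGID=${sample} RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=${sample} SORT_ORDER=coordinate"
}

# The "for" loop over samples: first argument is the reference, the rest are FASTQ files.
print_all_commands() {
    ref="$1"
    shift
    for fq in "$@"; do
        print_sample_commands "$ref" "$fq"
    done
}

# Example call with two hypothetical samples
print_all_commands reference.fa filtered/sampleA.fastq.gz filtered/sampleB.fastq.gz
```

Printing the commands first makes it easy to check sample names and read-group tags before committing to a long run; once the output looks right, it can be piped to `sh` or the `echo` calls can be replaced by the commands themselves.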
Software used:

- axe-demux v.0.3.3-2-ge23af27
- trimmomatic v.0.38
- samtools v.1.9
- bwa v.0.7.17
- picard v.2.18.22
- GATK v.
Disclaimer: the presented scripts are NOT a ready-to-go solution; however, they may serve as an example of how to adapt the GATK pipeline for GBS and RAD-seq data. For more details and help, don't hesitate to get in touch.
This customized pipeline was used in the following publication:
Gubaev, R.; Gorlova, L.; Boldyrev, S.; Goryunova, S.; Goryunov, D.; Mazin, P.; Chernova, A.; Martynova, E.; Demurin, Y.; Khaitovich, P. Genetic Characterization of Russian Rapeseed Collection and Association Mapping of Novel Loci Affecting Glucosinolate Content. Genes 2020, 11, 926.