Quality Filtration, Mapping, and Variant Calling Pipeline

This repository contains shell scripts that quality-filter raw Illumina reads with fastp, map them to a reference genome with BWA, sort and deduplicate the alignments with GATK, and call variants with GATK HaplotypeCaller. The pipeline consists of the following scripts:

Scripts

fastp.sh

Quality filtration of Illumina raw reads using the fastp tool.

Usage:

bash fastp.sh <input_dir> <output_dir>
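
The internals of fastp.sh are not reproduced here, so the following is only a sketch of what a per-sample fastp loop over `<input_dir>` might look like. It assumes paired-end files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (a hypothetical naming convention), and the function name `fastp_commands` is illustrative. It prints the commands instead of running them.

```shell
# Sketch only: print one fastp command per paired-end sample.
# Assumed (not taken from the repository): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming.
fastp_commands() {
    input_dir=$1
    output_dir=$2
    for r1 in "$input_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # skip when the glob matches nothing
        sample=$(basename "$r1" _R1.fastq.gz)
        r2="$input_dir/${sample}_R2.fastq.gz"
        # -i/-I are the paired inputs, -o/-O the filtered outputs,
        # --html/--json the per-sample QC reports.
        printf 'fastp -i %s -I %s -o %s -O %s --html %s --json %s\n' \
            "$r1" "$r2" \
            "$output_dir/${sample}_R1.fastq.gz" "$output_dir/${sample}_R2.fastq.gz" \
            "$output_dir/${sample}.html" "$output_dir/${sample}.json"
    done
}
```

For example, `fastp_commands data/raw results/filtered` lists the commands for review before anything is run.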

bwa.sh

Mapping of quality-filtered reads to a reference genome using BWA.

Usage:

bash bwa.sh <reference_dir> <input_dir> <output_dir>
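
As a rough illustration of the mapping step, the sketch below prints a `bwa mem` command for one sample. It assumes the reference FASTA has already been indexed with `bwa index`, that reads follow a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming, and that a read group is attached at this stage (GATK tools require read groups, but whether bwa.sh sets them this way is not stated here). The function name `bwa_command` and its argument order are illustrative; note that the script itself takes a reference directory rather than a FASTA path.

```shell
# Sketch only: print a bwa mem command for one paired-end sample.
# Assumed (not taken from the repository): file naming and read-group fields.
bwa_command() {
    reference=$1      # FASTA already indexed with `bwa index`
    sample=$2
    input_dir=$3
    output_dir=$4
    # bwa mem expects the literal two-character sequence \t between read-group fields.
    read_group="@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA"
    printf "bwa mem -R '%s' %s %s %s > %s\n" \
        "$read_group" "$reference" \
        "$input_dir/${sample}_R1.fastq.gz" "$input_dir/${sample}_R2.fastq.gz" \
        "$output_dir/${sample}.sam"
}
```

Calling `bwa_command reference/genome.fa s1 results/filtered results/mapped` prints the command for sample `s1`.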

sortSAM.sh

Sorting SAM files with GATK's SortSam and writing sorted, indexed BAM files.

Usage:

bash sortSAM.sh <input_dir> <output_dir>
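
A minimal sketch of the sorting step, assuming the input directory holds one `.sam` file per sample and that coordinate sorting is wanted (the usual choice before duplicate marking and variant calling). The function name `sortsam_commands` and the `.sorted.bam` suffix are illustrative, not taken from the script.

```shell
# Sketch only: print one GATK SortSam command per SAM file.
# Assumed (not taken from the repository): coordinate sort order and output naming.
sortsam_commands() {
    input_dir=$1
    output_dir=$2
    for sam in "$input_dir"/*.sam; do
        [ -e "$sam" ] || continue                # skip when the glob matches nothing
        sample=$(basename "$sam" .sam)
        # --CREATE_INDEX true writes a BAM index next to the sorted BAM.
        printf 'gatk SortSam -I %s -O %s --SORT_ORDER coordinate --CREATE_INDEX true\n' \
            "$sam" "$output_dir/${sample}.sorted.bam"
    done
}
```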

dedupBAM.sh

Deduplication of sorted BAM files using GATK's MarkDuplicates.

Usage:

bash dedupBAM.sh <input_dir> <output_dir>
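
The sketch below shows what a MarkDuplicates loop might look like. By default MarkDuplicates flags duplicate reads rather than removing them; whether dedupBAM.sh also passes `--REMOVE_DUPLICATES true` is not stated here. The function name `markdup_commands` and the output file names are illustrative.

```shell
# Sketch only: print one GATK MarkDuplicates command per sorted BAM file.
# Assumed (not taken from the repository): output and metrics file naming.
markdup_commands() {
    input_dir=$1
    output_dir=$2
    for bam in "$input_dir"/*.bam; do
        [ -e "$bam" ] || continue                # skip when the glob matches nothing
        sample=$(basename "$bam" .bam)
        # -M names the duplication metrics file, which MarkDuplicates requires.
        printf 'gatk MarkDuplicates -I %s -O %s -M %s --CREATE_INDEX true\n' \
            "$bam" "$output_dir/${sample}.dedup.bam" "$output_dir/${sample}.metrics.txt"
    done
}
```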

haplotypecaller.sh

Variant calling using GATK HaplotypeCaller and generating gVCF files.

Usage:

bash haplotypecaller.sh <reference_genome> <input_dir> <output_dir> <tmp_dir> <num_runs>
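
For orientation, here is a sketch of a per-sample HaplotypeCaller call in gVCF mode. HaplotypeCaller needs the reference FASTA to have a `.fai` index and a sequence dictionary alongside it. The meaning of `<num_runs>` is not documented in this README, so the sketch leaves it out. The function name `haplotypecaller_commands` and the `.g.vcf.gz` output naming are illustrative.

```shell
# Sketch only: print one GATK HaplotypeCaller command per BAM file.
# Assumed (not taken from the repository): output naming; <num_runs> is not modeled.
haplotypecaller_commands() {
    reference=$1
    input_dir=$2
    output_dir=$3
    tmp_dir=$4
    for bam in "$input_dir"/*.bam; do
        [ -e "$bam" ] || continue                # skip when the glob matches nothing
        sample=$(basename "$bam" .bam)
        # -ERC GVCF switches HaplotypeCaller to gVCF output.
        printf 'gatk HaplotypeCaller -R %s -I %s -O %s -ERC GVCF --tmp-dir %s\n' \
            "$reference" "$bam" "$output_dir/${sample}.g.vcf.gz" "$tmp_dir"
    done
}
```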

Usage

Clone this repository to your local machine.
Install the dependencies: fastp, BWA, and GATK4.
Create a directory structure for your data, reference, and output.
Run the scripts in the order listed above, passing the arguments shown in each script's usage line.
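
The steps above can be sketched as a small driver that chains the five scripts, with each stage reading the previous stage's output directory. Every path here (`data/raw`, `results/...`, `reference/genome.fa`, `tmp`) and the final `1` passed as `<num_runs>` is a placeholder chosen for illustration, not a repository default. By default the sketch only prints the commands.

```shell
# Sketch only: chain the five scripts so each stage feeds the next.
# All paths and the <num_runs> value (1) are placeholders.
step() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"      # dry run: show the command only
    else
        "$@"                    # real run: execute it
    fi
}

run_pipeline() {
    # && stops the chain at the first failing stage.
    step mkdir -p results/filtered results/mapped results/sorted results/dedup results/gvcf tmp &&
    step bash fastp.sh data/raw results/filtered &&
    step bash bwa.sh reference results/filtered results/mapped &&
    step bash sortSAM.sh results/mapped results/sorted &&
    step bash dedupBAM.sh results/sorted results/dedup &&
    step bash haplotypecaller.sh reference/genome.fa results/dedup results/gvcf tmp 1
}
```

Calling `run_pipeline` prints the six commands for review; `DRY_RUN=0 run_pipeline`, run from the repository root, executes them.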

Output

Taken together, the scripts produce quality-filtered reads, sorted and indexed BAM files, deduplicated BAM files, and gVCF files from variant calling.
