Home
nf-core/mycosnp is a bioinformatics best-practice analysis pipeline. MycoSNP is a portable workflow for performing whole genome sequencing analysis of fungal organisms, including Candida auris. The method prepares the reference, performs quality control, and calls variants against that reference. MycoSNP generates several output files that are compatible with downstream analytic tools, such as those used for phylogenetic tree-building and gene variant annotation.
Prepares a reference FASTA file for BWA alignment and GATK variant calling by masking repeats in the reference and generating the BWA index.
- Genome repeat identification and masking (`nucmer`)
- BWA index generation (`bwa`)
- FAI and DICT file creation (`Picard`, `Samtools`)
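
To make the repeat-masking step concrete, here is a minimal Python sketch of what masking does to a reference sequence once repeat coordinates are known. It is an illustration only, not the pipeline's implementation: the function name `mask_intervals`, the 0-based half-open coordinate convention, and the use of `N` as the mask character are all assumptions made for this example.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace bases inside each interval with mask_char.

    Assumption for this sketch: intervals are 0-based, half-open
    (start, end) pairs, as in BED files. This helper is hypothetical
    and is not part of MycoSNP.
    """
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range coordinates are ignored.
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)


# Two hypothetical repeat regions on a toy 12-base contig.
print(mask_intervals("ACGTACGTACGT", [(2, 5), (9, 12)]))  # ACNNNCGTANNN
```

Masked positions stay in the sequence, so coordinates in the masked reference still line up with the original. The aligner and variant caller simply see no informative bases in those regions.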
Prepares samples (paired-end FASTQ files) for GATK variant calling by aligning the samples to a BWA reference index and ensuring that the BAM files are correctly formatted. This step also provides different quality reports for sample evaluation.
- Combine FASTQ files across lanes when a sample was sequenced on multiple lanes.
- Filter unpaired reads from FASTQ files (`SeqKit`).
- Downsample FASTQ files to a desired coverage or sampling rate (`SeqTK`).
- Trim reads and assess quality (`FaQCs`).
- Generate a QC report by extracting data from the FaQCs report.
- Align FASTQ reads to a reference (`BWA`).
- Sort BAM files (`SAMTools`).
- Mark and remove duplicates in the BAM file (`Picard`).
- Clean the BAM file (`Picard "CleanSam"`).
- Fix mate information in the BAM file (`Picard "FixMateInformation"`).
- Add read groups to the BAM file (`Picard "AddOrReplaceReadGroups"`).
- Index the BAM file (`SAMTools`).
- FastQC - Filtered reads QC.
- Qualimap mapping quality report.
- MultiQC - Aggregate report describing results and QC from the whole pipeline.
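
The downsampling step accepts either a target coverage or a sampling rate. The sketch below shows one plausible way to relate the two, assuming mean coverage is estimated as total sequenced bases divided by genome size. The function name `sampling_rate` and the example numbers are hypothetical, and the pipeline may compute this differently.

```python
def sampling_rate(total_bases, genome_size, target_coverage):
    """Return the fraction of reads to keep to reach roughly target_coverage.

    Assumption for this sketch: mean coverage = total_bases / genome_size.
    This helper is illustrative and is not taken from MycoSNP.
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_coverage = total_bases / genome_size
    # Cap at 1.0: a sample already below the target is kept whole,
    # since downsampling cannot add reads.
    return min(1.0, target_coverage / current_coverage)


# Hypothetical sample: 1.24 Gb of reads on a 12.4 Mb genome (about 100x),
# downsampled toward 70x.
rate = sampling_rate(1_240_000_000, 12_400_000, 70)
print(round(rate, 3))  # 0.7
```

The resulting fraction is the kind of value a read sampler takes as its rate argument. Using the same random seed for both files of a pair keeps the mates in sync.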
Calls variants and generates a multi-FASTA file and phylogeny.
- Call variants (`GATK HaplotypeCaller`).
- Combine gVCF files from the HaplotypeCaller into a single VCF (`GATK CombineGVCFs`).
- Call genotypes (`GATK GenotypeGVCFs`).
- Filter the variants (`GATK VariantFiltration`) [default (but customizable) filter: 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || DP < 10'].
- Run a customized VCF filtering script (`Broad Institute`).
- Split the filtered VCF file by sample.
- Select only SNPs from the VCF files (`GATK SelectVariants`).
- Split the VCF file with SNPs by sample.
- Create a consensus sequence for each sample (`BCFTools`, `SeqTK`).
- Create a multi-FASTA file from the VCF SNP positions using a custom script (`Broad`).
- Create a phylogeny from the multi-FASTA file (`rapidNJ`, `FastTree2`, `RAxML`, `IQTree`).
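
To illustrate how the default filter expression reads, the following Python sketch applies the same four thresholds to a dictionary of site-level annotations. A record that matches the expression is the one that gets flagged as filtered. The function name `fails_default_filter` and the example records are hypothetical. The sketch also assumes all four annotations are present, whereas real VCF records can lack some of them and GATK has its own rules for that case.

```python
def fails_default_filter(info):
    """Return True when a record matches the default filter expression
    'QD < 2.0 || FS > 60.0 || MQ < 40.0 || DP < 10'.

    Assumption for this sketch: info holds numeric QD, FS, MQ and DP
    values. This helper is illustrative and is not part of GATK or MycoSNP.
    """
    return (
        info["QD"] < 2.0      # low quality relative to depth
        or info["FS"] > 60.0  # strong strand bias
        or info["MQ"] < 40.0  # poor mapping quality
        or info["DP"] < 10    # too few supporting reads
    )


# Hypothetical annotation values for two sites.
well_supported = {"QD": 25.0, "FS": 1.2, "MQ": 60.0, "DP": 48}
low_depth = {"QD": 25.0, "FS": 1.2, "MQ": 60.0, "DP": 6}

print(fails_default_filter(well_supported))  # False
print(fails_default_filter(low_depth))       # True
```

Because the clauses are joined with `||`, failing any single threshold is enough to flag the site. Loosening one threshold therefore only rescues sites that pass the other three.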