nf-core/bactmap: Output

This document describes the output produced by the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

Fetch from ENA (Optional): fetch reads from the ENA
Trim Reads: read trimming using Trimmomatic
Estimate genome size
Downsample reads
Map reads
Call variants
Filter variants
Pseudogenome creation
Pseudogenome alignment creation
Recombination removal (Optional)
Invariant site removal
Phylogenetic tree creation (Optional)

Fetch from ENA

This process fetches reads from the ENA archive using the enaDataGet tool from ENA Browser Tools.

Trim Reads

Reads are trimmed with Trimmomatic using the parameters ILLUMINACLIP:adapter_file.fas:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15, with MIN_LEN (passed to Trimmomatic's MINLEN step) determined dynamically as 30% of the read length.

Output directory: <OUTPUT DIR>/trimmed_fastqs
Fastq files post trimming will be written here.
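The dynamic minimum length can be illustrated with a short Python sketch. The function name `trimmomatic_steps` and the choice to round down are assumptions made for illustration. The pipeline's own code is not shown in this document and may differ.

```python
from typing import List


def trimmomatic_steps(read_length: int, adapter_file: str = "adapter_file.fas") -> List[str]:
    """Build the Trimmomatic step list described above.

    Hypothetical helper: MINLEN is set to 30% of the read length,
    rounded down (the rounding rule is an assumption).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    # Integer arithmetic avoids floating point surprises such as 100 * 0.3.
    min_len = read_length * 30 // 100
    return [
        f"ILLUMINACLIP:{adapter_file}:2:30:10",
        "LEADING:3",
        "TRAILING:3",
        "SLIDINGWINDOW:4:15",
        f"MINLEN:{min_len}",
    ]


if __name__ == "__main__":
    # For 150 bp reads the final step becomes MINLEN:45.
    print(" ".join(trimmomatic_steps(150)))
```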

Genome Size Estimation

The size of the genome is estimated using Mash.

Downsample reads

If the --depth_cutoff parameter is specified, reads will be downsampled to the specified depth using seqtk.
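As a rough sketch of how a depth cutoff can be turned into a sampling fraction: the estimated depth is the total number of sequenced bases divided by the estimated genome size, and the fraction of reads to keep is the cutoff divided by that depth. The function `downsample_fraction` is hypothetical, and the formula is an assumption about how such a step is commonly done, not a description of the pipeline's exact calculation.

```python
from typing import Optional


def downsample_fraction(total_bases: int, genome_size: int, depth_cutoff: float) -> Optional[float]:
    """Return the fraction of reads to keep, or None if no downsampling is needed.

    Hypothetical helper: depth is estimated as total_bases / genome_size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    estimated_depth = total_bases / genome_size
    if estimated_depth <= depth_cutoff:
        # Coverage is already at or below the cutoff, so keep every read.
        return None
    return depth_cutoff / estimated_depth


if __name__ == "__main__":
    # 500 Mb of reads over a 5 Mb genome is 100x; a cutoff of 50 keeps half.
    print(downsample_fraction(500_000_000, 5_000_000, 50))
```

A fraction computed this way could be given to `seqtk sample`, which accepts either a fraction or an absolute number of reads.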

Map reads

The reads will be mapped to the specified reference genome using bwa mem.

Output directory: <OUTPUT DIR>/sorted_bams
Sorted bam files will be written here.

Call variants

Variants will be called using samtools.

Filter variants

Variants will be filtered using bcftools in order to flag low quality SNPs, using the default filter of %QUAL<25 || FORMAT/DP<10 || MAX(FORMAT/ADF)<5 || MAX(FORMAT/ADR)<5 || MAX(FORMAT/AD)/SUM(FORMAT/DP)<0.9 || MQ<30 || MQ0F>0.1.

Output directory: <OUTPUT DIR>/filtered_bcfs
Filtered vcf files will be written here.
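To make the default filter easier to read, the sketch below restates its thresholds as a plain Python predicate for a single-sample record. It is not how bcftools evaluates expressions. The function name `is_low_quality` and its argument layout are assumptions, and for a single sample SUM(FORMAT/DP) is taken to be that sample's depth.

```python
from typing import Sequence


def is_low_quality(
    qual: float,
    dp: int,
    adf: Sequence[int],
    adr: Sequence[int],
    ad: Sequence[int],
    mq: float,
    mq0f: float,
) -> bool:
    """Return True if a single-sample record would be flagged by the default filter.

    Hypothetical helper: adf, adr and ad hold per-allele depths (forward
    strand, reverse strand, and total); dp is the sample's read depth.
    """
    return (
        qual < 25                # %QUAL<25
        or dp < 10               # FORMAT/DP<10 (also guards the division below)
        or max(adf) < 5          # MAX(FORMAT/ADF)<5
        or max(adr) < 5          # MAX(FORMAT/ADR)<5
        or max(ad) / dp < 0.9    # MAX(FORMAT/AD)/SUM(FORMAT/DP)<0.9
        or mq < 30               # MQ<30
        or mq0f > 0.1            # MQ0F>0.1
    )


if __name__ == "__main__":
    # A well supported SNP: 39 of 40 reads carry the alternative allele.
    print(is_low_quality(60, 40, [0, 20], [0, 19], [0, 39], 60, 0.0))
```

Any single failing condition is enough to flag the position, since the conditions are joined with a logical OR.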

Pseudogenome creation

A pseudogenome based on the variants called is created, where missing positions are encoded as - characters and low quality positions as N. All other positions either match the reference or are encoded as a SNV of either G, A, T or C. The script filtered_bcf_to_fasta.py is used.

Output directory: <OUTPUT DIR>/pseudogenomes
A pseudogenome for each sample will be written here.
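The encoding rules can be sketched in a few lines of Python. This is a simplified illustration and not the filtered_bcf_to_fasta.py script itself: the function `build_pseudogenome` and its `calls` mapping are hypothetical, and real input would come from a filtered BCF file rather than a dictionary.

```python
from typing import Dict, Optional, Tuple


def build_pseudogenome(reference: str, calls: Dict[int, Tuple[Optional[str], bool]]) -> str:
    """Encode one sample's pseudogenome against a reference sequence.

    Hypothetical input: calls maps a 0-based position to (alt_base, passed),
    where alt_base is None when the sample matches the reference and passed
    is False when the position failed the quality filter. Positions absent
    from calls are treated as missing.
    """
    bases = []
    for pos, ref_base in enumerate(reference):
        if pos not in calls:
            bases.append("-")          # missing position
            continue
        alt_base, passed = calls[pos]
        if not passed:
            bases.append("N")          # low quality position
        elif alt_base is None:
            bases.append(ref_base)     # matches the reference
        else:
            bases.append(alt_base)     # SNV
    return "".join(bases)


if __name__ == "__main__":
    calls = {0: (None, True), 1: ("T", True), 2: ("A", False)}
    print(build_pseudogenome("ACGT", calls))
```

Because every reference position produces exactly one character, each pseudogenome has the same length as the reference.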

Pseudogenome alignment creation

The pseudogenomes from the previous step are concatenated to make a whole genome alignment.

Output directory: <OUTPUT DIR>/pseudogenomes
The multi-sample pseudogenome alignment will be written here.
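Since every pseudogenome covers the same reference positions, combining them amounts to writing the sequences one after another into a single multi-FASTA file. A minimal sketch, assuming the sequences are already in memory and using a hypothetical function `make_alignment`:

```python
from typing import Dict


def make_alignment(pseudogenomes: Dict[str, str]) -> str:
    """Join per-sample pseudogenomes into one multi-FASTA alignment string.

    Hypothetical helper: keys are sample names, values are sequences.
    All sequences must have the same length to form a valid alignment.
    """
    lengths = {len(seq) for seq in pseudogenomes.values()}
    if len(lengths) > 1:
        raise ValueError(f"pseudogenomes differ in length: {sorted(lengths)}")
    return "".join(f">{name}\n{seq}\n" for name, seq in pseudogenomes.items())


if __name__ == "__main__":
    print(make_alignment({"sample1": "ACGT", "sample2": "AC-T"}), end="")
```

The length check is a cheap safeguard: a pseudogenome of a different length would mean it was built against a different reference.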

Recombination removal

Recombination is removed from the alignment using Gubbins.

Invariant sites

Invariant sites are removed using snp-sites.
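The idea behind this step can be shown with a small Python sketch that keeps only the alignment columns where samples differ. It is an illustration of the concept and not the snp-sites implementation. The function `variant_only` is hypothetical, and treating - and N as unknown rather than as alleles is an assumption of this sketch.

```python
from typing import List


def variant_only(alignment: List[str]) -> List[str]:
    """Return the alignment restricted to columns with more than one known base.

    Hypothetical helper: alignment is a list of equal-length sequences;
    '-' and 'N' (either case) are ignored when deciding if a column varies.
    """
    if not alignment:
        return []
    unknown = set("-Nn")
    keep = [
        i
        for i in range(len(alignment[0]))
        if len({seq[i].upper() for seq in alignment} - unknown - {"N"}) > 1
    ]
    # Rebuild each sequence from the retained column indices only.
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    # Only the last column varies among known bases.
    print(variant_only(["ACGT", "ACGA", "AC-T"]))
```

Removing invariant columns shrinks the alignment considerably for closely related bacterial samples, which reduces the work needed for tree building.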

Phylogenetic tree creation

A maximum likelihood tree is generated using IQ-TREE.

Output directory: <OUTPUT DIR>
The consensus tree aligned_pseudogenome.variants_only.contree, including bootstrap values, will be written here.

Software used within the pipeline

Trimmomatic A flexible read trimming tool for Illumina NGS data.
mash Fast genome and metagenome distance estimation using MinHash.
seqtk A fast and lightweight tool for processing sequences in the FASTA or FASTQ format.
bwa mem Burrows-Wheeler Aligner for short-read alignment.
samtools Utilities for the Sequence Alignment/Map (SAM) format
bcftools Utilities for variant calling and manipulating VCFs and BCFs
filtered_bcf_to_fasta.py Python utility to create a pseudogenome from a bcf file where each position in the reference genome is included
gubbins Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences
snp-sites Finds SNP sites from a multi-FASTA alignment file
IQ-TREE Efficient software for phylogenomic inference
