A Nextflow pipeline for processing paired-end Illumina MNASeq sequencing data.
The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
- Raw read QC (FastQC, Fastq Screen)
- Adapter trimming (cutadapt)
- Alignment (BWA)
- Mark duplicates (picard)
- Filtering to remove:
  - reads that are marked as duplicates (SAMtools)
  - reads that aren't marked as primary alignments (SAMtools)
  - reads that are unmapped (SAMtools)
  - reads that map to multiple locations (SAMtools)
  - reads containing > 3 mismatches in either read of the pair (BAMTools)
  - reads that have a user-defined insert size (BAMTools)
  - reads that are soft-clipped (BAMTools)
  - reads that map to different chromosomes (Pysam)
  - reads that aren't in FR orientation (Pysam)
  - reads where only one read of the pair fails the above criteria (Pysam)
- Merge alignments at replicate-level (picard)
  - Re-mark duplicates (picard)
  - Remove duplicate reads (optional; SAMtools)
  - Create normalised bigWig files scaled to 1 million mapped read pairs (BEDTools, wigToBigWig)
- Call nucleosome positions and generate smoothed, normalised coverage wig files that can be used to generate occupancy profile plots between samples across features of interest (DANPOS2)
- Create IGV session file containing bigWig tracks for data visualisation (IGV)
- Collect and present QC at the raw read and alignment-level (MultiQC)
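As a rough illustration of two of the steps above, the pair-level filters and the bigWig scaling can be sketched in plain Python. This is a minimal sketch and not the pipeline's own code: the `Mate` type, the function names, and the simplified definition of FR orientation (forward-strand mate starting at or before the reverse-strand mate) are assumptions made for the example, and the real pipeline performs these steps with BAMTools, Pysam and BEDTools.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    """Hypothetical stand-in for one aligned read of a pair."""
    chrom: str
    pos: int            # 0-based leftmost aligned position
    is_reverse: bool    # True if aligned to the reverse strand
    mismatches: int     # mismatch count, however the caller derives it
    soft_clipped: bool


def pair_passes(first: Mate, second: Mate, max_mismatches: int = 3) -> bool:
    """Return True only if BOTH mates pass; one failing mate drops the pair."""
    for mate in (first, second):
        if mate.mismatches > max_mismatches or mate.soft_clipped:
            return False
    # Mates must map to the same chromosome.
    if first.chrom != second.chrom:
        return False
    # Assumed FR definition: mates on opposite strands, with the
    # forward-strand mate starting at or before the reverse-strand mate.
    if first.is_reverse == second.is_reverse:
        return False
    forward, reverse = (second, first) if first.is_reverse else (first, second)
    return forward.pos <= reverse.pos


def scale_factor(mapped_read_pairs: int, target: int = 1_000_000) -> float:
    """Factor that rescales coverage to `target` mapped read pairs."""
    if mapped_read_pairs <= 0:
        raise ValueError("mapped_read_pairs must be positive")
    return target / mapped_read_pairs
```

For example, a sample with 2,000,000 mapped read pairs would get a scale factor of 0.5, so its coverage is halved before the bigWig file is written.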
The documentation for the pipeline can be found in the docs/ directory:
- Installation
- Pipeline configuration
- Reference genome
- Design file
- Running the pipeline
- Output and interpretation of results
- Troubleshooting
The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
The pipeline was developed by Harshil Patel.
The NGI-RNAseq pipeline developed by Phil Ewels was used as a template for this pipeline. Many thanks to Phil and the team at SciLifeLab.
This project is licensed under the MIT License - see the LICENSE.md file for details.