NOTE: this repository is archived to support workflow reproducibility. Active development continues at: https://github.com/icgc-argo-workflows/dna-seq-processing-wfs
This repository maintains the source code of the ICGC ARGO DNA Seq Processing Pipeline. The pipeline is written in the Nextflow workflow language using DSL2, with modules imported from other ICGC ARGO GitHub repositories. The following repositories maintain the various tools and modules:
- https://github.com/icgc-argo/dna-seq-processing-tools
- https://github.com/icgc-argo/data-processing-utility-tools
- https://github.com/icgc-argo/nextflow-dna-seq-processing-tools
- and https://github.com/icgc-argo/data-qc-tools-and-wfs
Each Nextflow module, together with its associated container image registered on Quay.io, is strictly version-controlled and released independently. To ensure reproducibility, the pipeline explicitly declares which version of each module is imported.
- download input sequencing metadata/data from SONG/SCORE
- preprocess input sequencing reads (in FASTQ or BAM) into lane level (aka read group level) BAM
- collect CollectQualityYieldMetrics using Picard tool for read group
- perform BWA-MEM alignment against GRCh38 reference genome in parallel for each lane BAM
- merge and markduplicate aligned lane BAM, produce coordinate-sorted CRAM/CRAI and duplicates_metrics
- collect alignment QC metrics using samtools stats for aligned seq
- collect CollectOxoGMetrics using GATK for aligned seq and calculate OxoQ score
- generate SONG metadata for aligned seq and upload them to SONG/SCORE
- generate SONG metadata for all collected qc_metrics and upload them to SONG/SCORE
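The OxoQ score in the steps above is a Phred-scaled error rate. The sketch below is an illustration only, not the pipeline's own code: it applies the standard Phred scaling, Q = -10 * log10(error rate). The error-rate formula in `oxo_q` follows the definition given in Picard's CollectOxoGMetrics documentation as we understand it, and the function names and the choice of input counts are assumptions; how the pipeline selects or aggregates rows of the metrics file is not described here.

```python
import math


def phred_scaled_q(error_rate: float) -> float:
    """Convert an error rate in (0, 1] to a Phred-scaled quality score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def oxo_q(alt_oxo_bases: int, alt_nonoxo_bases: int, total_bases: int) -> float:
    """Hypothetical helper: OxoQ from base counts.

    Assumed error-rate definition (taken from Picard's metric description,
    not from this pipeline): max(alt_oxo - alt_nonoxo, 1) / total_bases.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    error_rate = max(alt_oxo_bases - alt_nonoxo_bases, 1) / total_bases
    # Cap at 1.0 so that degenerate inputs still map to a valid score.
    return phred_scaled_q(min(error_rate, 1.0))
```

Under these assumptions, an error rate of 1 in 1,000 bases corresponds to a score of about 30, and each tenfold reduction in the error rate adds 10 to the score.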
To run the pipeline, first install Nextflow (version 20.01.0 or higher) by following the Nextflow installation instructions.
Run version 1.3.0 of the pipeline:
nextflow run icgc-argo/dna-seq-processing-wfs -r 1.3.0 -params-file <your_params_file.json>
You may need to run nextflow pull icgc-argo/dna-seq-processing-wfs first if version 1.3.0 was released after you last ran the pipeline.
Please note that the SONG/SCORE services must be available and you must have an appropriate API token.
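If the pipeline is launched from a script rather than by hand, the run and pull commands above can be assembled programmatically. The following is a minimal sketch under that assumption: the helper name `build_commands` and the file name `my_params.json` are hypothetical, and only the `nextflow pull` and `nextflow run ... -r ... -params-file ...` invocations shown above are used.

```python
import shlex
from typing import List

PIPELINE = "icgc-argo/dna-seq-processing-wfs"


def build_commands(version: str, params_file: str,
                   pull_first: bool = False) -> List[List[str]]:
    """Return the nextflow command lines needed to launch one pipeline run."""
    commands = []
    if pull_first:
        # Refresh the local copy so a newly released version can be resolved.
        commands.append(["nextflow", "pull", PIPELINE])
    commands.append(
        ["nextflow", "run", PIPELINE, "-r", version, "-params-file", params_file]
    )
    return commands


if __name__ == "__main__":
    # "my_params.json" is a placeholder for your own params file.
    for cmd in build_commands("1.3.0", "my_params.json", pull_first=True):
        print(shlex.join(cmd))
        # To actually execute: subprocess.run(cmd, check=True)
```

Keeping each command as a list of arguments avoids shell quoting problems when the params file path contains spaces.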
Automated Travis CI testing has been set up. However, tests relying on SONG/SCORE will be skipped when CI is triggered on a Travis server where SONG/SCORE services are not available. When running tests locally (where SONG/SCORE services may be available) please use the following commands under the root directory of this Git repository:
# perform all tests when SONG/SCORE is available
export api_token=<your_api_token>
pytest -v
# or perform tests that do not need SONG/SCORE
TRAVIS=true pytest -v
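The two commands above suggest that tests needing SONG/SCORE are switched off by the TRAVIS environment variable and read the token from api_token. How this repository's tests implement that is not shown here, so the sketch below is only one plausible way to express it; the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional


def song_score_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when TRAVIS=true, i.e. SONG/SCORE is assumed unavailable."""
    env = os.environ if environ is None else environ
    return env.get("TRAVIS", "").lower() != "true"


def get_api_token(environ: Optional[Mapping[str, str]] = None) -> str:
    """Read the token exported as api_token; fail early if it is missing."""
    env = os.environ if environ is None else environ
    token = env.get("api_token", "")
    if not token:
        raise RuntimeError("api_token is not set; export it before running tests")
    return token


# In a pytest module, a SONG/SCORE-dependent test could then be guarded with:
#
#   @pytest.mark.skipif(not song_score_tests_enabled(),
#                       reason="SONG/SCORE not available on Travis")
#   def test_upload(): ...
```

Passing the environment as an argument keeps both helpers easy to test without modifying the real process environment.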