Commit c9482ec

Merge pull request #15 from MaximilianStammnitz/dev
Pipeline documentation: standardisation with nf-core template architecture
2 parents 1c4c6dd + 178fc85 commit c9482ec

5 files changed (+334 / -235 lines changed)

README.md

Lines changed: 79 additions & 107 deletions
@@ -6,118 +6,95 @@
66
</h1>
77

88
[![GitHub Actions CI Status](https://github.com/nf-core/deepmutscan/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/deepmutscan/actions/workflows/ci.yml)
9-
[![GitHub Actions Linting Status](https://github.com/nf-core/deepmutscan/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/deepmutscan/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/deepmutscan/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
9+
[![GitHub Actions Linting Status](https://github.com/nf-core/deepmutscan/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/deepmutscan/actions/workflows/linting.yml)
10+
[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/deepmutscan/results)
11+
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
1012
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
1113

1214
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
1315
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
1416
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
1517
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
1618
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/deepmutscan)
17-
[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23deepmutscan-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/deepmutscan)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
19+
[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23deepmutscan-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/deepmutscan)
20+
[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)
21+
[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)
22+
[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
1823

19-
# 1. Overview
20-
**nf-core/deepmutscan** is a reproducible, scalable, and community-curated pipeline for analyzing deep mutational scanning (DMS) data using shotgun DNA sequencing. DMS enables researchers to measure the fitness effects of thousands of gene variants simultaneously, helping to classify disease causing mutants in human and animal populations, to learn fundamental rules of virus evolution, protein architecture, splicing or small-molecule interactions.
24+
## Introduction
2125

22-
While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present major challenges for DMS studies. Shotgun DNA sequencing can be used to greatly speed up the inference of long ORF mutant fitness landscapes, theoretically at no expense in accuracy. We have designed the **nf-core/deepmutscan** pipeline to unlock the power of shotgun sequencing based DMS studies on long ORFs, to simplify and standardise the complex bioinformatics steps involved in data processing of such experiments – from read alignment to QC reporting and fitness landscape inferences.
26+
`nf-core/deepmutscan` is a workflow designed for the analysis of deep mutational scanning (DMS) data. DMS enables researchers to experimentally measure the fitness effects of thousands of genes or gene variants simultaneously. This helps to classify disease-causing mutants in human and animal populations and to learn the fundamental rules of virus evolution, protein architecture, splicing, small-molecule interactions and many other phenotypes.
2327

24-
> 📄 Reference: Wehnert et al., _bioRxiv_ preprint (coming soon)
28+
While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present major challenges for DMS studies. Shotgun DNA sequencing can greatly speed up the inference of mutant fitness landscapes for long ORFs, in theory at no cost in accuracy. We designed the `nf-core/deepmutscan` pipeline to unlock the power of shotgun-sequencing-based DMS studies on long ORFs, and to simplify and standardise the complex bioinformatics steps involved in processing data from such experiments: from read alignment to QC reporting and fitness landscape inference.
2529

26-
---
30+
<p align="center">
31+
<img title="DeepMutScan Workflow" src="docs/pipeline.png" width=80%>
32+
</p>
2733

28-
# 2. Features of nf-core/deepmutscan
29-
- End-to-end analyses of DMS shotgun sequencing data
30-
- Modular, three-stage workflow: alignment → QC → error-aware fitness estimation
31-
- Integrates with popular statistical tools like [DiMSum](https://github.com/lehner-lab/DiMSum), [Enrich2](https://github.com/FowlerLab/Enrich2), [rosace](https://github.com/pimentellab/rosace/) and [mutscan](https://github.com/fmicompbio/mutscan)
32-
- Supports multiple mutagenesis strategies, e.g. nicking by NNK and NNS codons
33-
- Containerized via Docker, Singularity and Apptainer
34-
- Scalable across HPC and Cloud systems
35-
- Monitors CPU, memory, and CO₂ usage
36-
37-
For details of the pipeline and potential future expansions, please consider reading our [detailed description](docs/pipeline_steps.md).
38-
39-
---
40-
41-
# 3. Installation
42-
**nf-core/deepmutscan** uses [Nextflow](https://nf-co.re/docs/usage/getting_started/installation), which must be installed on your system:
43-
44-
```bash
45-
java -version # Check that Java v11+ is installed
46-
curl -s https://get.nextflow.io | bash # Download Nextflow
47-
chmod +x nextflow # Make executable
48-
mv nextflow ~/bin/ # Add to user's $PATH
49-
```
50-
51-
The pipeline itself requires no installation – Nextflow will fetch it directly from GitHub:
34+
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
5235

53-
```bash
54-
nextflow run nf-core/deepmutscan -profile docker
55-
```
56-
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/deepmutscan/usage) and the [parameter documentation](https://nf-co.re/deepmutscan/parameters).
36+
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/deepmutscan/results).
5737

58-
---
38+
## Major features
5939

60-
# 4. Usage
61-
Prepare:
62-
- A **sample sheet** CSV to specify input/output labels, replicates, etc. (see [example](assets/samplesheet.csv))
63-
- A **reference FASTA** file for the gene or region of interest
40+
- End-to-end analyses of various DMS data
41+
- Modular, three-stage workflow: alignment → QC → error-aware fitness estimation
42+
- Integration with popular statistical fitness estimation tools like [DiMSum](https://github.com/lehner-lab/DiMSum), [Enrich2](https://github.com/FowlerLab/Enrich2), [rosace](https://github.com/pimentellab/rosace/) and [mutscan](https://github.com/fmicompbio/mutscan)
43+
- Support for multiple mutagenesis strategies, e.g. by nicking with degenerate NNK and NNS codons
44+
- Containerisation via Docker, Singularity and Apptainer
45+
- Scalability across HPC and Cloud systems
46+
- Monitoring of CPU, memory, and CO₂ usage
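The degenerate codon schemes mentioned in the feature list can be made concrete with a small shell sketch. It uses the IUPAC codes N = A/C/G/T, K = G/T and S = C/G, and enumerates each scheme with bash brace expansion. This is purely illustrative and not part of the pipeline.

```bash
# Enumerate every codon matched by the degenerate NNK and NNS schemes.
# IUPAC codes: N = A/C/G/T, K = G/T, S = C/G.
nnk=( {A,C,G,T}{A,C,G,T}{G,T} )
nns=( {A,C,G,T}{A,C,G,T}{C,G} )

# Count standard-genetic-code stop codons (TAA, TAG, TGA) among the arguments.
count_stops() {
  local n=0 codon
  for codon in "$@"; do
    case "$codon" in
      TAA | TAG | TGA) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

echo "NNK: ${#nnk[@]} codons, $(count_stops "${nnk[@]}") stop codon(s)"
echo "NNS: ${#nns[@]} codons, $(count_stops "${nns[@]}") stop codon(s)"
```

Both schemes reduce the 64 possible codons to 32 and keep only one stop codon (TAG). This is why they are common choices for saturation mutagenesis libraries.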
6447

65-
To execute **nf-core/deepmutscan**, run the basic command:
48+
For more details on the pipeline and on potential future expansions, please consider reading our [usage description](https://nf-co.re/deepmutscan/usage).
6649

67-
```bash
68-
nextflow run nf-core/deepmutscan \
69-
-profile singularity,local \
70-
--input ./input.csv \
71-
--reading_frame 1-300 \
72-
--fasta ./ref.fa \
73-
--mutagenesis max_diff_to_wt \
74-
--run_seqdepth false \
75-
--fitness true \
76-
--outdir ./results
77-
```
50+
## Step-by-step pipeline summary
7851

79-
### Required parameters
52+
The pipeline processes deep mutational scanning (DMS) sequencing data in several stages:
8053

81-
| Parameter | Description |
82-
|--------------------|-----------------------------------------------------|
83-
| `--input` | Path to sample sheet CSV |
84-
| `--outdir` | Path to output directory |
85-
| `--fasta` | Reference FASTA file |
86-
| `--reading_frame` | Start and end nucleotide (e.g. `1-300`) |
54+
1. Alignment of reads to the reference open reading frame (ORF) (`BWA-mem`)
55+
2. Filtering of wildtype and erroneous reads (`samtools view`)
56+
3. Read merging for base error reduction (`vsearch merge`, `BWA-mem`)
57+
4. Mutation counting (`GATK AnalyzeSaturationMutagenesis`)
58+
5. DMS library quality control
59+
6. Data summarisation across samples
60+
7. Single nucleotide variant error correction _(in development)_
61+
8. Fitness estimation _(in development)_
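For orientation, the final step can be illustrated with a widely used enrichment score. This score is the log ratio of a variant's output/input frequency relative to wildtype, ln((out_v / out_wt) / (in_v / in_wt)). The sketch below is generic and hypothetical. The tab-separated layout (a header line, then `variant`, `input_count`, `output_count`), the `WT` row name and the pseudocount of 0.5 are all assumptions. It is not the error-aware estimator that `nf-core/deepmutscan` or tools such as DiMSum implement.

```bash
# Hypothetical layout: tab-separated, one header line, then
#   variant <TAB> input_count <TAB> output_count
# with the wildtype row named "WT". Prints: variant <TAB> log enrichment vs. WT.
fitness_scores() {
  awk -F'\t' -v pc=0.5 '
    FNR == 1 { next }                      # skip the header in both passes
    NR == FNR {                            # first pass: remember wildtype counts
      if ($1 == "WT") { in_wt = $2 + pc; out_wt = $3 + pc; found = 1 }
      next
    }
    !found { print "no WT row found" > "/dev/stderr"; exit 1 }
    $1 != "WT" {                           # second pass: score every variant
      score = log((($3 + pc) / out_wt) / (($2 + pc) / in_wt))
      printf "%s\t%.3f\n", $1, score
    }
  ' "$1" "$1"
}
```

Positive scores mean a variant was enriched relative to wildtype during selection, and negative scores mean it was depleted. The pseudocount only guards against taking the logarithm of zero counts.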
8762

88-
### Optional parameters *(in development)*
63+
## Usage
8964

90-
| Parameter | Default | Description |
91-
|------------------------|-------------|-------------------------------------------------|
92-
| `--run_seqdepth` | `false` | Estimate sequencing saturation by rarefaction |
93-
| `--fitness` | `false` | Default fitness inference module |
94-
| `--dimsum` | `false` | Optional fitness inference module *(AMD/x86_64 systems only)* |
95-
| `--mutagenesis` | `max_diff_to_wt` | Deep mutational scanning strategy used *(in development)* |
96-
| `--error-estimation` | `wt_sequencing` | Error model used to correct 1nt counts *(in development)* |
97-
| `--read-align` | `bwa-mem` | Read aligner *(in development)* |
65+
> [!NOTE]
66+
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
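The Nextflow badge in this README advertises DSL2 ≥ 24.04.2. A minimal shell helper for comparing a locally installed version against that minimum might look like the one below. It is a convenience sketch only, and it assumes `sort -V` (version sort, as provided by GNU coreutils) is available.

```bash
# Minimum Nextflow version advertised by the README badge.
MIN_NXF_VERSION="24.04.2"

# version_ge A B: succeeds (exit status 0) if version A >= version B.
# Sorts both versions with `sort -V`; if B comes first (or they are equal), A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 24.10.0 "$MIN_NXF_VERSION" && echo "Nextflow version OK"`. Substitute the version number reported by `nextflow -version` on your system.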
9867
99-
More options and advanced configuration: [see vignette](link). For further information or help, don't hesitate to get in touch on the [Slack `#deepmutscan` channel](https://nfcore.slack.com/channels/deepmutscan) (you can join with [this invite](https://nf-co.re/join/slack)).
68+
First, prepare a samplesheet describing your input and output samples, in which each row represents one pair of paired-end FASTQ files. It should look as follows:
10069

101-
---
102-
103-
# 5. Input Data
70+
```csv title="samplesheet.csv"
71+
sample,type,replicate,file1,file2
72+
ORF1,input,1,/reads/forward1.fastq.gz,/reads/reverse1.fastq.gz
73+
ORF1,input,2,/reads/forward2.fastq.gz,/reads/reverse2.fastq.gz
74+
ORF1,output,1,/reads/forward3.fastq.gz,/reads/reverse3.fastq.gz
75+
ORF1,output,2,/reads/forward4.fastq.gz,/reads/reverse4.fastq.gz
76+
```
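Before launching, it can help to sanity-check the samplesheet locally. The function below is a hypothetical pre-flight helper, not part of `nf-core/deepmutscan`, whose own input validation remains authoritative. It assumes the following:

- the five-column layout shown in the example samplesheet
- `input` and `output` as the only `type` values
- positive integer replicate IDs
- gzipped FASTQ paths

```bash
# Hypothetical pre-flight check for the samplesheet layout shown above.
# Prints one message per problem and returns non-zero if any were found.
check_samplesheet() {
  awk -F',' '
    NR == 1 {
      if ($0 != "sample,type,replicate,file1,file2") { print "header: unexpected columns: " $0; bad = 1 }
      next
    }
    NF != 5 { print "line " NR ": expected 5 columns, found " NF; bad = 1; next }
    $2 != "input" && $2 != "output" { print "line " NR ": type must be input or output"; bad = 1 }
    $3 !~ /^[1-9][0-9]*$/ { print "line " NR ": replicate must be a positive integer"; bad = 1 }
    $4 !~ /\.f(ast)?q\.gz$/ || $5 !~ /\.f(ast)?q\.gz$/ { print "line " NR ": file1/file2 must be .fastq.gz or .fq.gz"; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

Usage: `check_samplesheet samplesheet.csv && echo "samplesheet looks OK"`.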
10477

105-
The primary pipeline input is a sample sheet `.csv` file listing:
78+
Secondly, specify the gene or gene region of interest using a reference FASTA file via `--fasta`. Provide the exact codon coordinates using `--reading_frame`.
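The previous README's parameter table described `--reading_frame` as the start and end nucleotide. If it is a 1-based, inclusive nucleotide range, then `1-300` spans 300 nt, i.e. 100 codons. The helper below is a hypothetical sketch under that assumption. It is not part of the pipeline and simply checks that the range is well-formed and a whole number of codons long.

```bash
# Hypothetical helper: validates a --reading_frame value and prints the number of codons.
# Assumes 1-based, inclusive nucleotide coordinates written as START-END.
reading_frame_codons() {
  local rf="$1" re='^([0-9]+)-([0-9]+)$' start end len
  if [[ ! $rf =~ $re ]]; then
    echo "invalid reading frame '$rf': expected START-END, e.g. 1-300" >&2
    return 1
  fi
  # Force base 10 so values with leading zeros are not read as octal.
  start=$((10#${BASH_REMATCH[1]}))
  end=$((10#${BASH_REMATCH[2]}))
  len=$((end - start + 1))
  if ((start < 1 || len <= 0 || len % 3 != 0)); then
    echo "invalid reading frame '$rf': span of $len nt is not a positive multiple of 3" >&2
    return 1
  fi
  echo $((len / 3))
}
```

For example, `reading_frame_codons 1-300` prints `100`, while `1-299` is rejected because it does not end on a codon boundary.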
10679

107-
- Paths to paired-end `.fastq.gz` files from shotgun sequencing
108-
- Their classification as either input or output samples
109-
- Replicate IDs
110-
- Associated experimental metadata
80+
Now, you can run the pipeline using:
11181

112-
See [sample CSV](assets/samplesheet.csv) for formatting.
82+
```bash title="example pipeline run"
83+
nextflow run nf-core/deepmutscan \
84+
-profile <docker/singularity/.../institute> \
85+
--input ./samplesheet.csv \
86+
--fasta ./ref.fa \
87+
--reading_frame 1-300 \
88+
--outdir ./results
89+
```
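The same run can also be described in a parameters file, which Nextflow reads via its `-params-file` option. The sketch below writes one with the example values from the command above. The file name `params.yaml` is arbitrary, and the profile still has to be chosen on the command line because it is a Nextflow option rather than a pipeline parameter.

```bash
# Write the example parameters to a YAML file (the file name is arbitrary).
cat > params.yaml <<'EOF'
input: './samplesheet.csv'
fasta: './ref.fa'
reading_frame: '1-300'
outdir: './results'
EOF

# Then launch with your chosen profile, for example:
#   nextflow run nf-core/deepmutscan -profile docker -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to record alongside its results.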
11390

114-
---
91+
There are several optional [parameters](https://nf-co.re/deepmutscan/parameters), some of which are currently in development. For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/deepmutscan/usage).
11592

116-
# 6. Outputs
93+
## Pipeline output
11794

11895
After execution, the pipeline creates the following directory structure:
11996

120-
```
97+
```folder title="output folder structure"
12198
results/
12299
├── fastqc/ # Individual HTML reports for specified fastq files, raw sequencing QC
123100
├── fitness/ # Merged variant count tables, fitness and error estimates, replicate correlations and heatmaps
@@ -129,45 +106,40 @@ results/
129106
└── report.html # Nextflow summary report incl. detailed CPU and memory usage for all tasks
130107
```
131108

132-
---
109+
For a full overview of the output file types, please refer to the [output documentation](https://nf-co.re/deepmutscan/output).
133110

134-
# 7. Citation
135-
136-
If you use this pipeline in your research, please cite:
137-
> 📄 Wehnert et al., _bioRxiv_ preprint (coming soon)
111+
## Contributing
138112

139-
Please also cite the nf-core framework:
140-
> 📄 Ewels et al., _Nature Biotechnology_, 2020
141-
> [https://doi.org/10.1038/s41587-020-0439-x](https://doi.org/10.1038/s41587-020-0439-x)
142-
143-
---
113+
We welcome contributions from the community!
144114

145-
# 8. License
115+
For technical challenges and feedback on the pipeline, please use our [GitHub repository](https://github.com/nf-core/deepmutscan). Please open an [issue](https://github.com/nf-core/deepmutscan/issues/new) or [pull request](https://github.com/nf-core/deepmutscan/compare) to:
146116

147-
[MIT License](link)
117+
- Report bugs or solve data incompatibilities when running `nf-core/deepmutscan`
118+
- Suggest the implementation of new modules for custom DMS workflows
119+
- Help improve this documentation
148120

149-
&copy; 2025 Benjamin Wehnert, Taylor Mighell, Fei Sang, Ben Lehner, Maximilian Stammnitz
121+
If you are interested in getting involved as a developer, please consider joining our interactive [`#deepmutscan` Slack channel](https://nfcore.slack.com/channels/deepmutscan) (via [this invite](https://nf-co.re/join/slack)).
150122

151-
---
123+
## Credits
152124

153-
# 9. Contributing
125+
`nf-core/deepmutscan` was originally written by [Benjamin Wehnert](https://github.com/BenjaminWehnert1008) and [Max Stammnitz](https://github.com/MaximilianStammnitz) at the [Centre for Genomic Regulation, Barcelona](https://www.crg.eu/), with the generous support of an EMBO Long-term Postdoctoral Fellowship and a Marie Skłodowska-Curie grant by the European Union.
154126

155-
We welcome contributions from the community!
127+
If you use `nf-core/deepmutscan` in your analyses, please cite:
156128

157-
Please open an [issue](../../issues/new) or [pull request](../../compare) via this GitHub page, to:
158-
- Suggest or help implementing new modules for custom workflows
159-
- Report bugs and other challenges in running **nf-core/deepmutscan**
160-
- Help improve this documentation
129+
> 📄 Wehnert et al., _bioRxiv_ preprint (coming soon)
161130
162-
You can also reach out to us via the **nf-core Slack**, by use of the `#dms` channel ([join here](https://join.slack.com/share/enQtOTMyMDc3MTA0Mzg0Mi04YmRiNDEwZTBlOTRiN2M2ZGU5ZGVmOWQ3YzA0YjA4NzhiNjFhNTVlNDA4ZTZjOTE2MjE5MmIzYWZjZTljMTE3)).
131+
Please also cite the `nf-core` framework:
163132

164-
---
133+
> 📄 Ewels et al., _Nature Biotechnology_, 2020
134+
> [https://doi.org/10.1038/s41587-020-0439-x](https://doi.org/10.1038/s41587-020-0439-x)
165135
166-
# 10. Contact
136+
## Scientific contact
167137

168-
For detailled scientific or technical questions, feedback and experimental discussions, feel free to contact us directly:
138+
For scientific discussions around the use of this pipeline (e.g. on experimental design or sequencing data requirements), please feel free to get in touch with us directly:
169139

170-
- Benjamin Wehnert — wehnertbenjamin@gmail.com
140+
- Benjamin Wehnert — wehnertbenjamin@gmail.com
171141
- Maximilian Stammnitz — maximilian.stammnitz@crg.eu
172142

173-
---
143+
## CHANGELOG
144+
145+
- [CHANGELOG](CHANGELOG.md)
