[](https://github.com/nf-core/deepmutscan/actions/workflows/ci.yml)
9
-
[](https://github.com/nf-core/deepmutscan/actions/workflows/linting.yml)[](https://nf-co.re/deepmutscan/results)[](https://doi.org/10.5281/zenodo.XXXXXXX)
[](https://docs.conda.io/en/latest/)
14
16
[](https://www.docker.com/)
15
17
[](https://sylabs.io/docs/)
16
18
[](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/deepmutscan)
17
-
[](https://nfcore.slack.com/channels/deepmutscan)[](https://twitter.com/nf_core)[](https://mstdn.science/@nf_core)[](https://www.youtube.com/c/nf-core)
19
+
[](https://nfcore.slack.com/channels/deepmutscan)
20
+
[](https://twitter.com/nf_core)
21
+
[](https://mstdn.science/@nf_core)
22
+
[](https://www.youtube.com/c/nf-core)
18
23
19
-
# 1. Overview
20
-
**nf-core/deepmutscan** is a reproducible, scalable, and community-curated pipeline for analyzing deep mutational scanning (DMS) data using shotgun DNA sequencing. DMS enables researchers to measure the fitness effects of thousands of gene variants simultaneously, helping to classify disease causing mutants in human and animal populations, to learn fundamental rules of virus evolution, protein architecture, splicing or small-molecule interactions.
24
+
## Introduction
21
25
22
-
While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present major challenges for DMS studies. Shotgun DNA sequencing can be used to greatly speed up the inference of long ORF mutant fitness landscapes, theoretically at no expense in accuracy. We have designed the **nf-core/deepmutscan** pipeline to unlock the power of shotgun sequencing based DMS studies on long ORFs, to simplify and standardise the complex bioinformatics steps involved in data processing of such experiments – from read alignment to QC reporting and fitness landscape inferences.
26
+
`nf-core/deepmutscan` is a workflow designed for the analysis of deep mutational scanning (DMS) data. DMS enables researchers to experimentally measure the fitness effects of thousands of genes or gene variants simultaneously. This helps to classify disease-causing mutants in human and animal populations and to learn the fundamental rules of virus evolution, protein architecture, splicing, small-molecule interactions and many other phenotypes.
23
27
24
-
> 📄 Reference: Wehnert et al., _bioRxiv_ preprint (coming soon)
28
+
While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present major challenges for DMS studies. Shotgun DNA sequencing can greatly speed up the inference of long ORF mutant fitness landscapes, in theory at no cost to accuracy. We have designed the `nf-core/deepmutscan` pipeline to unlock the power of shotgun-sequencing-based DMS studies on long ORFs, and to simplify and standardise the complex bioinformatics steps involved in processing data from such experiments – from read alignment to QC reporting and fitness landscape inference.
- Integrates with popular statistical tools like [DiMSum](https://github.com/lehner-lab/DiMSum), [Enrich2](https://github.com/FowlerLab/Enrich2), [rosace](https://github.com/pimentellab/rosace/) and [mutscan](https://github.com/fmicompbio/mutscan)
32
-
- Supports multiple mutagenesis strategies, e.g. nicking by NNK and NNS codons
33
-
- Containerized via Docker, Singularity and Apptainer
34
-
- Scalable across HPC and Cloud systems
35
-
- Monitors CPU, memory, and CO₂ usage
36
-
37
-
For details of the pipeline and potential future expansions, please consider reading our [detailed description](docs/pipeline_steps.md).
38
-
39
-
---
40
-
41
-
# 3. Installation
42
-
**nf-core/deepmutscan** uses [Nextflow](https://nf-co.re/docs/usage/getting_started/installation), which must be installed on your system:
The pipeline itself requires no installation – Nextflow will fetch it directly from GitHub:
34
+
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
52
35
53
-
```bash
nextflow run nf-core/deepmutscan -profile docker
```
56
-
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/deepmutscan/usage) and the [parameter documentation](https://nf-co.re/deepmutscan/parameters).
36
+
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/deepmutscan/results).
57
37
58
-
---
38
+
## Major features
59
39
60
-
# 4. Usage
61
-
Prepare:
62
-
- A **sample sheet** CSV to specify input/output labels, replicates, etc. (see [example](assets/samplesheet.csv))
63
-
- A **reference FASTA** file for the gene or region of interest
- Integration with popular statistical fitness estimation tools like [DiMSum](https://github.com/lehner-lab/DiMSum), [Enrich2](https://github.com/FowlerLab/Enrich2), [rosace](https://github.com/pimentellab/rosace/) and [mutscan](https://github.com/fmicompbio/mutscan)
43
+
- Support of multiple mutagenesis strategies, e.g. by nicking with degenerate NNK and NNS codons
44
+
- Containerisation via Docker, Singularity and Apptainer
45
+
- Scalability across HPC and Cloud systems
46
+
- Monitoring of CPU, memory, and CO₂ usage
64
47
65
-
To execute **nf-core/deepmutscan**, run the basic command:
48
+
For more details on the pipeline and on potential future expansions, please consider reading our [usage description](https://nf-co.re/deepmutscan/usage).
66
49
67
-
```bash
nextflow run nf-core/deepmutscan \
  -profile singularity,local \
  --input ./input.csv \
  --reading_frame 1-300 \
  --fasta ./ref.fa \
  --mutagenesis max_diff_to_wt \
  --run_seqdepth false \
  --fitness true \
  --outdir ./results
```
50
+
## Step-by-step pipeline summary
78
51
79
-
### Required parameters
52
+
The pipeline processes deep mutational scanning (DMS) sequencing data in several stages:
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
98
67
99
-
More options and advanced configuration: [see vignette](link). For further information or help, don't hesitate to get in touch on the [Slack `#deepmutscan` channel](https://nfcore.slack.com/channels/deepmutscan) (you can join with [this invite](https://nf-co.re/join/slack)).
68
+
First, prepare a samplesheet with your input/output data in which each row represents a pair of FASTQ files (paired-end). This should look as follows:
The primary pipeline input is a sample sheet `.csv` file listing:
78
+
Secondly, specify the gene or gene region of interest using a reference FASTA file via `--fasta`. Provide the exact codon coordinates using `--reading_frame`.
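As a quick sanity check before launching, the `--reading_frame` range can be tested for covering a whole number of codons. The sketch below assumes the range is written as 1-based, inclusive nucleotide positions `START-END` (such as `1-300`); this interpretation and the helper name `check_reading_frame` are illustrative assumptions and not part of the pipeline itself.

```bash
# Hypothetical helper, not shipped with nf-core/deepmutscan.
# Assumption: START-END are 1-based, inclusive nucleotide coordinates.
check_reading_frame() {
    rf_range="$1"
    # Accept only <digits>-<digits> without leading zeros.
    case "$rf_range" in
        "" | *[!0-9-]* | -* | *- | *-*-* | 0* | *-0*)
            echo "invalid format: $rf_range" >&2
            return 2 ;;
        *-*) ;;
        *)
            echo "invalid format: $rf_range" >&2
            return 2 ;;
    esac
    rf_start="${rf_range%-*}"
    rf_end="${rf_range#*-}"
    if [ "$rf_end" -lt "$rf_start" ]; then
        echo "end precedes start: $rf_range" >&2
        return 2
    fi
    # Length of an inclusive range, then test divisibility by 3.
    rf_length=$(( rf_end - rf_start + 1 ))
    if [ $(( rf_length % 3 )) -ne 0 ]; then
        echo "range $rf_range spans $rf_length nt, not a multiple of 3" >&2
        return 1
    fi
    echo "$(( rf_length / 3 )) codons"
}
```

For example, `check_reading_frame 1-300` reports 100 codons, whereas `check_reading_frame 1-301` returns a non-zero status.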
106
79
107
-
- Paths to paired-end `.fastq.gz` files from shotgun sequencing
108
-
- Their classification as either input or output samples
109
-
- Replicate IDs
110
-
- Associated experimental metadata
80
+
Now, you can run the pipeline using:
111
81
112
-
See [sample CSV](assets/samplesheet.csv) for formatting.
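Before a run, it can be useful to confirm that every FASTQ path listed in the sample sheet actually exists. The sketch below deliberately makes no assumption about column names: it scans every comma-separated field and checks those ending in `.fastq.gz`. It assumes a plain CSV without quoted commas or padded fields, and the function name `missing_fastqs` is hypothetical.

```bash
# Hypothetical helper, not shipped with nf-core/deepmutscan.
# Prints every *.fastq.gz path in the sample sheet that is not a regular file.
# Relative paths are resolved against the current working directory.
missing_fastqs() {
    # $1: path to the sample sheet CSV
    # Put each field on its own line, drop Windows line endings,
    # keep only fields that look like gzipped FASTQ paths.
    tr ',' '\n' < "$1" | tr -d '\r' | grep '\.fastq\.gz$' |
    while IFS= read -r fq_path; do
        [ -f "$fq_path" ] || printf '%s\n' "$fq_path"
    done
}
```

Running `missing_fastqs ./samplesheet.csv` prints nothing when all listed files are present.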
82
+
```bash title="example pipeline run"
nextflow run nf-core/deepmutscan \
  -profile <docker/singularity/.../institute> \
  --input ./samplesheet.csv \
  --fasta ./ref.fa \
  --reading_frame 1-300 \
  --outdir ./results
```
113
90
114
-
---
91
+
There are several optional [parameters](https://nf-co.re/deepmutscan/parameters), some of which are currently in development. For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/deepmutscan/usage).
115
92
116
-
#6. Outputs
93
+
## Pipeline output
117
94
118
95
After execution, the pipeline creates the following directory structure:
119
96
120
-
```
97
+
```folder title="output folder structure"
121
98
results/
122
99
├── fastqc/ # Individual HTML reports for specified fastq files, raw sequencing QC
123
100
├── fitness/ # Merged variant count tables, fitness and error estimates, replicate correlations and heatmaps
@@ -129,45 +106,40 @@ results/
129
106
└── report.html # Nextflow summary report incl. detailed CPU and memory usage for all tasks
130
107
```
131
108
132
-
---
109
+
For a full overview of the output file types, please refer to the specific [documentation](https://nf-co.re/deepmutscan/output).
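After a run, a small check can confirm that the expected top-level outputs were written. The sketch below tests only for the three entries named in the folder structure above (`fastqc/`, `fitness/` and `report.html`); the function name `check_outputs` is hypothetical, and the list would need extending to cover the other outputs described in the documentation.

```bash
# Hypothetical helper, not shipped with nf-core/deepmutscan.
# Checks that selected entries exist inside the output directory.
check_outputs() {
    co_outdir="${1:-./results}"   # defaults to the --outdir used in the example run
    co_status=0
    for co_entry in fastqc fitness report.html; do
        if [ ! -e "$co_outdir/$co_entry" ]; then
            echo "missing: $co_outdir/$co_entry" >&2
            co_status=1
        fi
    done
    return "$co_status"
}
```

`check_outputs ./results` returns a non-zero status and lists each missing entry on standard error if anything is absent.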
133
110
134
-
# 7. Citation
135
-
136
-
If you use this pipeline in your research, please cite:
137
-
> 📄 Wehnert et al., _bioRxiv_ preprint (coming soon)
For technical challenges and feedback on the pipeline, please use our [GitHub repository](https://github.com/nf-core/deepmutscan). Please open an [issue](https://github.com/nf-core/deepmutscan/issues/new) or [pull request](https://github.com/nf-core/deepmutscan/compare) to:
146
116
147
-
[MIT License](link)
117
+
- Report bugs or solve data incompatibilities when running `nf-core/deepmutscan`
118
+
- Suggest the implementation of new modules for custom DMS workflows
If you are interested in getting involved as a developer, please consider joining our interactive [`#deepmutscan` Slack channel](https://nfcore.slack.com/channels/deepmutscan) (via [this invite](https://nf-co.re/join/slack)).
150
122
151
-
---
123
+
## Credits
152
124
153
-
# 9. Contributing
125
+
`nf-core/deepmutscan` was originally written by [Benjamin Wehnert](https://github.com/BenjaminWehnert1008) and [Max Stammnitz](https://github.com/MaximilianStammnitz) at the [Centre for Genomic Regulation, Barcelona](https://www.crg.eu/), with the generous support of an EMBO Long-term Postdoctoral Fellowship and a Marie Skłodowska-Curie grant by the European Union.
154
126
155
-
We welcome contributions from the community!
127
+
If you use `nf-core/deepmutscan` in your analyses, please cite:
156
128
157
-
Please open an [issue](../../issues/new) or [pull request](../../compare) via this GitHub page, to:
158
-
- Suggest or help implementing new modules for custom workflows
159
-
- Report bugs and other challenges in running **nf-core/deepmutscan**
160
-
- Help improve this documentation
129
+
> 📄 Wehnert et al., _bioRxiv_ preprint (coming soon)
161
130
162
-
You can also reach out to us via the **nf-core Slack**, by use of the `#dms` channel ([join here](https://join.slack.com/share/enQtOTMyMDc3MTA0Mzg0Mi04YmRiNDEwZTBlOTRiN2M2ZGU5ZGVmOWQ3YzA0YjA4NzhiNjFhNTVlNDA4ZTZjOTE2MjE5MmIzYWZjZTljMTE3)).
For detailed scientific or technical questions, feedback and experimental discussions, feel free to contact us directly:
138
+
For scientific discussions around the use of this pipeline (e.g. on experimental design or sequencing data requirements), please feel free to get in touch with us directly:
169
139
170
-
- Benjamin Wehnert — wehnertbenjamin@gmail.com
140
+
- Benjamin Wehnert — wehnertbenjamin@gmail.com
171
141
- Maximilian Stammnitz — maximilian.stammnitz@crg.eu