Merge pull request #40 from nf-core/dev

Release v1.0.2
nf-core · Jan 14, 2021 · 838d2a5
2 parents 0e58db8 + f35b44e
commit 838d2a5

Showing 14 changed files with 228 additions and 44 deletions.
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -18,8 +18,9 @@ If you'd like to write some code for nf-core/cageseq, the standard workflow is a
 1. Check that there isn't already an issue about your idea in the [nf-core/cageseq issues](https://github.com/nf-core/cageseq/issues) to avoid duplicating work
     * If there isn't one already, please create one so that others know you're working on this
 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/cageseq repository](https://github.com/nf-core/cageseq) to your GitHub account
-3. Make the necessary changes / additions within your forked repository
-4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
+3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
+4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
 
 If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
 
@@ -30,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t
 
 There are typically two types of tests that run:
 
-### Lint Tests
+### Lint tests
 
 `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
 To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
 
 If any failures or warnings are encountered, please follow the listed URL for more documentation.
 
-### Pipeline Tests
+### Pipeline tests
 
 Each `nf-core` pipeline should be set up with a minimal set of test-data.
 `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
@@ -55,3 +56,73 @@ These tests are run both with the latest available version of `Nextflow` and als
 ## Getting help
 
 For further information/help, please consult the [nf-core/cageseq documentation](https://nf-co.re/cageseq/usage) and don't hesitate to get in touch on the nf-core Slack [#cageseq](https://nfcore.slack.com/channels/cageseq) channel ([join our Slack here](https://nf-co.re/join/slack)).
+
+## Pipeline contribution conventions
+
+To make the nf-core/cageseq code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
+
+### Adding a new step
+
+If you wish to contribute a new step, please use the following coding standards:
+
+1. Define the corresponding input channel into your new process from the expected previous process channel.
+2. Write the process block (see below).
+3. Define the output channel if needed (see below).
+4. Add any new flags/options to `nextflow.config` with a default (see below).
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
+6. Add any new flags/options to the help message (for integer/text parameters, print the corresponding default value from `nextflow.config` in the help message).
+7. Add sanity checks for all relevant parameters.
+8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `get_software_versions` process in `main.nf`.
+9. Do local tests that the new code works properly and as expected.
+10. Add a new test command in `.github/workflows/ci.yml`.
+11. If applicable, add a [MultiQC](https://multiqc.info/) module.
+12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
+13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
+
+### Default values
+
+Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
+
+Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.
+
+### Default process resource requirements
+
+Sensible defaults for process resource requirements (CPUs / memory / time) should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which defines the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements, each selected by a standardised label.
+
+The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
+
+### Naming schemes
+
+Please use the following naming schemes to make it easy to understand what is going where.
+
+* initial process channel: `ch_output_from_<process>`
+* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
+
+### Nextflow version bumping
+
+If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
+
+### Software version reporting
+
+If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_versions` process.
+
+Add something like the following to the script block of the process:
+
+```bash
+<YOUR_TOOL> --version > v_<YOUR_TOOL>.txt 2>&1 || true
+```
+
+or
+
+```bash
+<YOUR_TOOL> --help | head -n 1 > v_<YOUR_TOOL>.txt 2>&1 || true
+```
+
+You then need to edit the script `bin/scrape_software_versions.py` to:
+
+1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` followed by the version number, e.g. `v2.1.1`.
+2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
+
+### Images and figures
+
+For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
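
The regex step from the "Software version reporting" convention above can be illustrated with a minimal Python sketch. It is not the contents of `bin/scrape_software_versions.py`: the tool list, the file names `v_fastqc.txt` / `v_cutadapt.txt`, the sample tool outputs and the grey `N/A` placeholder are assumptions for illustration. What it shows is one capture group per tool, the `v` prefix on the captured number, and an `OrderedDict` that fixes the order of the HTML rows handed to MultiQC.

```python
import re
from collections import OrderedDict

# Hypothetical entries: tool name -> (version file, regex with one capture group).
# A real pipeline keeps its own list in bin/scrape_software_versions.py.
REGEXES = OrderedDict([
    ("FastQC", ("v_fastqc.txt", r"FastQC v(\S+)")),
    ("cutadapt", ("v_cutadapt.txt", r"(\S+)")),
])

NOT_FOUND = '<span style="color:#999999;">N/A</span>'  # assumed placeholder


def parse_version(text, pattern):
    """Return the captured version prefixed with 'v', or None if nothing matches."""
    match = re.search(pattern, text)
    return f"v{match.group(1)}" if match else None


def scrape(outputs):
    """Map each tool to its version, given {version file name: file contents}."""
    results = OrderedDict()
    for tool, (filename, pattern) in REGEXES.items():
        version = parse_version(outputs.get(filename, ""), pattern)
        results[tool] = version or NOT_FOUND
    return results


def to_html(results):
    """Render the versions as an HTML definition list, keeping insertion order."""
    rows = [f"    <dt>{tool}</dt><dd><samp>{ver}</samp></dd>" for tool, ver in results.items()]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


if __name__ == "__main__":
    # Assumed example contents, as written by `<tool> --version > v_<tool>.txt`.
    versions = scrape({"v_fastqc.txt": "FastQC v0.11.9\n", "v_cutadapt.txt": "3.0\n"})
    print(to_html(versions))
```

A tool whose version file is missing or unparsable falls back to the placeholder instead of raising, in the same spirit as the `|| true` in the shell snippets: a failing version command should not fail the pipeline.
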
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -13,6 +13,13 @@ Thanks for telling us about a problem with the pipeline.
 Please delete this text and anything that's not relevant from the template below:
 -->
 
+## Check Documentation
+
+I have checked the following places for your error:
+
+- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
+- [ ] [nf-core/cageseq pipeline documentation](https://nf-co.re/cageseq/usage)
+
 ## Description of the bug
 
 <!-- A clear and concise description of what the bug is. -->
@@ -28,6 +35,13 @@ Steps to reproduce the behaviour:
 
 <!-- A clear and concise description of what you expected to happen. -->
 
+## Log files
+
+Have you provided the following extra information/files:
+
+- [ ] The command used to run the pipeline
+- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->
+
 ## System
 
 - Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -13,8 +13,14 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/cage
 
 ## PR checklist
 
-- [ ] This comment contains a description of changes (with reason)
-- [ ] `CHANGELOG.md` is updated
+- [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
-- [ ] Documentation in `docs` is updated
-- [ ] If necessary, also make a PR on the [nf-core/cageseq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/cageseq)
+ - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
+ - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/cageseq/tree/master/.github/CONTRIBUTING.md)
+ - [ ] If necessary, also make a PR on the nf-core/cageseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
+- [ ] Make sure your code lints (`nf-core lint .`).
+- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
+- [ ] Usage Documentation in `docs/usage.md` is updated.
+- [ ] Output Documentation in `docs/output.md` is updated.
+- [ ] `CHANGELOG.md` is updated.
+- [ ] `README.md` is updated (including new tool citations and authors/contributors).
diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml
@@ -1,5 +1,12 @@
 # Markdownlint configuration file
-default: true,
+default: true
 line-length: false
 no-duplicate-header:
     siblings_only: true
+no-inline-html:
+    allowed_elements:
+        - img
+        - p
+        - kbd
+        - details
+        - summary
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -34,13 +34,13 @@ jobs:
 
       - name: Build new docker image
         if: env.MATCHED_FILES
-        run: docker build --no-cache . -t nfcore/cageseq:1.0.1
+        run: docker build --no-cache . -t nfcore/cageseq:1.0.2
 
       - name: Pull docker image
         if: ${{ !env.MATCHED_FILES }}
         run: |
           docker pull nfcore/cageseq:dev
-          docker tag nfcore/cageseq:dev nfcore/cageseq:1.0.1
+          docker tag nfcore/cageseq:dev nfcore/cageseq:1.0.2
 
       - name: Install Nextflow
         env:
@@ -79,13 +79,13 @@ jobs:
 
       - name: Build new docker image
         if: env.MATCHED_FILES
-        run: docker build --no-cache . -t nfcore/cageseq:1.0.1
+        run: docker build --no-cache . -t nfcore/cageseq:1.0.2
 
       - name: Pull docker image
         if: ${{ !env.MATCHED_FILES }}
         run: |
           docker pull nfcore/cageseq:dev
-          docker tag nfcore/cageseq:dev nfcore/cageseq:1.0.1
+          docker tag nfcore/cageseq:dev nfcore/cageseq:1.0.2
 
       - name: Install Nextflow
         run: |

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,17 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v1.0.2 - [2021-01-13]
+
+### `Added`
+
+* Update template to nf-core/tools `1.12.1`
+
+### `Fixed`
+
+* Reads the `--input` parameter correctly
+* Cleaned up MultiQC config
+
 ## v1.0.1 - [2020-11-23]
 
 ### `Added`

diff --git a/Dockerfile b/Dockerfile
@@ -1,4 +1,4 @@
-FROM nfcore/base:1.12
+FROM nfcore/base:1.12.1
 LABEL authors="Kevin Menden, Tristan Kast, Matthias Hörtenhuber" \
       description="Docker image containing all software requirements for the nf-core/cageseq pipeline"
 
@@ -7,9 +7,9 @@ COPY environment.yml /
 RUN conda env create --quiet -f /environment.yml && conda clean -a
 
 # Add conda installation dir to PATH (instead of doing 'conda activate')
-ENV PATH /opt/conda/envs/nf-core-cageseq-1.0.1/bin:$PATH
+ENV PATH /opt/conda/envs/nf-core-cageseq-1.0.2/bin:$PATH
 # Dump the details of the installed packages to a file for posterity
-RUN conda env export --name nf-core-cageseq-1.0.1 > nf-core-cageseq-1.0.1.yml
+RUN conda env export --name nf-core-cageseq-1.0.2 > nf-core-cageseq-1.0.2.yml
 
 # Instruct R processes to use these empty files instead of clashing with a local version
 RUN touch .Rprofile
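
The Dockerfile and CI hunks above change nothing but the version string (`1.0.1` to `1.0.2`), and the `nf-core-cageseq-<version>` environment name used in `ENV PATH` and `conda env export` has to match the name the environment is created with from `/environment.yml`. A hypothetical Python sketch of that substitution follows. It is not how `nf-core bump-version` is implemented, and the `PREFIXES` list is an assumption covering only the two patterns visible in this commit.

```python
import re

# Assumed list: only the two version-bearing prefixes visible in this commit.
PREFIXES = ["nfcore/cageseq:", "nf-core-cageseq-"]


def bump_versions(text, old, new):
    """Replace '<prefix><old>' with '<prefix><new>' for every prefix in PREFIXES.

    The (?!\\d) lookahead keeps '1.0.1' from matching the start of '1.0.10'.
    Returns the rewritten text and the total number of substitutions.
    """
    total = 0
    for prefix in PREFIXES:
        pattern = re.escape(prefix + old) + r"(?!\d)"
        # A callable replacement avoids any backslash/group interpretation of `new`.
        text, count = re.subn(pattern, lambda _m, p=prefix: p + new, text)
        total += count
    return text, total


if __name__ == "__main__":
    dockerfile = (
        "ENV PATH /opt/conda/envs/nf-core-cageseq-1.0.1/bin:$PATH\n"
        "RUN conda env export --name nf-core-cageseq-1.0.1 > nf-core-cageseq-1.0.1.yml\n"
    )
    updated, n = bump_versions(dockerfile, "1.0.1", "1.0.2")
    print(n)  # 3
    print(updated)
```

Anchoring each replacement on a known prefix, rather than replacing every `1.0.1` in a file, avoids touching unrelated version numbers such as the `nfcore/base:1.12.1` base image.
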

diff --git a/README.md b/README.md
@@ -48,6 +48,19 @@ nextflow run nf-core/cageseq -profile <docker/singularity/podman/conda/institute
 
 See [usage docs](https://nf-co.re/cageseq/usage) for all of the available options when running the pipeline.
 
+## Pipeline Summary
+
+By default, the pipeline currently performs the following:
+
+1. Input read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+2. Adapter + EcoP15 + 5'G trimming ([`cutadapt`](https://github.com/marcelm/cutadapt))
+3. (optional) rRNA filtering ([`SortMeRNA`](https://github.com/biocore/sortmerna))
+4. Trimmed and filtered read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+5. Read alignment to a reference genome ([`STAR`](https://github.com/alexdobin/STAR) or [`bowtie1`](http://bowtie-bio.sourceforge.net/index.shtml))
+6. CAGE tag counting and clustering ([`paraclu`](http://cbrc3.cbrc.jp/~martin/paraclu/))
+7. CAGE tag clustering QC ([`RSeQC`](http://rseqc.sourceforge.net/))
+8. Present QC and visualisation for raw read, alignment and clustering results ([`MultiQC`](http://multiqc.info/))
+
 ## Documentation
 
 The nf-core/cageseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/cageseq/usage) and [output](https://nf-co.re/cageseq/output).
@@ -62,7 +75,7 @@ If you would like to contribute to this pipeline, please see the [contributing g
 
 For further information or help, don't hesitate to get in touch on the [Slack `#cageseq` channel](https://nfcore.slack.com/channels/cageseq) (you can join with [this invite](https://nf-co.re/join/slack)).
 
-## Citation
+## Citations
 
 If you use nf-core/cageseq for your analysis, please cite it using the following doi: [10.5281/zenodo.4095105](https://doi.org/10.5281/zenodo.4095105)
 
@@ -74,3 +87,59 @@ You can cite the `nf-core` publication as follows:
 >
 > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
 > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
+
+In addition, references of tools and data used in this pipeline are as follows:
+
+## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
+
+> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
+
+## Pipeline tools
+
+* [BEDTools](https://pubmed.ncbi.nlm.nih.gov/20110278/)
+  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
+
+* [bowtie](https://pubmed.ncbi.nlm.nih.gov/19261174/)
+  > Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. Epub 2009 Mar 4. PMID: 19261174; PMCID: PMC2690996.
+
+* [cutadapt](http://journal.embnet.org/index.php/embnetjournal/article/view/200)
+  > Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp.10-12.
+
+* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
+
+* [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+
+* [paraclu](https://pubmed.ncbi.nlm.nih.gov/18032727/)
+  > Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin A. A code for transcription initiation in mammalian genomes. Genome Res. 2008 Jan;18(1):1-12. doi: 10.1101/gr.6831208. Epub 2007 Nov 21. PMID: 18032727; PMCID: PMC2134772.
+
+* [RSeQC](https://pubmed.ncbi.nlm.nih.gov/22743226/)
+  > Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012 Aug 15;28(16):2184-5. doi: 10.1093/bioinformatics/bts356. Epub 2012 Jun 27. PubMed PMID: 22743226.
+
+* [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/)
+  > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
+
+* [SortMeRNA](https://pubmed.ncbi.nlm.nih.gov/23071270/)
+  > Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012 Dec 15;28(24):3211-7. doi: 10.1093/bioinformatics/bts611. Epub 2012 Oct 15. PubMed PMID: 23071270.
+
+* [STAR](https://pubmed.ncbi.nlm.nih.gov/23104886/)
+  > Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PubMed PMID: 23104886; PubMed Central PMCID: PMC3530905.
+
+* [UCSC tools](https://pubmed.ncbi.nlm.nih.gov/20639541/)
+  > Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
+
+## Software packaging/containerisation tools
+
+* [Anaconda](https://anaconda.com)
+  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
+
+* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
+  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
+
+* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
+  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
+
+* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
+
+* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
+  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.