nf-core · yuukiiwa · Mar 13, 2023 · Feb 15, 2023 · Feb 17, 2023 · Mar 3, 2023
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,34 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.1.0] - 2023-03-10
+
+### Major enhancements
+
+- Removed the `guppy` basecaller, as distributing it via a Docker image breaches its EULA
+- Bump minimum Nextflow version from 21.10.3 -> 22.10.1
+- Update pipeline template to nf-core/tools `2.7.2`
+- Update `bambu` version from `1.0.2` to `3.0.8`
+
+### Parameters
+
+- Removed `--flowcell` as `nanoseq` no longer supports basecalling
+- Removed `--kit` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_config` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_model` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_gpu` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_gpu_runners` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_cpu_threads` as `nanoseq` no longer supports basecalling
+- Removed `--output_demultiplex_fast5` as `nanoseq` no longer supports basecalling
+- Removed `--skip_basecalling` as `nanoseq` no longer supports basecalling
+- Removed `--skip_pycoqc` as `nanoseq` no longer supports basecalling
+
+### Software dependencies
+
+| Dependency           | Old version | New version |
+| -------------------- | ----------- | ----------- |
+| `bioconductor-bambu` | 2.0.0       | 3.0.8       |
+
 ## [3.0.0] - 2022-06-21
 
 ### Major enhancements

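Users upgrading from 3.0.0 with a params file that still sets any of the removed basecalling options will need to drop them. The helper below is a hypothetical sketch, not part of nanoseq: it only compares the keys of a params mapping against the ten option names listed under "Parameters" in the 3.1.0 entry.

```python
# Hypothetical migration check; not part of nf-core/nanoseq.
# The names are the options the 3.1.0 changelog lists as removed.
REMOVED_IN_3_1_0 = frozenset({
    "flowcell",
    "kit",
    "guppy_config",
    "guppy_model",
    "guppy_gpu",
    "guppy_gpu_runners",
    "guppy_cpu_threads",
    "output_demultiplex_fast5",
    "skip_basecalling",
    "skip_pycoqc",
})


def find_removed_params(params: dict) -> list:
    """Return the removed basecalling options set in `params`, sorted by name."""
    return sorted(name for name in params if name in REMOVED_IN_3_1_0)


if __name__ == "__main__":
    # Illustrative params, as they might appear in a params file written for 3.0.0.
    old_params = {"protocol": "DNA", "flowcell": "FLO-MIN106", "skip_basecalling": True}
    for name in find_removed_params(old_params):
        print(f"--{name} was removed in 3.1.0; nanoseq no longer supports basecalling")
```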
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -4,12 +4,20 @@
 
 > Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
 
+## [SG-NEx](https://www.biorxiv.org/content/10.1101/2021.04.21.440736v1.abstract)
+
+> Chen, Y., Davidson, N. M., Wan, Y. K., Patel, H., Yao, F., Low, H. M., ... & SG-NEx consortium. (2021). A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. bioRxiv, 2021-04.
+
 ## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)
 
 > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
 
 ## Pipeline tools
 
+- [bambu](https://www.biorxiv.org/content/10.1101/2022.11.14.516358v2.abstract)
+
+  > Chen, Y., Sim, A. D., Wan, Y. K., Yeo, K., Lee, J. J. X., Ling, M. H., ... & Göke, J. (2022). Context-aware transcript quantification from long read RNA-seq data with Bambu. bioRxiv, 2022-11.
+
 - [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
 
   > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
@@ -34,9 +42,9 @@
 
   > Davidson NM, Chen Y, Sadras T, Ryland GL, Blombery P, Ekert PG, Göke J, Oshlack A. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol. 2022 Jan 6;23(1):10. doi: 10.1186/s13059-021-02588-5. PMID: 34991664; PMCID: PMC8739696.
 
-- [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
+- [m6anet](https://pubmed.ncbi.nlm.nih.gov/36357692/)
 
-  > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRXiv (2021)
+  > Hendra, C., Pratanwanich, P. N., Wan, Y. K., Goh, W. S., Thiery, A., & Göke, J. (2022). Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nature Methods, 1-9. PMID: 36357692; PMCID: PMC9718678.
 
 - [PEPPER-Margin-DeepVariant](https://pubmed.ncbi.nlm.nih.gov/34725481/)
 

diff --git a/README.md b/README.md
@@ -23,24 +23,25 @@ On release, automated continuous integration tests run the pipeline on a [full-s
 
 ## Pipeline Summary
 
-1. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
-2. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-3. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
+1. Demultiplexing ([`qcat`](https://github.com/nanoporetech/qcat); _optional_)
+2. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
+3. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+4. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
    - Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
    - Each sample can be mapped to its own reference genome if multiplexed in this way
    - Convert SAM to coordinate-sorted BAM and obtain mapping metrics ([`samtools`](http://www.htslib.org/doc/samtools.html))
-4. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
-5. DNA specific downstream analysis:
+5. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
+6. DNA-specific downstream analysis:
    - Short variant calling ([`medaka`](https://github.com/nanoporetech/medaka), [`deepvariant`](https://github.com/google/deepvariant), or [`pepper_margin_deepvariant`](https://github.com/kishwarshafin/pepper))
    - Structural variant calling ([`sniffles`](https://github.com/fritzsedlazeck/Sniffles) or [`cutesv`](https://github.com/tjiangHIT/cuteSV))
-6. RNA specific downstream analysis:
+7. RNA-specific downstream analysis:
    - Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/))
      - bambu performs both transcript reconstruction and quantification
      - When StringTie2 is chosen, each sample is processed individually and the results are combined; [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) is then used for both gene and transcript quantification.
    - Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
    - RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
    - RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
-7. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))
+8. Present QC for raw reads and alignment results ([`MultiQC`](https://multiqc.info/docs/))
 
 ### Functionality Overview
 

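The Pipeline Summary splits downstream analysis by input type: DNA goes through the DNA-specific steps, while cDNA and directRNA go through the RNA-specific steps. A simplified sketch of that branching, assuming only the protocol and the `skip_modification_analysis` / `skip_fusion_analysis` flags seen in `conf/test_nodx_noaln.config` matter; the function is hypothetical, and the real workflow applies further conditions (for example, which protocols are eligible for modification detection) that this sketch does not model.

```python
# Simplified, hypothetical sketch of the protocol-based branching in the
# Pipeline Summary. Step labels come from the README; the structure does not.
DNA_STEPS = ["short variant calling", "structural variant calling"]
RNA_STEPS = [
    "transcript reconstruction and quantification",
    "differential expression analysis",
    "RNA modification detection",
    "RNA fusion detection",
]
VALID_PROTOCOLS = ("DNA", "cDNA", "directRNA")


def downstream_steps(protocol, skip_modification_analysis=False, skip_fusion_analysis=False):
    """Return the downstream analyses that would apply to one protocol."""
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unknown protocol {protocol!r}")
    if protocol == "DNA":
        return list(DNA_STEPS)
    steps = list(RNA_STEPS)
    # Skip flags remove optional RNA analyses from the list.
    if skip_modification_analysis:
        steps.remove("RNA modification detection")
    if skip_fusion_analysis:
        steps.remove("RNA fusion detection")
    return steps
```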
diff --git a/bin/run_deseq2.r b/bin/run_deseq2.r
@@ -42,17 +42,17 @@ path            <-args[2]
 
 #create a dataframe for all samples
 if (transcriptquant == "stringtie2"){
-    count.matrix       <- data.frame(read.table(path,sep="\t",header=TRUE, skip = 1))
+    count.matrix       <- data.frame(read.table(path, sep = "\t", header = TRUE, skip = 1))
     count.matrix$Chr   <- count.matrix$Start <- count.matrix$End <- count.matrix$Length <- count.matrix$Strand <- NULL
-    colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))],"\\."),"[[",1))
-    count.matrix       <- aggregate(count.matrix[,-1],count.matrix["Geneid"],sum)
-    countTab           <- count.matrix[,-1]
-    rownames(countTab) <-count.matrix[,1]
+    colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))], "\\."), "[[", 1))
+    count.matrix       <- aggregate(count.matrix[, -1], count.matrix["Geneid"], sum)
+    countTab           <- count.matrix[, -1]
+    rownames(countTab) <- count.matrix[, 1]
 }
 if (transcriptquant == "bambu"){
-    countTab           <- data.frame(read.table(path,sep="\t",header=TRUE,row.names = 1))
-    colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab),"\\."),"[[",1))
-    countTab[,1:length(colnames(countTab))] <- sapply(countTab, as.integer)
+    countTab           <- data.frame(read.table(path, sep = "\t", header = TRUE, row.names = 1))
+    colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab), "\\."), "[[", 1))
+    countTab[, 1:length(colnames(countTab))] <- sapply(countTab, as.integer)
 }
 
 
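For the `stringtie2` branch, the R code drops featureCounts' annotation columns, trims each sample column name at its first `.`, and sums counts per `Geneid`. A stdlib-only Python sketch of the same reshaping, assuming a tab-separated file whose first line is a comment (hence `skip = 1`) and whose trimmed sample names are unique; R's `read.table(header = TRUE)` also rewrites non-syntactic column names via `check.names`, which this sketch does not reproduce. The function names are hypothetical.

```python
import csv
from collections import defaultdict

# featureCounts annotation columns that run_deseq2.r sets to NULL.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Length", "Strand"}


def clean_sample_name(column):
    """Keep the text before the first '.', as the R strsplit on a literal dot does."""
    return column.split(".", 1)[0]


def gene_count_table(text):
    """Sum featureCounts-style counts per Geneid and trimmed sample name."""
    lines = text.splitlines()[1:]  # skip = 1: drop the leading comment line
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(lines, delimiter="\t"):
        gene = row["Geneid"]
        for column, value in row.items():
            if column == "Geneid" or column in ANNOTATION_COLUMNS:
                continue
            # Equivalent of aggregate(..., sum): rows sharing a Geneid are added up.
            totals[gene][clean_sample_name(column)] += int(value)
    return {gene: dict(counts) for gene, counts in totals.items()}
```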
@@ -66,7 +66,7 @@ sample <- colnames(countTab)
 group <- sub("(^[^-]+)_.*", "\\1", sample)
 sampInfo <- data.frame(group, row.names = sample)
 if (!all(rownames(sampInfo) == colnames(countTab))){
-    sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)),]
+    sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)), ]
 }
 
 ################################################

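The condition label for DESeq2 comes from `sub("(^[^-]+)_.*", "\\1", sample)`. Because `[^-]+` is greedy, the label is the longest hyphen-free prefix that is followed by `_`, which in practice strips the trailing `_<replicate>` part; names with a `-` before their first `_` do not match and are used unchanged. A Python sketch using the same pattern (the function name and sample names are illustrative):

```python
import re

# Same pattern as `group <- sub("(^[^-]+)_.*", "\\1", sample)` in run_deseq2.r.
GROUP_PATTERN = re.compile(r"(^[^-]+)_.*")


def sample_group(sample):
    """Derive the condition label a sample is grouped under."""
    # The capture is the longest hyphen-free prefix followed by '_'. If no '_'
    # occurs before the first '-', nothing matches and the name is returned
    # unchanged, just as R's sub() leaves non-matching strings alone.
    return GROUP_PATTERN.sub(r"\1", sample, count=1)
```

For example, `sample_group("KO_treated_R2")` gives `"KO_treated"`, while `sample_group("K562-A_R1")` returns the name as-is, so hyphenated sample names would each end up in their own group.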
diff --git a/conf/test.config b/conf/test.config
@@ -11,7 +11,7 @@ params {
     config_profile_name        = 'Test profile'
     config_profile_description = 'Minimal test dataset to check pipeline function'
 
-    // Limit resources so that this can run on Travis
+    // Limit resources so that this can run on GitHub Actions
     max_cpus            = 2
     max_memory          = 6.GB
     max_time            = 12.h

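In pipelines built from this nf-core template, `max_cpus`, `max_memory` and `max_time` act as ceilings rather than requests: the template's `check_max` helper in `nextflow.config` clamps each process's resource request to them. A Python sketch of that clamping only, using the test-profile values with units simplified to GB and hours; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLimits:
    """Ceilings mirroring max_cpus, max_memory and max_time in conf/test.config."""
    max_cpus: int = 2
    max_memory_gb: float = 6.0
    max_time_h: float = 12.0


def cap_request(cpus, memory_gb, time_h, limits=ResourceLimits()):
    """Clamp one process's resource request to the configured ceilings."""
    return (
        min(cpus, limits.max_cpus),
        min(memory_gb, limits.max_memory_gb),
        min(time_h, limits.max_time_h),
    )
```

A `process_medium` task asking for more than the test profile allows would therefore still run, just with 2 CPUs, 6 GB and 12 h.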
diff --git a/conf/test_nodx_noaln.config b/conf/test_nodx_noaln.config
@@ -21,6 +21,6 @@ params {
     protocol              = 'directRNA'
     skip_demultiplexing   = true
     skip_alignment        = true
-    skip_fusion_analysis= true
+    skip_fusion_analysis  = true
     skip_modification_analysis = true
 }
diff --git a/modules/local/bambu.nf b/modules/local/bambu.nf
@@ -1,7 +1,7 @@
 process BAMBU {
     label 'process_medium'
 
-    conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.6 bioconda::bioconductor-bsgenome=1.66.0"
+    conda "conda-forge::r-base=4.2 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
         'https://depot.galaxyproject.org/singularity/bioconductor-bambu:3.0.8--r42hc247a5b_0' :
         'quay.io/biocontainers/bioconductor-bambu:3.0.8--r42hc247a5b_0' }"

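The `container` directive in `BAMBU` picks between two registries: a prebuilt Singularity image from the Galaxy depot, or the BioContainers Docker image on quay.io, which is also used when Singularity is asked to pull and convert the Docker image. A Python sketch of that ternary; the function and parameter names are hypothetical, the registry prefixes are taken from the module.

```python
# Sketch of the container choice in the BAMBU process.
GALAXY_SINGULARITY = "https://depot.galaxyproject.org/singularity/"
BIOCONTAINERS_QUAY = "quay.io/biocontainers/"


def container_uri(image, container_engine, singularity_pull_docker_container=False):
    """Return the image location the ternary in bambu.nf would select."""
    if container_engine == "singularity" and not singularity_pull_docker_container:
        # Singularity users get a prebuilt image from the Galaxy depot.
        return GALAXY_SINGULARITY + image
    # Docker, or Singularity told to pull and convert the Docker image.
    return BIOCONTAINERS_QUAY + image
```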
diff --git a/modules/local/multiqc.nf b/modules/local/multiqc.nf
@@ -27,7 +27,7 @@ process MULTIQC {
 
     script:
     def args = task.ext.args ?: ''
-    def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : ''
+    def custom_config = params.multiqc_config ? "--config $ch_multiqc_custom_config" : ''
     """
     multiqc \\
         -f \\

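Both optional MultiQC arguments collapse to an empty string when unset: `task.ext.args ?: ''` and the `params.multiqc_config ? ... : ''` ternary. Inside a process script, the interpolated config path must be a variable the process itself declares in its `input:` block, which this hunk does not show. A Python sketch of the resulting option list, assuming the options follow `-f` in this order and that the staged config is named `multiqc_config.yaml` (both assumptions, not taken from the module):

```python
import shlex


def multiqc_options(ext_args="", multiqc_config=None, staged_config="multiqc_config.yaml"):
    """Mirror the two conditional options of the MULTIQC script block (order assumed)."""
    options = ["-f"]
    # `task.ext.args ?: ''`: extra arguments are optional.
    options += shlex.split(ext_args or "")
    # `params.multiqc_config ? "--config ..." : ''`: only add --config when set.
    if multiqc_config:
        options += ["--config", staged_config]
    return options
```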
diff --git a/nextflow.config b/nextflow.config
@@ -244,7 +244,7 @@ manifest {
     description     = """A pipeline to demultiplex, QC and map Nanopore data"""
     mainScript      = 'main.nf'
     nextflowVersion = '!>=22.10.1'
-    version         = '3.0.0'
+    version         = '3.1.0'
     doi             = ''
 }
 

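`nextflowVersion = '!>=22.10.1'` declares the minimum Nextflow release; the leading `!` makes Nextflow stop with an error, rather than only warn, when the running version does not satisfy it. A Python sketch of that check, handling only `>=` requirements and plain numeric versions (suffixes such as `-edge` are out of scope); the function names are hypothetical.

```python
def parse_requirement(spec):
    """Split a spec such as '!>=22.10.1' into (strict, minimum_version)."""
    strict = spec.startswith("!")
    body = spec[1:] if strict else spec
    if not body.startswith(">="):
        raise ValueError("this sketch only handles '>=' requirements")
    return strict, tuple(int(part) for part in body[2:].split("."))


def check_nextflow_version(spec, running):
    """Return 'ok', 'warning' or 'error' for a running Nextflow version."""
    strict, minimum = parse_requirement(spec)
    # Compare numerically, so 22.4.5 sorts below 22.10.1.
    current = tuple(int(part) for part in running.split("."))
    if current >= minimum:
        return "ok"
    return "error" if strict else "warning"
```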
diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -26,7 +26,5 @@
                 "protocol": {
                     "type": "string",
                     "description": "Input sample type. Valid options: 'DNA', 'cDNA', and 'directRNA'.",
-                    "format": "file-path",
-                    "mimetype": "text/csv",
-                    "schema": "assets/schema_input.json",
+                    "enum": ["DNA", "cDNA", "directRNA"],
                     "help_text": "You will need to specify a protocol based on the sample input type. Valid options are 'DNA', 'cDNA', and 'directRNA'.",

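`protocol` is a closed set of three strings, which JSON Schema expresses with `enum`; `format` and `mimetype` describe file inputs such as the samplesheet. A Python sketch of the check such a schema entry is meant to enforce; the function name and error text are hypothetical.

```python
# The three values named in the schema's description and help_text.
VALID_PROTOCOLS = ("DNA", "cDNA", "directRNA")


def validate_protocol(value):
    """Return `value` unchanged if it is an allowed protocol, else raise ValueError."""
    # Exact, case-sensitive match, as a JSON Schema `enum` of strings requires.
    if value not in VALID_PROTOCOLS:
        allowed = ", ".join(VALID_PROTOCOLS)
        raise ValueError(f"Invalid --protocol {value!r}; expected one of: {allowed}")
    return value
```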
diff --git a/subworkflows/local/align_graphmap2.nf b/subworkflows/local/align_graphmap2.nf
@@ -21,7 +21,7 @@ workflow ALIGN_GRAPHMAP2 {
     ch_index
         .cross(ch_fastq) { it -> it[-1] }
         .flatten()
-        .collate(12)
+        .collate(12) // [fasta, fasta sizes, gtf, bed, fasta_index, annotation_string, meta, fastq, fasta, gtf, is_transcript, fasta_gtf_string]
         .map { it -> [ it[6], it[7], it[0], it[1], it[2], it[3], it[10], it[4] ] } // [ sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index ]
         .set { ch_index }
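The `cross` / `flatten` / `collate(12)` / `map` chain in `ALIGN_GRAPHMAP2` pairs each FASTQ entry with every index entry sharing its last element, flattens each pair into twelve values, regroups them, and keeps eight in a new order. A Python sketch emulating that reshaping on plain lists, assuming every element is a single value: a Nextflow `flatten` recursively unpacks nested lists, so a list-valued element would break the fixed width of 12. Function names, file names and the annotation key are invented for illustration.

```python
from collections import defaultdict
from itertools import chain


def cross_by_last(left, right):
    """Emit [left_item, right_item] for every pair whose last elements match."""
    by_key = defaultdict(list)
    for item in left:
        by_key[item[-1]].append(item)
    return [[match, item] for item in right for match in by_key.get(item[-1], [])]


def flatten_and_collate(pairs, size=12):
    """Flatten each [index, fastq] pair into one stream, then cut it into chunks."""
    flat = list(chain.from_iterable(chain.from_iterable(pairs)))
    return [flat[i:i + size] for i in range(0, len(flat), size)]


def to_alignment_inputs(chunk):
    """[ sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index ]"""
    return [chunk[6], chunk[7], chunk[0], chunk[1], chunk[2], chunk[3], chunk[10], chunk[4]]
```

With one index entry `["genome.fa", "genome.sizes", "genes.gtf", "genes.bed", "genome.idx", key]` and one FASTQ entry `[meta, "s1.fastq.gz", "genome.fa", "genes.gtf", False, key]`, the chain yields `[meta, "s1.fastq.gz", "genome.fa", "genome.sizes", "genes.gtf", "genes.bed", False, "genome.idx"]`.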